How to Read a Nelson-Aalen Cumulative Hazard Estimate

March 03, 2022

Introduction to Survival Analysis: the Nelson-Aalen estimator

Learn how to use the non-parametric approach to estimating the cumulative hazard function!

In the previous article, I described the Kaplan-Meier estimator. To give a quick recap, it is a non-parametric method for approximating the true survival function. This time, I will focus on another approach to visualizing a survival dataset: using the hazard function and the Nelson-Aalen estimator. Once again, we will use the convenience of the lifelines library to quickly create the plots in Python.

1. The Nelson-Aalen estimator

With the Kaplan-Meier curves, we approximated the survival function, stating the probability of the event of interest (for example, the death event) not occurring by a certain time t.

An alternative approach to visualizing the aggregate information from a survival-focused dataset entails using the hazard function, which can be interpreted as the probability of the subject experiencing the event of interest within a small interval of time, assuming that the subject has survived up until the beginning of the said interval. For a more detailed description of the hazard function, please see this article.

Unfortunately, we cannot transform the Kaplan-Meier estimate of the survival function to the hazard function. However, we can use another non-parametric estimator of the cumulative hazard function: the Nelson-Aalen estimator. In short, it is used to estimate the cumulative number of expected events within a certain period of time. And the reason why it is cumulative is that the sum of estimates is much more stable than the point-wise estimates.

The Nelson-Aalen estimator can be calculated as:

$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$$

where d_i stands for the number of events of interest at time t_i, while n_i is the number of observations at risk just before t_i. These terms are naturally the same as the ones in the Kaplan-Meier estimator's formula.
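To see how the sum of d_i / n_i accumulates, here is a minimal hand-rolled sketch in plain Python. The function name `nelson_aalen` and the toy durations and event flags are made up for illustration; this is not the lifelines implementation.

```python
from collections import Counter


def nelson_aalen(durations, events):
    """Return [(t_i, H(t_i))] at each distinct event time, with H(t) = sum of d_i / n_i."""
    deaths = Counter(t for t, e in zip(durations, events) if e)
    removed = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    cumulative = 0.0
    estimate = []
    for t in sorted(removed):
        d = deaths.get(t, 0)
        if d:
            cumulative += d / at_risk  # n_i counts subjects still at risk just before t
            estimate.append((t, cumulative))
        at_risk -= removed[t]
    return estimate


# Toy data (illustrative only): 1 = event observed, 0 = censored
toy_durations = [1, 2, 2, 3, 4, 5]
toy_events = [1, 1, 0, 1, 0, 1]

for t, h in nelson_aalen(toy_durations, toy_events):
    print(f"t={t}: H(t)={h:.4f}")
```

Censored observations (flag 0) add nothing to the sum, but they still shrink the risk set for later event times.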

The Nelson-Aalen estimator, or more generally visualizing the hazard function over time, is not a very popular approach to survival analysis. That is because, in comparison to the survival function, interpretation of the curves is not so simple and intuitive. However, the hazard function is of great importance for more advanced approaches to survival analysis, for example, the Cox regression. That is why it is important to understand the concept and I will try to provide some insights about it. We can say that the cumulative hazard function:

measures the total amount of risk that has been accumulated up to a certain point of time t.
provides the number of times we would mathematically expect the occurrence of the event of interest over a certain period if only the events were repeatable. This might be a bit confusing, so to make the statement a bit simpler (yet not that realistic) you can think of the cumulative hazard function as the expected number of deaths of an individual up to time t, if the individual could be resurrected after each death without resetting the time. As I said, not that realistic, but this could just as well be applied to machine failures, etc.
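The "resurrection" thought experiment can be checked with a small simulation. The sketch below assumes a constant hazard rate, chosen only because the cumulative hazard is then simply rate × t. The values 0.5 and 4.0, the seed, and the helper name `count_events` are arbitrary choices for illustration.

```python
import random


def count_events(rate, horizon, rng):
    """Count 'deaths' up to `horizon` when the subject is instantly 'resurrected'
    after each one and the clock keeps running (constant hazard `rate`)."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate)  # waiting time until the next event
        if t > horizon:
            return count
        count += 1


rng = random.Random(42)
rate, horizon, n_runs = 0.5, 4.0, 20_000

mean_count = sum(count_events(rate, horizon, rng) for _ in range(n_runs)) / n_runs
cumulative_hazard = rate * horizon  # H(t) = rate * t under a constant hazard

print(f"average number of events up to t={horizon}: {mean_count:.3f}")
print(f"cumulative hazard H({horizon}): {cumulative_hazard:.3f}")
```

The two printed numbers should be close to each other, which is the "expected number of events" reading of the cumulative hazard.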

The last concept that might be useful to get some intuition about the cumulative hazard function is the bathtub curve, or rather its components. The curve represents the life cycles of many electronic consumer products. The hazard rate of the bathtub curve is created by combining:

the rate of early "infant mortality" failures when the product is first introduced,
the rate of random failures with constant failure rate during the product's designed life,
the rate of "wear out" failures as the product exceeds its expected lifetime.

While the image above represents the hazard rate (not the cumulative one!), the shape of the Nelson-Aalen estimator's curve gives us an idea of how the hazard rate changes over time.

For example, the concave shape of the cumulative hazard function indicates that we are dealing with an "infant mortality" kind of event (dotted red line in the image), where the rate of failure is highest early and decreases with time. On the other hand, the convex shape of the cumulative hazard function implies we are dealing with the "wear out" kind of event (dotted yellow line).
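The link between the hazard rate's trend and the curvature of the cumulative hazard can be checked numerically. The sketch below uses the Weibull cumulative hazard H(t) = (t / scale)^shape as a convenient stand-in. Choosing the Weibull family and the shape values 0.5, 1.0, and 2.0 is an assumption made for illustration, not something taken from the article's data.

```python
def weibull_cumulative_hazard(t, shape, scale=1.0):
    """Cumulative hazard of a Weibull distribution: H(t) = (t / scale) ** shape."""
    return (t / scale) ** shape


def second_differences(values):
    """Discrete curvature: negative means concave, positive means convex."""
    return [values[i + 1] - 2 * values[i] + values[i - 1] for i in range(1, len(values) - 1)]


times = [0.5 * k for k in range(1, 11)]  # evenly spaced grid: 0.5, 1.0, ..., 5.0
curvature = {}
for label, shape in [("infant mortality", 0.5), ("constant rate", 1.0), ("wear out", 2.0)]:
    H = [weibull_cumulative_hazard(t, shape) for t in times]
    curvature[label] = second_differences(H)
    direction = (
        "concave" if all(c < -1e-12 for c in curvature[label])
        else "convex" if all(c > 1e-12 for c in curvature[label])
        else "straight line"
    )
    print(f"{label:>16} (shape={shape}): cumulative hazard is {direction}")
```

A shape below 1 means a decreasing hazard rate and a concave cumulative hazard, a shape above 1 means an increasing hazard rate and a convex curve, and a shape of exactly 1 gives the flat middle of the bathtub.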

I believe this is enough theory to understand the Nelson-Aalen estimator of the cumulative hazard function. It is time to code!

2. Example in Python

For consistency, we continue the example using the popular Telco Customer Churn dataset started in the previous article. For brevity's sake, please refer to that article for a description of the dataset and the reasoning for the transformations applied to it. First, we load the required libraries.

Then, we load the data:

lifelines makes the process of computing and plotting the Nelson-Aalen estimator very simple; we only need to run the following few lines of code to plot the cumulative hazard function.

The code generates the following plot:

I would say that the shape of the cumulative hazard function based on the Nelson-Aalen estimator might indicate that we are dealing with a hazard function similar to the bathtub curve. The reason for this is that both in the beginning and close to the end the rate of change is higher, while it more or less flattens out (stabilizes at a constant level) in the middle of the customers' lifetime with the company.

We can also easily access the cumulative hazard function by using the cumulative_hazard_ attribute of the fitted NelsonAalenFitter object.

One more interesting functionality provided by the library is the event table, which summarizes what happened at each point of time. We can obtain it by running naf.event_table and the result looks as follows:

Similarly to the Kaplan-Meier case, we will also plot the cumulative hazard function for each variant of the payment method. As lifelines provides a unified way of working with different tools used for survival analysis, the code requires only a few slight modifications.

The shapes of the cumulative hazard functions are very similar for the two automatic payment categories: bank transfer and credit card.

Note: In the theoretical introduction, we mentioned that the reason for working with the cumulative hazard function instead of the hazard function is the higher accuracy of the former. Yet, lifelines provides a way of deriving the hazard function from the cumulative one by means of applying a kernel smoother. So where is the catch? To do so, we need to specify the bandwidth parameter, and the shape of the resulting hazard function is highly dependent on the selected value. I will quote the author's comment about this approach: "There is no obvious way to choose a bandwidth, and different bandwidths produce different inferences, so it's best to be very careful here. My advice: stick with the cumulative hazard function." In case you are still interested, please check out the documentation.

3. Conclusions

In this article, I tried to provide an introduction to estimating the cumulative hazard function and some intuition about the interpretation of the results. While the Nelson-Aalen estimator is far less popular than the Kaplan-Meier survival curves, understanding it might be very helpful while working with more advanced methods of survival analysis, such as the Cox regression.

You can find the code used for this article on my GitHub. As always, any constructive feedback is welcome. You can reach out to me on Twitter or in the comments.


4. References

[1] https://stats.stackexchange.com/questions/60238/intuition-for-cumulative-hazard-function-survival-analysis


Source: https://towardsdatascience.com/introduction-to-survival-analysis-the-nelson-aalen-estimator-9780c63d549d

Cochran Stics1991