Want to be able to predict when an employee is going to leave? Survival analysis can help.
We all understand the costs of employee turnover in our business. Turnover is expensive, so it is vitally important to understand the dynamics of how and when employees exit your business.
Just like death and taxes, in the world of HR, turnover is guaranteed. It sounds a little bleak, but it’s not. At some point, every employee will exit the business, whether within their probation period or after a fruitful twenty years or more.
Once we, as HR professionals, come to terms with this fact, it follows that we would want to predict how long we can expect an employee to stay with our business once they’re hired. This has utility in many domains, from estimating the likely return on investment of a new hire to modelling your future workforce by considering future movements in and out of your business.
That’s where survival analysis comes in. And in this post we’re going to be showing you how survival analysis can help answer questions such as: what proportion of our organization will stay with the business past a certain time? Or, given they reach a certain tenure, at what rate would we expect them to leave the business?
Survival analysis is a statistical method aimed at determining the expected duration of time until an event occurs. In this instance, the event is an employee exiting the business.
As the name might suggest, survival analysis was developed in biomedical sciences to analyze the proportion of patients surviving to particular times after the application of a treatment. Since then, it’s been applied to many situations where the event of interest is binary: it either happens or it doesn’t.
Technical definition: Survival analysis is a set of statistical approaches used to investigate the time until an event of interest.
In plain English: Expected time until an event happens.
Turnover calculations are an important metric in most businesses. They provide a way to track the movement of employees out of your business and to identify potential risks. However, attrition rates calculated in isolation can sometimes be misleading. They are heavily impacted by reporting periods: a period of downturn or an acquisition, for example, may significantly skew the output for a particular period.
From a retention strategy perspective, an attrition rate ignores important patterns as it considers time as a chronology, rather than a variable. That is, it treats the turnover of a recent starter the same as a tenured veteran. This results in missing important patterns such as milestone-based turnover trends or staying power of certain employee groups, both of which help inform retention strategy.
Turnover and tenure have a complex relationship. A business with high turnover will logically have a lower average tenure, and at the same time, tenure is an important factor in the decision to exit. It follows that any analysis of attrition in isolation from tenure will smooth over important insight.
Considering the above diagram, which visualizes the careers of individuals within an organization (each line represents the start and end date of an employee), a traditional attrition calculation would simply count the number of end-points during a period and divide by the number of lines present at the beginning of the period.
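To make the traditional calculation concrete, here is a minimal Python sketch of it. The function name, the data layout (a list of start and end dates, with None for people still employed) and the sample careers are all made up for illustration.

```python
from datetime import date

def attrition_rate(careers, period_start, period_end):
    """Traditional attrition: exits during the period / headcount at period start.

    careers: list of (start_date, end_date) tuples; end_date is None for
    employees who are still employed. (Hypothetical layout for this example.)
    """
    # Lines present at the beginning of the period.
    headcount = sum(
        1 for start, end in careers
        if start <= period_start and (end is None or end >= period_start)
    )
    # End-points that fall inside the period.
    exits = sum(
        1 for _, end in careers
        if end is not None and period_start <= end <= period_end
    )
    return exits / headcount if headcount else 0.0

careers = [
    (date(2015, 3, 1), date(2021, 6, 30)),   # tenured veteran leaving
    (date(2020, 11, 1), date(2021, 2, 15)),  # recent starter leaving
    (date(2018, 7, 1), None),                # still employed
    (date(2019, 1, 15), None),               # still employed
    (date(2021, 4, 1), date(2021, 9, 1)),    # joined and left within the period
]
rate = attrition_rate(careers, date(2021, 1, 1), date(2021, 12, 31))
print(f"{rate:.0%}")  # prints 75%
```

Notice that the veteran and the recent starter each count as one exit, and that tenure plays no part in the result.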
Survival analytics (like intelliHR’s) considers all historical and current data points together, including those that remain employed, and groups them by key tenure groups, delivering useful, predictive insights about turnover probability.
The challenge with predicting employee turnover is that for anyone who is still employed at the time of observation, their future behaviour is uncertain. They might resign the next day, continue for another ten years, or anything in between. All we know is that their final tenure will be at least as long as their tenure today. This kind of incomplete observation is called right-censoring.
Each line segment below represents the career of an employee from start to finish date. Survival analytics converts these career lengths to tenures to calculate the probability of reaching each milestone.
To convert time into tenure, which becomes the explanatory variable in the survival function, all start dates are normalized to a time zero (see below).
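As a rough sketch of this conversion, the Python snippet below turns a start and end date into a tenure measured from time zero, plus a flag that records whether the exit was actually observed. The function name, the observation date and the use of an average month length are assumptions made for this example.

```python
from datetime import date

def to_tenure(start, end, observed_on):
    """Return (tenure_in_months, exited) for one career.

    exited is False for right-censored careers: the employee was still
    employed on the observation date, so only a lower bound on their
    final tenure is known.
    """
    exited = end is not None and end <= observed_on
    stop = end if exited else observed_on
    # Approximation: average month length of 30.4375 days.
    months = (stop - start).days / 30.4375
    return months, exited

# An employee who left after roughly six months.
print(to_tenure(date(2020, 1, 1), date(2020, 7, 1), date(2021, 1, 1)))
# An employee still employed on the observation date (right-censored).
print(to_tenure(date(2019, 1, 1), None, date(2021, 1, 1)))
```

Once every career is expressed this way, calendar dates no longer matter and all employees can be compared on the same tenure axis.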
Survival analytics calculates the probability of a termination occurring, based on the number of employees terminated and the size of the sample still remaining at the time.
intelliHR uses the Kaplan-Meier method to estimate the survival function due to its ability to handle right-censored data. This is important for an HR tool, as right-censored data is so prominent. This method incorporates information from all observations available by splitting tenure into logical milestones (e.g. six months), and considers the probability of reaching the next milestone (e.g. one year), assuming all previous milestones were successfully reached.
The survival function is the probability that the career of an employee will be longer than a particular time, t.
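The milestone logic can be sketched in a few lines of plain Python. This is an illustration of a Kaplan-Meier style calculation on tenure grouped into six-month intervals, not intelliHR’s actual implementation. The function name and sample data are made up, and it assumes that an employee who is censored partway through an interval still counts as at risk for that interval.

```python
def kaplan_meier(tenures, exited, step=6):
    """Estimate the probability of reaching each tenure milestone.

    tenures: tenure in months for each employee
    exited:  True if the employee has left, False if still employed
             (right-censored)
    step:    milestone width in months
    Returns a list of (milestone_in_months, survival_probability).
    """
    # Interval in which each career ended or was last observed.
    intervals = [int(t // step) for t in tenures]
    curve = [(0, 1.0)]
    survival = 1.0
    for k in range(max(intervals) + 1):
        # At risk: everyone known to have reached the start of interval k.
        at_risk = sum(1 for i in intervals if i >= k)
        # Exits: employees who left during interval k.
        exits = sum(1 for i, e in zip(intervals, exited) if i == k and e)
        # Conditional probability of reaching the next milestone,
        # multiplied into the running (cumulative) probability.
        survival *= 1 - exits / at_risk
        curve.append(((k + 1) * step, survival))
    return curve

tenures = [3, 8, 8, 14, 20, 30, 30, 40]
exited = [True, True, False, True, False, True, False, False]
for milestone, prob in kaplan_meier(tenures, exited):
    print(milestone, round(prob, 3))
```

In this toy data set, the estimated probability of reaching 12 months is 0.75. The employees who are still employed are never counted as exits, but they do add to the at-risk group for every milestone they have already reached, which is how the method uses right-censored data rather than discarding it.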
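For readers who prefer notation, the definition and the standard Kaplan-Meier estimator can be written as follows. The symbols are the conventional textbook ones and are not taken from the original post: T is an employee’s final tenure, d_i is the number of exits at tenure t_i, and n_i is the number of employees still at risk just before t_i.

```latex
S(t) = P(T > t)
\qquad
\hat{S}(t) = \prod_{i \,:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Each factor in the product is the estimated probability of getting past t_i, given that the employee was still there just before it.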
The probabilities calculated above are plotted on the stepped survival curve (below). Although tenure is based on time and is therefore a continuous variable, the probabilities are calculated by grouping data into logical milestones of six months, giving it the stepped shape that you can see.
Because the probabilities are cumulative, meaning that the probability of reaching a given milestone relies on the fact that each previous milestone was achieved, the function never increases. That is, the likelihood of reaching your third year will always be less than or equal to the probability of reaching your second year.
It’s expected that each survival curve will start at a probability of 1.0 at t=0. That is, there is a 100% likelihood that an employee will last until their first day (although we know this isn’t always the case, for example if an employee fails to start). From here, the chart can be read by considering each vertical drop (step) as the change in cumulative probability as tenure advances. As can be seen in the above example:
A steep drop-off in the curve suggests a greater risk of employees leaving the business at that particular length of tenure. The length of each horizontal line denotes the length of time (tenure) between events of interest (employee terminations). A long horizontal line means that no employees were terminated across those values of tenure, so according to the data, the probability of surviving beyond that point does not decrease.
95% confidence intervals:
The Kaplan-Meier estimate is a statistic and is therefore subject to variance. The blue shaded areas on the graph are the 95% confidence intervals, which show the range of uncertainty around each estimate.
The smaller your sample or group, the more variance there will be. Because we expect fewer high-tenured employees, the certainty of the retention probability reduces as tenure increases. This results in a larger blue-shaded area at higher tenures, denoting a wider range of uncertainty around the estimate, which should be considered when making any decisions based on a dataset.
Every organization has a different survival curve, as does each business unit within an organization. Comparison of curves can help identify issues, patterns or trends that might require intervention.
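To see why small groups produce wider bands, here is a hedged sketch of one common way to put a 95% interval around a Kaplan-Meier estimate: Greenwood’s formula with a simple normal approximation. The original post does not say which interval method intelliHR uses, so treat the method, the function name and the sample counts as assumptions.

```python
import math

def survival_with_ci(counts, z=1.96):
    """counts: list of (at_risk, exits) per milestone interval, in tenure order.

    Returns a list of (survival, lower, upper) using Greenwood's formula
    with a plain normal approximation, clipped to the range [0, 1].
    """
    survival, greenwood_sum, out = 1.0, 0.0, []
    for at_risk, exits in counts:
        survival *= 1 - exits / at_risk
        # Greenwood term; skipped when everyone at risk exits (survival is 0).
        if exits and at_risk > exits:
            greenwood_sum += exits / (at_risk * (at_risk - exits))
        se = survival * math.sqrt(greenwood_sum)
        lower = max(0.0, survival - z * se)
        upper = min(1.0, survival + z * se)
        out.append((survival, lower, upper))
    return out

# Same 10% exit rate, very different sample sizes.
print(survival_with_ci([(100, 10)]))
print(survival_with_ci([(10, 1)]))
```

Both groups have an estimated survival of 0.9, but the interval for the group of 10 is much wider than the interval for the group of 100, which is the same effect that widens the shaded area at higher tenures.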