Predictive Analytics Defined by Data and Statistics

By Joe Anderson, Director of Analytics | Dec 29, 2014

As previously run in WorkCompWire

Imagine you have two injured workers to manage. Both are early in their injuries, and both could be at risk for long-term pain and opioid use. How do you decide which one, if either, requires your attention? Maybe you have some sort of index that combines everything you know about each injured worker’s behavior. You give the first injured worker a couple of points for their opioid use and another point for using a brand medication when a generic is available. The second injured worker has less opioid use (perhaps fewer points), but their antidepressant use earns points for the risk of psychological comorbidities. The first injured worker may score four out of 10 and the second three out of 10, but is the four really more likely to develop complications than the three? Is that difference enough to justify sending one to a nurse manager but not the other? How can you know?

The best way to understand potential risk and make the best decisions is to use predictive analytics. Predictive analytics has become a complex topic in workers’ compensation because there is no standard methodology, and many blur the lines between true predictive analytics, data analysis, and data access. It comes down to two things: data and statistics. Using historical pharmacy data, you can correlate each independent factor known about injured workers with their long-term severity. Statistical modeling then weighs these factors against each other to generate an accurate prediction. As a result, instead of ranking an injured worker as simply a “three” or a “four” based on a group of potential risk factors, you get fairly confident projections, such as an injured worker who is 21% likely to become a long-term claimant versus one who is 83% likely.
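
To make the contrast with a point index concrete, here is a minimal sketch of one common statistical technique for this kind of prediction, logistic regression, fit with plain gradient descent. Everything in it is an assumption for illustration: the three binary risk factors, the synthetic “historical” claims, and the weights used to generate them are made up, and this is not Helios’s model. The point is only that the output is a probability that can be checked against real outcomes, rather than a point total.

```python
import math
import random

# Hypothetical binary risk factors, loosely named after the examples in the text.
FEATURES = ["opioid_use", "brand_when_generic_available", "antidepressant_use"]


def sigmoid(z: float) -> float:
    """Map a log-odds score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_claims(n: int, seed: int = 0):
    """Generate made-up historical claims. The 'true' weights are assumptions."""
    rng = random.Random(seed)
    true_bias, true_w = -2.0, [1.5, 0.5, 1.0]
    rows, labels = [], []
    for _ in range(n):
        x = [1.0 if rng.random() < 0.4 else 0.0 for _ in FEATURES]
        p = sigmoid(true_bias + sum(w * xi for w, xi in zip(true_w, x)))
        rows.append(x)
        labels.append(1 if rng.random() < p else 0)  # 1 = became long-term
    return rows, labels


def fit_logistic(rows, labels, lr: float = 1.0, epochs: int = 300):
    """Fit an intercept and one weight per factor by batch gradient
    descent on the average log-loss."""
    n, k = len(rows), len(rows[0])
    bias, w = 0.0, [0.0] * k
    for _ in range(epochs):
        g_bias, g_w = 0.0, [0.0] * k
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x))) - y
            g_bias += err
            for j in range(k):
                g_w[j] += err * x[j]
        bias -= lr * g_bias / n
        w = [wi - lr * gi / n for wi, gi in zip(w, g_w)]
    return bias, w


def predict_proba(bias: float, w, x) -> float:
    """Probability that a claim with risk factors x becomes long-term."""
    return sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x)))


if __name__ == "__main__":
    rows, labels = make_synthetic_claims(1000)
    bias, w = fit_logistic(rows, labels)
    worker_a = [1.0, 1.0, 0.0]  # opioids + brand drug where a generic exists
    worker_b = [1.0, 0.0, 1.0]  # some opioid use + an antidepressant
    for name, x in [("A", worker_a), ("B", worker_b)]:
        print(f"Worker {name}: {predict_proba(bias, w, x):.0%} likely long-term")
```

The fitted weights, not a hand-assigned point scale, decide how much each factor counts, which is the “balancing” the paragraph above describes.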

True predictive analytics relies heavily on data; data is its foundation. For something to qualify as “predictive analytics,” it needs to use data to determine the best predictors. At Helios, we use the industry’s largest set of pharmacy data to determine which injured workers are most likely to incur high pharmacy costs or to have the longest duration of opioid use. These two outcomes (“pharmacy costs” and “duration of opioid use”) are dependent variables: what you’re trying to predict. The more data you have, the more you can predict.
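
As a minimal sketch of what a dependent variable looks like in practice, the Python below rolls made-up prescription fills up into the two outcomes named above for each claim. The `Fill` record and its fields are hypothetical, and summing days supplied is a simplifying assumption for “duration of opioid use” (it ignores overlapping fills and gaps between them).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fill:
    """One prescription fill (hypothetical schema)."""
    claim_id: str
    drug_class: str  # e.g. "opioid", "antidepressant", "nsaid"
    days_supply: int
    cost: float


def dependent_variables(fills):
    """Return {claim_id: (total_pharmacy_cost, opioid_days_supplied)}."""
    cost = defaultdict(float)
    opioid_days = defaultdict(int)
    for f in fills:
        cost[f.claim_id] += f.cost
        if f.drug_class == "opioid":
            opioid_days[f.claim_id] += f.days_supply
    # Every claim with at least one fill appears; claims with no opioid fills get 0 days.
    return {cid: (cost[cid], opioid_days[cid]) for cid in cost}


fills = [
    Fill("C1", "opioid", 30, 120.0),
    Fill("C1", "opioid", 30, 120.0),
    Fill("C1", "nsaid", 30, 15.0),
    Fill("C2", "antidepressant", 30, 40.0),
]
print(dependent_variables(fills))
# → {'C1': (255.0, 60), 'C2': (40.0, 0)}
```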

Google provides a great example of predictive analytics. Google has been able to predict flu trends simply by analyzing the “aggregated search data” in its search engine [1]. By adding all flu-related search queries together, Google has been able to “estimate how much the flu is circulating in different countries and regions around the world.” Year after year, it compares its model’s estimates to traditional flu surveillance systems and refines its models to improve performance. Its findings have been published in the journal Nature, and one report suggested that Google can detect regional flu outbreaks 10 days faster than the Centers for Disease Control and Prevention (CDC) [2].

The workers’ compensation industry can learn from this Google example. First, as the most popular search engine, Google has the largest database of search data and continues to build it year over year. A Pharmacy Benefit Manager (PBM) with a long history of workers’ compensation-specific prescription data would provide a similar foundation.
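
The aggregate-then-compare idea in the Google example can be sketched as follows, assuming one simple model form: a straight-line fit between the log-odds of the flu-related query fraction and the log-odds of the flu rate reported by traditional surveillance. The function names, the model form, and all numbers are illustrative assumptions, not Google’s actual system.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))


def query_fraction(flu_query_counts: dict, total_queries: int) -> float:
    """Add all flu-related query counts for one week and divide by all queries."""
    return sum(flu_query_counts.values()) / total_queries


def fit_log_odds(query_fracs, surveillance_rates):
    """Ordinary least squares for logit(rate) = a + b * logit(query fraction)."""
    xs = [logit(q) for q in query_fracs]
    ys = [logit(r) for r in surveillance_rates]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def estimate_rate(a: float, b: float, query_frac: float) -> float:
    """Estimate this week's flu rate from this week's query fraction."""
    return inv_logit(a + b * logit(query_frac))


# Hypothetical past weeks: query fractions and matching surveillance rates.
past_q = [0.001, 0.002, 0.004, 0.008]
past_rate = [0.010, 0.018, 0.035, 0.066]
a, b = fit_log_odds(past_q, past_rate)
this_week = query_fraction({"flu symptoms": 30, "fever remedy": 25}, 10_000)
print(f"Estimated flu rate this week: {estimate_rate(a, b, this_week):.1%}")
```

Each new week of surveillance data can be appended to the history and the fit rerun, which is the year-over-year refinement described above.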

Second, Google analyzes its data with a specific goal in mind. In the flu trends example, if you ask, “What is the dependent variable?” or “What is Google trying to predict?” the answer is “flu activity.” Similarly, when using predictive analytics in workers’ compensation, you want to have a specific goal in mind, for example, long-term pharmacy cost. A specific goal yields a specific and measurable result. Be wary of supposed predictions that can describe what they predict only in vague terms, like “risk,” “severity,” or “concern.”
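
One way to keep a goal specific is to write the dependent variable down as an explicit, checkable definition. In this minimal sketch, the field name, the 24-month window, and the $10,000 threshold are hypothetical placeholders, not recommended values; the point is that every claim can later be labeled unambiguously as having met the target or not, which a vague “risk” cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A measurable dependent variable: what is measured, over what window,
    and at what level the outcome counts as having happened."""
    metric: str         # hypothetical field name in the claim's outcome data
    window_months: int  # observation window after the date of injury
    threshold: float    # value at or above which the outcome is "yes"

    def label(self, observed: dict) -> bool:
        """Label a claim whose metric has already been measured over the window."""
        return observed[self.metric] >= self.threshold


LONG_TERM_PHARMACY_COST = Target("pharmacy_cost_usd", window_months=24, threshold=10_000.0)

print(LONG_TERM_PHARMACY_COST.label({"pharmacy_cost_usd": 12_500.0}))  # True
print(LONG_TERM_PHARMACY_COST.label({"pharmacy_cost_usd": 800.0}))     # False
```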

Lastly, Google is transparent about back-testing its models. In the flu trends example, Google compares its estimates against historical surveillance data to show their level of accuracy, and it uses those comparisons to refine and improve its predictive models. Back-testing is a requirement for predictive analytics; how else can you determine a model’s accuracy? In workers’ compensation, the constant flow of new claims, the aging of existing claims, and the acquisition of business that may include legacy claims can all require predictive models to be refined. An example of one of our back-tested models is below:
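
Independent of any particular vendor’s model, the mechanics of a back-test can be sketched generically: fit on older claims, predict newer claims the model has not seen, and compare those predictions with what actually happened. This sketch assumes claims are dicts with a hypothetical `opened` date field and uses two standard checks, the Brier score (mean squared error of predicted probabilities) and a simple calibration table; any model could supply the probabilities.

```python
from datetime import date


def time_split(claims, cutoff: date):
    """Train on claims opened before the cutoff; hold out those opened on or after it."""
    train = [c for c in claims if c["opened"] < cutoff]
    test = [c for c in claims if c["opened"] >= cutoff]
    return train, test


def brier_score(probs, outcomes) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_table(probs, outcomes, n_bins: int = 5):
    """For each probability bin: (bin index, count, mean predicted, observed rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    table = []
    for i, b in enumerate(bins):
        if b:  # skip empty bins
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            table.append((i, len(b), mean_p, rate))
    return table


# Hypothetical claims, split by the date each was opened.
claims = [
    {"id": "C1", "opened": date(2012, 3, 1)},
    {"id": "C2", "opened": date(2013, 6, 15)},
    {"id": "C3", "opened": date(2014, 2, 1)},
]
train, test = time_split(claims, cutoff=date(2013, 1, 1))
print([c["id"] for c in test])  # ['C2', 'C3']
print(brier_score([0.2, 0.8], [0, 1]))
```

If, say, claims predicted around 20% likely turn out long-term far more often than that in the calibration table, the model’s percentages should not be taken at face value until it is refined.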

So the next time you read an article, attend a session, or hear a conversation about predictive analytics, remember two things: data and statistics. They should help cut through some of the complexity and distinguish true predictive analytics from other types of analysis.
