Contrary to what your parents told you, you are not as unique as a snowflake. This is in part what makes predictive analytics work. In fact, there are three major factors that enable data scientists and machine learning algorithms to deduce and predict with a reasonable degree of accuracy. If you take these factors into consideration the next time you embark on a predictive analytics project, you stand to save your organization a lot of time and resources.
Habits
The first is that we are all creatures of habit. We repeat the same behaviors day in and day out. For example, I go to my job every weekday, and most of the time take the same route while dropping off kids at school and getting to work. Moreover, my weekend behaviors also repeat, whether trips to the gym or grocery shopping or occasional meal out or sporting event.
This creates a pattern over time, reflected in historical data, and that pattern can be used to create a model to predict behavior in the future. What we do, and how we do it, forms habits, and those are pretty strong patterns that are reflected in data, which can then form predictive models.
Yes, habits occasionally change, but, holding everything else roughly constant, I would bet on them repeating. As a matter of fact, the occasional observed change in habit can be a good predictor of another kind, but this is a separate topic (hint: “holding everything else constant”).
Similarities
Second, as already mentioned, we are not as unique as we think we are. There are many people out there that are a lot like each of us, and this likeness or similarity can be based on different criteria or dimensions. For example, shared interests: if I like boating, then in that way I am similar or alike somebody else who also likes boating. Tens of thousands of people like boating, so I have something in common with tens of thousands of people. Other ways that people may be similar include geographic, demographic, and socioeconomic attributes, shopping history, and so forth.
These two together are enough to make predictive analytics possible. In fact, Amazon's recommendation engine is based upon the fact that we are creatures of habit, and that we are like each other. It knows where you are physically located, and what you have bought in the past, and it knows who is similar to you by location, by past purchases, etc. It then aggregates those things to make recommendations. There is no human out there on the other side of your shopping cart. It is an algorithm that knows that other people like me who are interested in data science, tend to also be interested in “Star Wars” movies, because that is the habitual pattern and the likelihood.
Systems
The third reason predictive analytics works is that systems – organizations, cultures, even nature – force patterns onto us. DNA is nothing but encoded patterns of how we look, how we act, and how we behave. Governments, another example, have defined laws that regulate behavior; we also have cultural and ethical norms; natural seasons and weather patterns; laws of nature and physics; etc. all create patterns. Newton’s Third Law? Yes, that is a predictive model built based on a pattern found in nature.
Zillow is an example of a company that pulls all three of these levers for predictive analytics. Based on houses you are looking at, it can show you similar houses in the same area, or houses that people like you have looked at. But beyond these layers, Zillow taps into patterns based on county regulations, property taxes, and how this influences property values. In that way, its predictive engine is able to take a lot of houses that seem unique at first glance, and provide individuals with a good estimation of the value of each house.
The next time you are building a predictive model, try to avoid shooting in the dark to isolate patterns from randomness, and instead start with these three major factors and build the initial list of predictors:
- Human habits that play a role in the subject matter (individual patterns over time)
- The ways that different objects can be alike (property networks)
- Systematic patterns that play some role in the subject matter (contextual patterns)