The Unreasonable Effectiveness of Data
In a recent paper, a trio of Google researchers distilled their findings from trying to solve machine learning’s most difficult problems: “simple models based on lots of data trump more elaborate models based on less data”.The most elaborate model possible is the human mind, working on ‘gut feel’. It can take into account a number of different variables like weather, morale, stock-market shifts, standing of the local baseball team. But in terms of data, an individual is usually restricted to the 50,000 rows in an excel spreadsheet. Also, the model is highly emotional and unpredictable.
Fitting Models to Data
The scientific approach is to create a simple model (the variables are usually restricted to seasonality, holidays, traffic, inventory and price). But the simple model is verified on a large volume of data, and the output is rational and predictable.
The diagram on the left shows how we can fit the actual sales (in black), with a model (in red), based on Seasonality, Holidays, and Price. While there can be many more variables that can be used in the model-fitting, it is important to remember that not all variables have equal predictive capabilities. A model that does a great job in fitting the data, by cannot be used to project into the future has very little practical use.
The other important factor to keep in mind is that correlation does not necessarily mean causation. Real life data is complicated, and often incomplete. It takes careful analysis before true causal factors can be teased out.
The science of retail is evolving, and Forecast Horizon delivers actionable scientific insights, through the use of complex algorithms running on the cloud computing platform.
- Hal Varian, Chief Economist, Google.
