Accurately anticipating the future is more effective than going in without a plan and taking losses. This is true in most fields, and especially in supply chain and distribution. Products cannot be teleported instantly: holding inventory carries significant costs, and shortages penalize sales. Placing the right products in the right place at the right time therefore minimizes costs and maximizes turnover at the same time. To achieve this, supply chain actors establish sales forecasts.
In its most basic form, forecasting means replicating previous orders or supply levels. While simple, this is unreliable because it skips any analysis of the past, which includes exceptional events, no-sale periods due to stock shortages, product launches and end-of-life phases, store closures and so on. The approach is naïve and risky, since it assumes that every event affecting sales will repeat itself in exactly the same way.
At the same time, there is an increasing number of efficient forecasting models. The best models are based on both Big Data (the capacity to process large volumes of data) and Machine Learning (self-learning algorithms). Such software draws on the full depth of the available data, such as sales over multiple years, cash register data and a virtually unlimited set of product features. Combined with Machine Learning algorithms, this data analysis identifies what explains sales, corrects the history for exceptional events, and takes future events into account to produce high-quality forecasts.
To date, the data used have remained classic: sales and inventory histories, plus reference data on articles, stores and suppliers. Many models also consider past and future commercial operations. All of this data, however, comes from inside the company creating the forecasts.
However, many factors outside the brand influence consumers’ buying behavior, such as weather, competitors and social networks, to name a few. Why do we need this information? What can we learn from it to make a better forecast? In short, how can exogenous data be used to improve sales forecasting?
Using Exogenous Data Isn’t So Simple!
Examples of exogenous events influencing purchasing decisions include:
“It’s beautiful weather this weekend, sausage sales will explode!”
“Summer has ended and it’s the first week of autumn. It’s time to go coat shopping.”
“There is a concert at the stadium and it’s impossible to navigate in this traffic. I’ll go shopping another day.”
“Kate Moss posted an Instagram photo of her new bag, I’d like to buy the same one.”
Specific cases are easy to find. Automating them is harder, because the effects need to be identified and quantified. This requires expertise, method and technology.
Data Collection and Qualification
First, anyone tackling this subject has to collect the classic data mentioned above (sales history, stocks, articles, etc.). Once the data is collected, a mathematical model is built to calculate the forecast “excluding exogenous data”.
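To make the idea of a forecast “excluding exogenous data” concrete, here is a minimal sketch of one possible baseline. It is an assumption for illustration, not the model used in practice: it averages sales for the same weekday over recent weeks and skips days when the product was out of stock, since those days understate demand. The function name, the data layout and the `weeks` parameter are all hypothetical.

```python
from statistics import mean


def baseline_forecast(history, weeks=4):
    """Forecast the next day's sales from internal data only.

    history: list of (units_sold, in_stock) tuples, one per day, oldest first.
    The forecast is the mean of the same weekday over up to `weeks` recent
    weeks, ignoring days when the product was out of stock.
    """
    # The day to forecast comes right after the last entry, so the same
    # weekday one week earlier is history[-7], then history[-14], and so on.
    same_weekday = history[-7::-7]
    usable = [units for units, in_stock in same_weekday if in_stock][:weeks]
    if not usable:
        return None  # not enough clean history to say anything
    return mean(usable)


# Illustrative data: three weeks of daily sales, with a stock shortage
# exactly one week before the day we want to forecast.
history = [(5, True)] * 21
history[-7] = (0, False)   # out of stock: excluded from the average
history[-14] = (12, True)
history[-21] = (8, True)

print(baseline_forecast(history))  # → 10
```

Without the stock filter, the shortage day would drag the forecast down to about 6.7 units, which is the kind of uncorrected history a naïve replication would carry forward.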
In parallel, potentially influential exogenous data must be chosen, and arrangements made to store it for later use. To make sure the collection is appropriate, a few questions should be asked beforehand: what data do we want to analyse? Where can we find it? In which format, at what price, and how durable is the source? The answers, and the skills required, differ depending on whether the goal is a one-off study (single collection, manual re-formatting) or a model to be used in production (where recurrence and automation are essential).
In some cases, data collection is straightforward. Weather data is available from several suppliers, such as the Met Office and DarkSky, in structured and well-documented files, so the meaning of each field is known. While collection is easy, the main constraint is the heavy volume (years of historical weather for thousands of cities). Even with today’s advances in storage and processing, it cannot be done in a day.
For other subjects, reprocessing steps are essential. Recovering all major concerts and sporting events around a given location requires data from several sources. The data arrives in different formats, with varying degrees of completeness, and then has to be shaped and consolidated before use. The aim is to use the data regularly, in as automated a way as possible.
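As a rough sketch of what this shaping and consolidation could look like, the example below normalizes event records from two sources into one common layout and merges duplicates. Both source formats, all field names and the sample records are invented for illustration; real feeds will differ.

```python
from datetime import datetime


def from_ticketing(rec):
    # Hypothetical source A: ISO timestamps, optional "category" field.
    return {
        "date": datetime.fromisoformat(rec["start"]).date(),
        "venue": rec["venue"].strip().lower(),
        "kind": (rec.get("category") or "unknown").lower(),
    }


def from_sports_feed(rec):
    # Hypothetical source B: day/month/year dates, football matches only.
    return {
        "date": datetime.strptime(rec["day"], "%d/%m/%Y").date(),
        "venue": rec["stadium"].strip().lower(),
        "kind": "football",
    }


def consolidate(ticketing, sports):
    """Merge both sources into one list with a single entry per (date, venue)."""
    normalized = [from_ticketing(r) for r in ticketing]
    normalized += [from_sports_feed(r) for r in sports]
    events = {}
    for event in normalized:
        key = (event["date"], event["venue"])
        # Keep the first record seen, unless it lacks the event type
        # and a later record can fill it in.
        if key not in events or events[key]["kind"] == "unknown":
            events[key] = event
    return sorted(events.values(), key=lambda e: (e["date"], e["venue"]))


ticketing = [
    {"venue": " City Stadium ", "start": "2024-06-15T20:00", "category": "Concert"},
    {"venue": "City Stadium", "start": "2024-06-16T18:00"},  # type missing
]
sports = [{"stadium": "City Stadium", "day": "16/06/2024"}]

for event in consolidate(ticketing, sports):
    print(event["date"], event["venue"], event["kind"])
```

Once the per-source converters exist, adding a new feed only means writing one more small function, which is what makes a recurring, automated collection practical.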
Machine Learning specialists then carry out the crucial step of feature engineering. Fed with all available data (internal and exogenous), algorithms designed for this task rank the variables according to their importance in explaining sales. For example, the sales of caps in a given week in a downtown store could be explained 30% by the month considered, 12% by the family of the article, 6% by the type of store and 3% by the day’s temperature. The breakdown could differ significantly for another product. When exogenous data appears towards the top of the ranking, an improved forecasting model can be considered.
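The ranking idea can be illustrated with a deliberately simple stand-in. The sketch below scores each candidate variable by its squared Pearson correlation with sales, that is, the share of sales variance a single variable explains on its own. This is an assumption chosen for brevity: dedicated tools generally rely on richer, model-based importance measures that also capture interactions between variables. The feature names and figures are made up.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing
    return sxy * sxy / (sxx * syy)


def rank_features(features, sales):
    """Return (name, score) pairs, most explanatory feature first."""
    scores = {name: r_squared(values, sales) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative weekly cap sales for one store, with two candidate features.
sales = [10, 12, 30, 35, 33, 11]
features = {
    "temperature_c": [12, 14, 24, 27, 26, 13],  # exogenous
    "promotion": [0, 1, 0, 1, 0, 1],            # internal
}

for name, score in rank_features(features, sales):
    print(f"{name}: {score:.2f}")
```

On this toy data the temperature comes out far ahead of the promotion flag, which is the signal that would justify adding the exogenous variable to the forecasting model.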
Now is the time to refine the model by launching the learning phase. The main features (discriminating characteristics) identified in the previous phase are injected into the model. The expert then builds the model by having it learn from the past (how the data explains the sales history). After several training rounds, the resulting model can provide a quality forecast from future feature values. By analyzing sales and weather history, Machine Learning models can determine that, in Lille, in addition to the normal seasons (spring, summer), the first weekend with a temperature above 20°C and no rainfall marks the launch of barbecue season. The model can then be applied to the future: as soon as the weather forecast announces more than 20°C and no rain, the retailer increases the sausage supply on the shelves to satisfy customers.
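The learn-then-predict loop can be sketched in a few lines, under strong simplifying assumptions. Here the 20°C and no-rain condition is hard-coded, whereas a real Machine Learning model would discover such a threshold from the data, and the “learning” is reduced to estimating one sales multiplier from history. All names and figures are hypothetical.

```python
from statistics import mean


def is_barbecue_weather(day, min_temp_c=20.0):
    # Condition taken from the Lille example: above 20°C and no rainfall.
    return day["temp_c"] > min_temp_c and day["rain_mm"] == 0


def learn_uplift(history):
    """Estimate how much sales rise on barbecue-weather days.

    history: list of dicts with "temp_c", "rain_mm" and "sales" keys.
    Returns the ratio of mean sales on barbecue-weather days to mean
    sales on other days, or 1.0 when it cannot be estimated.
    """
    warm = [d["sales"] for d in history if is_barbecue_weather(d)]
    other = [d["sales"] for d in history if not is_barbecue_weather(d)]
    if not warm or not other or mean(other) == 0:
        return 1.0
    return mean(warm) / mean(other)


def forecast_sales(baseline, weather_forecast, uplift):
    """Apply the learned uplift to each day of the weather forecast."""
    return [
        round(baseline * (uplift if is_barbecue_weather(day) else 1.0))
        for day in weather_forecast
    ]


# Illustrative weekend sausage sales with the weather observed that day.
history = [
    {"temp_c": 14, "rain_mm": 2, "sales": 100},
    {"temp_c": 18, "rain_mm": 0, "sales": 110},
    {"temp_c": 22, "rain_mm": 0, "sales": 200},
    {"temp_c": 25, "rain_mm": 0, "sales": 220},
]
uplift = learn_uplift(history)

upcoming = [
    {"temp_c": 22, "rain_mm": 0},  # barbecue weather
    {"temp_c": 22, "rain_mm": 3},  # warm but rainy
    {"temp_c": 15, "rain_mm": 0},  # dry but cool
]
print(forecast_sales(100, upcoming, uplift))  # → [200, 100, 100]
```

Only the first upcoming day meets both conditions, so only that day receives the doubled supply.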
One could argue that such a case is often already covered by the experience of department heads. That’s partially true, but what happens if the department head gets sick? How many years of experience does it take to build this knowledge? Will a department manager recognize all the affected products? That is most unlikely. The advantage of the machine is that it automates the process, making it exhaustive and recurrent. The human keeps their role: the tool serves as a decision-making aid, and the user remains able to control the forecasts.
The value of building models through learning and feature engineering is that they provide an extremely fine analysis of the phenomena: 20°C will not have the same effect in Lille as in Marseille, a strong wind does not trigger the same reactions in town as at the seaside, and snow paralyzes the Paris region while it increases attendance at ski resorts.
Exogenous Data, a Data Mine
We’ve described the importance of making predictions, the usefulness of considering as much data as possible, and how to do so. The few examples we have used to illustrate this are often based on weather data. However, the subject of exogenous data is much broader. At the risk of repetition, the challenge is to identify the right data, adapted to a given context, and according to the use we want to make of the forecast.
Social networks primarily capture fundamental trends. If a trending color is featured more and more on blogs, Instagram and Pinterest, sales of items in that color are likely to increase. But this type of trend will usually have been detected well upstream by the brands’ designers or purchasing departments.
Sometimes, however, a much shorter-term effect can be detected by scanning certain people’s posts. For example, if the latest K-Way outfit worn by Johan is shared thousands of times, it is likely to sell well in the coming weeks.
Another theme with a substantial impact on store sales is sports and cultural events in the immediate vicinity. Suppose a football match takes place tonight at a stadium near a shopping mall. Access is congested, leading to a drop in overall attendance at the mall. But before the game, the fans wander around the mall. They may buy a few small items, but no furniture or appliances. They will also consume food and beverages before and after the match, increasing foot traffic in bars and restaurants.
At the same stadium, the effect will be different if it’s a football match, a rock concert or an opera.
Again, it is important to obtain precise and qualified data. Knowing that there is an event is good, but knowing the type of event is better.
- For a car spare parts supplier, knowing the accident rate by area is beneficial. The supplier can then adjust its stock more precisely.
- The automated study of competitors can also help with prediction. It is carried out by scanning both e-commerce sites and opinions published online. Competitor price surveys can also signal a stock realignment and influence your forecast.
We live in an increasingly connected world that generates an exponentially growing mass of data. The field of exogenous data seems infinite. It’s up to us to get the best out of it; we’re just getting started.