"Behind the Screens" is a blog series featuring Stitch employees and the cool projects, product features and reports they’ve helped build. We want to bring a little life into the work we do every day and give you a peek into how we do what we do!
Meet Smith, one of the masterminds behind Stitch’s new sales forecasting report. Smith is a data scientist at Stitch. He provides insights to the company on how we can improve the product and better understand our customers. I sat down with this data-loving scientist to learn more about why and how he built the forecasting report.
As a quick refresher: Stitch launched a brand-new forecasting report that helps our customers improve purchasing and increase revenue opportunities by better predicting demand for their products.
OK, let’s start simple. What makes this forecasting model cool?
It’s cool because it’s a flexible model that factors in each customer’s historic data, which makes the model unique to that customer’s business. With this model, we can make strong predictions whether the customer is a new business or an established one.
It’s also a learning model, which means it’s always accounting for new data and information from the most recent sales. It gets smarter and more accurate every night!
That sounds awesome. How did you come up with it?
It was an iterative process that built upon existing models, while sprinkling in some of Stitch’s own secret sauce based on insights we gathered by looking across our thousands of customers.
I started by looking at traditional demand forecasting methods, including autoregressive and exponential smoothing models. These have been used in industry for decades, but we found they weren’t the best fit for our customers for several reasons. Here are a few:
1. They can’t make use of all the information Stitch has available.
2. They assume that sales volume is normally distributed. But for smaller businesses that may only sell a few dozen of a particular product in a given time period, sales follow a Poisson distribution. We also found that the distribution of orders was often bimodal.
3. They don’t account for future growth, and this was really important to us. We are all about helping our customers grow faster, so growth had to be included.
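To make point 2 concrete, here is a small illustrative comparison. It is a sketch, not part of Stitch’s pipeline, and it assumes a hypothetical product that averages 3 sales per week. A normal distribution with the same mean and variance puts real probability on negative sales, which can’t happen, while a Poisson distribution only assigns probability to whole, non-negative counts such as a week with zero sales.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k sales when weekly sales are Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical low-volume product: 3 sales per week on average.
weekly_mean = 3.0
# Give the normal model the same variance a Poisson would have (variance == mean).
sigma = math.sqrt(weekly_mean)

# Probability mass the normal model places on impossible negative sales.
p_negative_normal = normal_cdf(0.0, weekly_mean, sigma)

# Probability the Poisson model gives to a week with zero sales, a real outcome.
p_zero_poisson = poisson_pmf(0, weekly_mean)

print(f"Normal model, P(sales < 0): {p_negative_normal:.3f}")
print(f"Poisson model, P(sales = 0): {p_zero_poisson:.3f}")
```

The gap only matters at low volumes: as the weekly mean grows, the two distributions look more and more alike, which is why the normal assumption holds up better for high-volume sellers.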
So what model did you use?
After testing several other models, the one that worked best was a random forest model. In prep for this interview, I even tried looking up easy ways of explaining it and, no surprise, I couldn’t find any!
Basically, the model is a group of decision trees, hence the name “forest.” Each decision tree is trained using a random subset of predictors associated with sales over a prior time period. When making a prediction, each tree examines a product’s most recent history and votes for what it predicts sales will be. The forest’s final prediction is the average of all the trees’ votes.
What makes it perfect for us is that this democratic voting lets the model account for non-linear relationships between future sales and historical predictors, such as product availability. At the same time, the random sampling of predictors allows us to use as much of a customer’s historical data as possible without biasing the model or making false assumptions about the product’s sales distribution.
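For readers who want something more concrete than the voting metaphor, here is a minimal, from-scratch sketch of a random forest regressor in plain Python. It is an illustration under stated assumptions, not Stitch’s model: the toy data, the three predictors, the tree count, the depth and all helper names (`fit_tree`, `fit_forest`, `predict_forest`, and so on) are hypothetical. Each tree is trained on a bootstrap sample of the rows and, following the standard random forest recipe, considers only a random subset of predictors at each split. The forest’s prediction is the average of the trees’ numeric “votes.”

```python
import random


def avg(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean: lower means a purer node."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)


def partition(rows, targets, f, threshold):
    """Split (row, target) pairs on predictor f at the given threshold."""
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > threshold]
    return left, right


def fit_tree(rows, targets, n_features, max_depth, rng):
    """Grow a small regression tree. Leaves hold the mean target value."""
    if max_depth == 0 or len(set(targets)) == 1 or len(rows) < 4:
        return avg(targets)
    # Only a random subset of predictors is considered at this split.
    candidates = rng.sample(range(len(rows[0])), n_features)
    best = None  # (error, feature index, threshold)
    for f in candidates:
        # Every distinct value except the largest keeps both sides non-empty.
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left, right = partition(rows, targets, f, threshold)
            error = sse([t for _, t in left]) + sse([t for _, t in right])
            if best is None or error < best[0]:
                best = (error, f, threshold)
    if best is None:  # every candidate predictor was constant in this node
        return avg(targets)
    _, f, threshold = best
    left, right = partition(rows, targets, f, threshold)
    return (
        f,
        threshold,
        fit_tree([r for r, _ in left], [t for _, t in left], n_features, max_depth - 1, rng),
        fit_tree([r for r, _ in right], [t for _, t in right], n_features, max_depth - 1, rng),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def fit_forest(rows, targets, n_trees=25, n_features=2, max_depth=4, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_tree([rows[i] for i in idx], [targets[i] for i in idx],
                               n_features, max_depth, rng))
    return forest


def predict_forest(forest, row):
    # Each tree "votes" with a number; the forest averages the votes.
    return avg([predict_tree(tree, row) for tree in forest])


# Hypothetical toy data: [sales last week, sales two weeks ago, in stock (0/1)].
data_rng = random.Random(42)
rows, targets = [], []
for _ in range(200):
    row = [data_rng.randint(0, 20), data_rng.randint(0, 20), data_rng.randint(0, 1)]
    rows.append(row)
    # Non-linear toy rule: an out-of-stock product sells nothing.
    targets.append(row[2] * (row[0] + row[1]) / 2)

forest = fit_forest(rows, targets)
pred_in_stock = predict_forest(forest, [12, 10, 1])
pred_out_of_stock = predict_forest(forest, [12, 10, 0])
print(f"in stock: {pred_in_stock:.1f}, out of stock: {pred_out_of_stock:.1f}")
```

In the toy data the stock flag switches sales on or off entirely, the kind of non-linear relationship a single straight-line model would miss but threshold-based trees can pick up. Outside of a sketch, a library implementation such as scikit-learn’s `RandomForestRegressor` would be the practical choice.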
It’s also a better fit for young companies that have little historical data, and it grows with our customers, evolving as their businesses grow into something bigger.
What was the biggest challenge?
Well, to have the best forecast, you want to include all the information that’s available. In our case, we wanted to always use the most up-to-date order information. The challenge was figuring out how frequently we could pull data and generate the reports without slowing down the customer’s in-app experience. It took real engineering finesse, but we did it! We can pull new information into the model every night.
What data is being used in the forecasting report?
Short answer: ALL of it! But it’s actually more complicated than that. When we create the initial model for each company’s business, we use all the historic data available to us. However, when we create the forecast for a selected period, we use the order data for each product variant from the last 10 to 12 weeks to inform the predictions.
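One common way to reconcile “train on everything, forecast from the last few weeks” is a sliding window, sketched below under stated assumptions. It assumes weekly unit-sales totals per variant and a 12-week window (the upper end of the range Smith mentions); the SKUs, numbers and function names are hypothetical. Every window in the full history becomes a training row, while the forecast itself only looks at the most recent window.

```python
from typing import Dict, List, Tuple

WINDOW_WEEKS = 12  # assumption: upper end of the 10-to-12-week range


def training_rows(weekly_sales: List[int], window: int = WINDOW_WEEKS) -> List[Tuple[List[int], int]]:
    """Slide over the full history: (previous `window` weeks, the following week's sales)."""
    return [
        (weekly_sales[i - window:i], weekly_sales[i])
        for i in range(window, len(weekly_sales))
    ]


def forecast_input(weekly_sales: List[int], window: int = WINDOW_WEEKS) -> List[int]:
    """Predictors for the upcoming forecast: only the most recent `window` weeks."""
    if len(weekly_sales) < window:
        raise ValueError(f"need at least {window} weeks of history")
    return weekly_sales[-window:]


# Hypothetical history: variant SKU -> weekly unit sales, oldest week first.
history: Dict[str, List[int]] = {
    "tee-blue-m": [3, 5, 2, 4, 6, 3, 7, 5, 4, 6, 8, 5, 7, 9, 6, 8],
    "tee-red-s": [0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 0, 2, 1, 1],
}

for sku, sales in history.items():
    rows = training_rows(sales)
    print(sku, "training rows:", len(rows), "forecast input:", forecast_input(sales))
```

A longer history simply yields more training rows, so the model keeps improving as data accumulates, while each forecast stays anchored to recent demand.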
You’re clearly very passionate about this. What’s the future of forecasting at Stitch?
Oh, this is just the start! The goal is to include additional information from your marketing campaigns, web traffic, regional supply, different channel activities, etc., to refine the forecasts. On that topic, we just opened our beta for Google Analytics. That is the first step toward getting Google Analytics data for product listings into the forecast. It’s going to be awesome.