Consider Adding Operations Research Capabilities to Your Data Science Team

A few years ago, we started scaling our data science capabilities in terms of personnel, tools and skill sets. We hired and developed a variety of data scientists with different backgrounds. Some came with hardcore statistics and applied math backgrounds, others with technical backgrounds in fields like engineering, complemented by specialization in data science techniques, particularly programming scalable models.

Operations research and econometrics, I thought, would round out the team with more knowledge and more possibilities for collaboration. The vision of combining these diverse skill sets rests on the premise that highly trained analytical personnel flourish among colleagues with different backgrounds, where they can mix approaches and solve problems creatively together. An econometrician would introduce advanced statistical and forecasting capabilities, especially for macroeconomic and time series analytics, both predictive and causal. Hiring an econometrician remains on the back burner for us. Because we had a need for the prescriptive capabilities of operations research, we did bring those capabilities on staff.

For those of you less familiar with operations research, its techniques lend themselves well to the Python programming and large data sets typical of data science, but they are more focused on solving for a clear objective, given stated constraints. Operations research techniques provide prescriptive answers to analytical problems. This means that instead of predicting a good outcome, an operations research model provides the optimal solution given the data and constraints. Typical ML models, by contrast, generally do not take explicit constraints into consideration. Operations research problems can usually be defined as having one objective or multiple hierarchical objectives, rather than the gradations of correctness or scored predictions that you might get when feeding data into a tree-based algorithm, for example.
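To make "a clear objective, given stated constraints" concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than something from our work: the two products, the profit figures and the two resource limits are invented, and the brute-force search over whole-number plans simply stands in for the dedicated solver a real operations research model would use.

```python
from itertools import product

# Hypothetical problem: choose how many units of products X and Y to make.
# Objective: maximize profit. Constraints: limited labor and material.
PROFIT_X, PROFIT_Y = 40, 30   # assumed profit per unit
LABOR_LIMIT = 12              # assumed: each unit of X or Y uses 1 labor hour
MATERIAL_LIMIT = 16           # assumed: X uses 2 units of material, Y uses 1


def solve_small_plan(max_units=20):
    """Check every whole-number plan and return the best feasible one."""
    best_plan, best_value = None, float("-inf")
    for x, y in product(range(max_units + 1), repeat=2):
        if x + y > LABOR_LIMIT:
            continue  # violates the labor constraint
        if 2 * x + y > MATERIAL_LIMIT:
            continue  # violates the material constraint
        value = PROFIT_X * x + PROFIT_Y * y
        if value > best_value:
            best_plan, best_value = (x, y), value
    return best_plan, best_value


plan, profit = solve_small_plan()
print(plan, profit)
```

The answer is a single plan that satisfies every constraint, not a ranked list of likely-good plans. Checking every combination only works for toy problems like this one, which is why larger problems are normally handed to a purpose-built solver.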

For the most part, prescriptive analytics and predictive analytics have different use cases, so depending on what your problem sets are, you may lean into one area over the other. For example, in a manufacturing business, prescriptive operations research techniques are prevalent in supply chain problems. At the same time, a manufacturing business could use predictive analytics, such as random forest models, to predict which consumer personas it should target with its marketing dollars. Even though we are a financial services business, we have found use cases for both techniques.

Here's an example of how the two methods differ when looking at the same problem.

Suppose you wanted to find the least-cost path (in time and money) to drive from point A to point B. Over the years, the trip between point A and point B has been made many times, using many forms of transportation, and you have large data sets recording the cost and time of each trip, along with associated variables such as time of day, weather, fuel costs, vehicle used, etc.

If you trained a tree algorithm with this data, it would predict the path between A and B (based on the training data fed into it) that is most likely to satisfy the objective of least cost or highest efficiency. This means that the tree-based predictive model will try to reproduce the best past performances as closely as it can. A typical predictive output might be a scored list of trip routes ranked by cost (time and money). The highest-scored route would be the one it predicts to be the lowest cost. You could run the model for a particular time of day, day of week or set of weather conditions by retraining it only on those instances, and so have different lists for different occasions.
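The scored list described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not a trained tree model: it groups historical trips by route and weather and averages their cost, which is roughly what a regression tree that split on those two variables would predict in each leaf. The route names and cost figures are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trip history: (route, weather, observed cost in time and money).
TRIPS = [
    ("highway", "clear", 30.0),
    ("highway", "clear", 34.0),
    ("highway", "snow", 70.0),
    ("back_roads", "clear", 40.0),
    ("back_roads", "snow", 50.0),
    ("back_roads", "snow", 54.0),
    ("toll_road", "clear", 36.0),
    ("toll_road", "snow", 60.0),
]


def rank_routes(trips, weather):
    """Rank previously traveled routes by average past cost in this weather."""
    costs_by_route = defaultdict(list)
    for route, trip_weather, cost in trips:
        if trip_weather == weather:
            costs_by_route[route].append(cost)
    predicted = [(route, mean(costs)) for route, costs in costs_by_route.items()]
    # Lowest predicted cost first, so the top entry is the recommended route.
    return sorted(predicted, key=lambda item: item[1])


print(rank_routes(TRIPS, "clear"))
print(rank_routes(TRIPS, "snow"))
```

Note that the ranking can only contain routes that appear in the history. A route nobody has driven before has no data behind it and cannot be scored.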

An operations research model would work differently because it does not predict the right answer based on historical data. Instead, it prescriptively tells you the least-cost path between A and B. Said another way, this model would be asked to provide the single correct solution to the trip based on the constraints you give it, and the solution might even be a path that was never taken historically. So, you would provide all of the inputs and waypoints associated with the trip. You could even give it probability-driven inputs like the weather forecast for the specific trip date. This model would then solve for the single route that is most efficient, given the constraints, after comparing all the possible choices of routes. It could even tell you what time of day you should leave for the trip in order to achieve the most efficient outcome. You could also have different models for different circumstances, each with the best route prescriptively calculated for you, so that if it was snowing, for example, you would already know how to get to point B. The predictive tree model would instead give you the route it predicts to be the least cost when snowing, based on historical evidence.
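As a rough illustration of the prescriptive side, the sketch below solves a tiny least-cost routing problem with Dijkstra's algorithm, a standard shortest-path method. The waypoints C and D and all segment costs are hypothetical, and each cost is assumed to be a single non-negative number that already blends time and money. Dijkstra's algorithm does not literally enumerate every route, but for non-negative costs it is guaranteed to return the cheapest one.

```python
import heapq

# Hypothetical road network: ROADS[u][v] is the cost of driving from u to v.
ROADS = {
    "A": {"C": 4, "D": 2},
    "C": {"B": 5},
    "D": {"C": 1, "B": 8},
    "B": {},
}


def least_cost_path(graph, start, goal):
    """Return (path, cost) for the cheapest route, or (None, inf) if none."""
    best = {start: 0}       # cheapest known cost to reach each waypoint
    previous = {}           # waypoint -> waypoint we arrived from
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper way here was found
        for neighbor, segment_cost in graph[node].items():
            new_cost = cost + segment_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


print(least_cost_path(ROADS, "A", "B"))
```

Here the cheapest route runs A, D, C, B at a cost of 8, and nothing requires that anyone has ever driven it before. A snow scenario could be handled by passing in a second cost table with the segment costs expected in snow.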

So, the two approaches might arrive at similar conclusions for similar circumstances, but they would arrive there differently. The tree approach would reason probabilistically and say this is likely the best route based on learning from all previous experiences. The operations research approach would say this is the best route because it looked at all possible choices and found the best one.

The tree algorithm deals with uncertainty better, so if you want choices or don’t know for sure how the trip circumstances are going to unfold, then the tree-based results will be helpful. On the other hand, if you know the inputs are highly certain and the circumstances are locked in but the combinations are too complex to figure out, then the operations research model will give you a reliable trip path to take.

The insight for us is that all of these skills and capabilities are data science, and by bringing them under one umbrella we enable a more productive team environment in which different questions get posed and lead to more elegant and useful solutions. We look forward to adding an econometrician to our team in the future to further enhance our innovative problem solving.

Rado Petsov

GTM | Winemaker

1 year ago

The article raises an interesting point about strengthening analytical work through integrating operations research skills into data science teams. Combining such complementary approaches could maximize insightful outcomes. Thanks for sharing this, Salvatore Tirabassi.
