Operational Research: important tools for data scientists
When you read about data science, the focus is mostly on predicting values (i.e., extrapolating or interpolating the values of an unknown function). However, almost every important data science project at Agoda contains some large constrained optimization problem that needs to be solved in a big data environment. In academia, these types of problems are studied under the discipline of Operational Research (OR), so one would (naively?) expect to see OR topics mentioned in data-science-focused CVs; alas, among the hundreds of resumes I reviewed in the last year, only a handful mentioned optimization skills explicitly. Why? I'm not sure, really, but here is a curriculum I pulled from the Galvanize data science program:
- Week 1 - Exploratory Data Analysis and Software Engineering Best Practices
- Week 2 - Statistical Inference, Bayesian Methods, A/B Testing, Multi-Armed Bandit
- Week 3 - Regression, Regularization, Gradient Descent
- Week 4 - Supervised Machine Learning: Classification, Validation, Ensemble Methods
- Week 5 - Clustering, Topic Modeling (NMF, LDA), NLP
- Week 6 - Network Analysis, Matrix Factorization, and Time Series
- Week 7 - Hadoop, Hive, and MapReduce
- Week 8 - Data Visualization with D3.js, Data Products, and Fraud Detection Case Study
- Weeks 9-10 - Capstone Projects
- Week 12 - Onsite Interviews
The focus is on machine learning, big data tools, and some statistics; there is no OR/optimization. The situation is no different in other data science curricula (e.g., Insight and Metis).
My personal belief is that a well-rounded (senior) data scientist should have some OR skills; it is certainly something I often test for in my interviews. Following are my recommendations for useful OR tools (I've used them all throughout the years), with a short illustrative sketch for each after the list:
- Lagrange multipliers and the more general Karush–Kuhn–Tucker (KKT) conditions (mostly theoretical, but a good foundation for understanding constrained optimization over smooth functions).
- Linear programming and the much harder integer programming (whose decision version is NP-complete).
- Flow networks and max-flow/min-cut algorithms (these can be reduced to linear programming, but they remain an important concept in their own right).
- Multi-objective optimization.
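To make these concrete, here are short Python sketches, one per tool. They are illustrations under stated assumptions (SciPy and NetworkX installed; the toy problems are my own, not from the original post), not production recipes. First, Lagrange multipliers/KKT on the classic problem of maximizing x*y subject to x + y = 10: the Lagrangian L(x, y, lam) = x*y - lam*(x + y - 10) gives stationarity conditions x = lam and y = lam, so x = y = 5. SciPy's SLSQP solver, a sequential quadratic programming method that enforces the KKT conditions numerically, should recover the same point.

```python
# Sketch: maximize x*y subject to x + y = 10 (assumes SciPy is installed).
# The Lagrangian stationarity conditions give x = y = 5; SLSQP enforces
# the KKT conditions numerically and should recover the same point.
from scipy.optimize import minimize

objective = lambda v: -(v[0] * v[1])  # negate because scipy minimizes
constraints = [{"type": "eq", "fun": lambda v: v[0] + v[1] - 10}]

result = minimize(objective, x0=[1.0, 1.0], method="SLSQP",
                  constraints=constraints)
print(result.x)  # approximately [5. 5.]
```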
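For linear programming, a minimal sketch with scipy.optimize.linprog on a made-up LP: maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x, y >= 0. An integer program has the same model shape, but you would hand it to a branch-and-bound solver (e.g., scipy.optimize.milp or PuLP) rather than a pure LP solver.

```python
# Sketch: a toy LP solved with scipy.optimize.linprog (assumes SciPy).
# maximize 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x >= 0, y >= 0
from scipy.optimize import linprog

c = [-3, -2]                    # negate: linprog minimizes c @ x
A_ub = [[1, 1], [1, 3]]         # left-hand sides of the <= constraints
b_ub = [4, 6]                   # right-hand sides

res = linprog(c, A_ub=A_ub, b_ub=b_ub)  # default bounds are x >= 0
print(res.x, -res.fun)          # optimum at (4, 0) with objective 12
```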
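For flow networks, a sketch using NetworkX's maximum_flow and minimum_cut on a small hypothetical graph; the two values coincide, which is exactly the max-flow/min-cut theorem.

```python
# Sketch: max flow and min cut with NetworkX (assumes networkx installed)
# on a small made-up graph; the two values agree by max-flow/min-cut.
import networkx as nx

G = nx.DiGraph()
G.add_edge("s", "a", capacity=3)
G.add_edge("s", "b", capacity=2)
G.add_edge("a", "b", capacity=1)
G.add_edge("a", "t", capacity=2)
G.add_edge("b", "t", capacity=3)

flow_value, flow_per_edge = nx.maximum_flow(G, "s", "t")
cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
print(flow_value, cut_value)  # both 5
```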
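Finally, for multi-objective optimization, one common approach is weighted-sum scalarization, sketched here on two made-up conflicting objectives: sweeping the weight w between the objectives traces out points on the Pareto frontier.

```python
# Sketch: weighted-sum scalarization for two conflicting (made-up)
# objectives; sweeping the weight w traces points on the Pareto frontier.
import numpy as np
from scipy.optimize import minimize_scalar

f1 = lambda x: x ** 2          # best at x = 0
f2 = lambda x: (x - 2) ** 2    # best at x = 2

for w in np.linspace(0.0, 1.0, 5):
    res = minimize_scalar(lambda x, w=w: w * f1(x) + (1 - w) * f2(x))
    print(f"w={w:.2f}  x={res.x:.2f}  f1={f1(res.x):.2f}  f2={f2(res.x):.2f}")
```

Note that weighted sums only reach the convex part of the frontier; for non-convex frontiers, methods such as epsilon-constraint are the usual alternative.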
(re-posted from my new-ish blog; see link in my profile)
Comments:

- And this is why CPLEX is part of DSX, the IBM Data Science platform.
- [Customized Analytics for Supply Chain and Optimization] I think part of the problem is that there aren't sufficient open-source libraries focused on the needs of optimization models. Optimization nearly always requires very clean data that conforms to a specific schema. This is different from ML, whose algorithms tend to work on tables of arbitrary schema and whose techniques often incorporate strategies for basic data cleaning. I've built the ticdat package and deployed it on PyPI to address this need.
- [Analytics Leader | Bridging the divide between business decisions and analytics] Over the course of my career, the most impactful solutions I've created have all included some element of optimization. Like you, I've been surprised that it hasn't really been part of the data science playbook.
- [SAP | Supply Chain | Data Analytics Professional] Correction: the article is by Uri Weiss.
- [SAP | Supply Chain | Data Analytics Professional] Hi Bill, you are right about the Analytics Edge course; add to that the MIT supply chain courses related to CTL.SC2x, which use linear and mixed integer linear programming for network design, SOP, and procurement. Also, the course CTL.SC0x has used MILPs and LPs. Indeed, this is a very good article by Prof. Watson.