Playing Referee
PC: Pxfuel

Playing Referee

Choosing Tools

One of the best parts about working in the data analytics space is the variety of tools we can choose from when working on a project. However, tools like Excel, Power BI, Python, and many more are exactly that; tools that we can potentially use to solve a business problem. Very rarely is there only a single tool that can solve a problem. Every tool, whether it's an application (Power BI, Excel) or a programming language (Python, R), has its own unique strengths and weaknesses. Sometimes a business solution can involve starting in one tool, expanding it into another, then translating it into yet another tool (and so on). We all make choices on what tools we want to use to better understand or solve a problem based on factors like:

  • Our own knowledge and capabilities for using different tools
  • Available tools for us to use, whether that's as an individual or in an organization
  • Tools that others within our team are using
  • What tools those that came before us built out existing processes in

Flexibility is Key

My two most recent published courses in the LinkedIn Learning library focus on building clustering, anomaly detection, and regression algorithms in three different tools: Excel, R, and Power BI. When I was developing these courses, I subconsciously chose not to get into a debate about which tools to use, but rather show how we can use each of them in different ways to understand a model, find a solution, replicate this process, and scale out the results to a wider audience.

No alt text provided for this image

These approaches can vary by individual as well. Each one of us understands and interacts with each tool differently. As a result, we all have our own takes on why we want to use a particular application or language over another one. My own approach for choosing between Excel, R, and Power BI for my own modeling goes a bit like this:

  1. Excel: How do the pieces of the model fit together and interact with one another?
  2. R: The data set is getting larger, and it might be easier to write a script that can do the work instead of using Excel formulas and the Solver add-in to get a solution?
  3. Power BI: Is there a need to scale out a model (including the ones built in R) in an existing business intelligence platform that also makes it easy to automatically refresh the data?

In other words, I embrace all three tools (plus many more like Python and AWS) but I figure out which one works best for the scenario or stage of the modeling process that I'm currently working on.

Excel: A Business World Mainstay Since 1987

In the tech world, we sometimes consider products dated if they are over ten or even five years old. So, the fact that Excel is over thirty years old and still such a ubiquitous part of the business world is absolutely incredible.

In one of last week's Career Chat series here on LinkedIn, Shyvee Shi interviewed Jonathan Rochelle (JR) about having careers at large and small companies. By itself, this is an amazing interview to listen to, but my favorite part is when Shyvee asks JR why Google Sheets become such a widespread tool for business collaboration (his startup company invented the Sheets technology that Google later bought). He said that Excel is one of the greatest business tools ever built, but its position as part of the personal productivity suite implies it's a tool intended for one person to use at a time. They designed Google Sheets, on the other hand, as a product that's meant to be shared.

And that's such a great point that I'm still thinking about a week later! I think of Excel as a sort of sketchpad or sandbox for working with data. It lets me focus my energy on understanding how a model works because it gives me a lot of flexibility for doing ad hoc calculations without having to initially worry about some of the challenges we often see in writing code in programming languages.

No alt text provided for this image

However, as I note in this diagram, Excel is also difficult to share because it isn't designed that way. There are also relatively low size limits for the number of rows in and Excel formulas are often easy to override without realizing it.

Power BI, on the other hand, takes a bit of time to build out with modeling and DAX calculations for example, but it's designed to be shared with others. Azure can store a lot more data than Power BI can, but it's also a cloud database to be built out with, so it has different role in the data analytics space than a tool like Power BI, which is more user facing.

Excel Solver: A Hidden Gem

The built-in Solver add-in is a hidden gem within Excel! As I mentioned earlier, I love that Excel can become a sketchpad or sandbox for playing with data models to learn more about them. I took linear optimization back when I was in college. The course concepts were quite fascinating. For example, airlines (I think that United was the specific example given) run these types of optimization models with millions of variables in them to determine how to schedule aircraft, cabin crews, flights crews, and gates.

While the topics were really interesting, I didn't really care much for the approaches the course took for solving these types of models until I learned how to solve them in Excel! While Excel's Solver supports up to 200 variables instead of millions, it's a great way to learn how to set up linear and nonlinear models on a smaller scale.

Even for a one-dimensional model in Solver, I think about the same three components as I set it up:

  1. Input variables that change values to find a model solution (yellow)
  2. Objective to optimize (blue)
  3. Constraints that limit potential model solutions (orange)

No alt text provided for this image

A great example of using Solver for a one-dimensional model is if we want to determine the internal rate of return (IRR) for a project. We can also use Solver to determine parameters for algorithms like those we see in statistics and machine learning.

Here's an example of how it determines the coefficients for a logistic regression model. This particular video focuses on running the Solver, but the videos before that in the course show how to set up the formulas in Excel before we run it. From there we can use these coefficients to make predictions and create data visualizations for these models. I also show a more concise way to run the model in R, but Excel really lets us see how the puzzle pieces for logistic regression models fit together.

We can also use Excel Solver to fit an orthogonal line as the first part of the PCA (Principal Components Analysis) model. This video in my most recently published data reduction course in the LinkedIn Learning library, like the logistic regression example above, only shows how to run Solver. The videos before that though show how to set up the formulas and calculations that go into solving the model. In the videos that follow for the rest of this chapter in the course it goes through how to rotate and scale the PCA space we project our existing data points onto in the model by calculating the eigenvectors and eigenvalues in Excel and R.

I can't believe it's the last week of March already! That means there's still over 75% of the year left to learn and explore new ideas, along with keeping up with everything else. I was talking to someone today about learning the 14 new functions that just came out in Excel. I'm excited to try them out soon myself!

-HW

Jonathan Rochelle

Product Leader & Builder, Entrepreneur, Startup Advisor, Investor, Creator, Learner. Intensely interested in Music technology and production

2 年

Love this article! “Excel is also difficult to share because it isn't designed that way. “ - this was the problem we focused on with Google Sheets, but even earlier with the predecessor product, “XL2Web”, which allowed excel models to be published as interactive web apps (circa 2002-2003)

Jonathan Rochelle

Product Leader & Builder, Entrepreneur, Startup Advisor, Investor, Creator, Learner. Intensely interested in Music technology and production

2 年

Thank you for the shout out Helen Wall ! I also missed this post first time around! Great article!

Shyvee Shi

Product @ Microsoft | A forward-thinking product leader combining creativity, user psychology, and AI to drive growth and scale communities | ex-LinkedIn

2 年

Can’t believe I missed this article earlier! Great article! Always prioritize choosing the right tool for you vs what others said you should choose. Thanks again for the write-up and glad that we inspired your thoughts on this!

Paul Collis

Heads up a team of Data Ninjas!

2 年

Great post!

要查看或添加评论,请登录

社区洞察

其他会员也浏览了