What is hiding in your data today?

This is a brief discussion of the importance of effective Data Analytics to day-to-day business decisions. All information that could potentially identify an organization or its operational details has been removed.

Background

A colleague of mine is a consultant who works in supply chain and provides services to optimize virtually any aspect of it. One premise they hold to be a global constant is that once a warehouse exceeds 85% of its available physical space, productivity begins to fall dramatically. In this scenario, that 85% threshold is referred to as the capacity of the warehouse, so 100% capacity is equal to 85% of the physical space. There are multiple reasons for the drop in productivity, but those are not the subject here. During a broader discussion, however, I offered to attempt to quantify this drop in productivity.

The data that was made available was reasonably comprehensive, although not as detailed as I would have preferred. Very accurate records were available for the time (labor) spent on inbound and outbound activities within the warehouse, for inbound orders and receiving information from suppliers, and for outbound orders and shipping information to customers. Lastly, there was an accurate daily count of inventory on hand and the capacity of the warehouse under review.

As is always the case, a fair amount of time went into data preparation and cleansing. A typical step in that process was computing univariate statistics for each of the 27 or so variables (fields) in the data. This will be important later. Once the data was prepared, several different processes were used to identify statistically relevant fields, data, and patterns. In the end, I chose the field “ship******”, which is the sum of goods (cases or pallets) shipped divided by the total minutes of labor to ship them, as the measure of throughput. I went with multivariate linear regression as the modeling technique and used stepwise selection on a subset of the variables to identify the best-performing model.
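As a minimal sketch of the throughput measure described above, here is the calculation for a single day. The field names and figures are hypothetical, made up for illustration only, and are not taken from the actual dataset.

```python
def throughput(units_shipped, labor_minutes):
    """Goods shipped (cases or pallets) per minute of shipping labor."""
    if labor_minutes <= 0:
        raise ValueError("labor_minutes must be positive")
    return units_shipped / labor_minutes

# Hypothetical daily records; names and values are illustrative assumptions.
daily_records = [
    {"units_shipped": 4800, "labor_minutes": 1000},
    {"units_shipped": 4400, "labor_minutes": 1000},
]

# One throughput value per day, which is the quantity the model tries to explain.
daily_throughput = [
    throughput(r["units_shipped"], r["labor_minutes"]) for r in daily_records
]
```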

The result was a set of graphs and supporting data that showed a significant decline in throughput as the percentage of capacity exceeded 100%, and the corresponding logarithmic increase in costs (the math geek in me would enjoy a discussion on how a linear decrease in throughput causes a logarithmic increase in cost, but again, not the point of this discussion). The data, graphs, and supporting write-up were delivered, and everyone was satisfied. But during the write-up, something jumped out at me that I found fascinating. 

The Value of Analytics

What jumped out as I was reviewing the univariate statistics for the dataset was the throughput measure mentioned above. The average of ship****** for the full year of data was 4.605. What struck me as odd was that the average on days when the warehouse was operating under 100% of capacity was 4.540 (lower throughput), while the average on days when it was operating over 100% of capacity was 4.813 cube per minute (higher throughput).

This data would be relatively easy to query in the database, and the comparison would seem obvious. A rational person would see that throughput was higher when the warehouse was over 100% capacity and conclude that being over capacity is the optimum arrangement. That conclusion would then drive decisions about how much inventory to order and stock, how many days' supply to keep on hand, which manufacturer deals are worth taking advantage of, and a host of ripple effects. But remember, the regression shows a sharp decline in throughput over 100% capacity! How can these two pieces of information both be accurate and coexist?

What causes this oddity is the way averages work. Because the warehouse in question is actually very efficient, the majority of its days fall right around the 100% capacity benchmark, and it avoids going too far over. So the number of days at 99%, 100%, and 101% is roughly equal. As utilization moves away from that benchmark in either direction, throughput falls. But there are three times as many days below 99% as there are over 101%, and those days range much farther from the benchmark. This means the average for days under 100% is pulled farther down than the average for days over 100%, leading to exactly the observed outcome.

If instead of two groups (over and under 100%) we took the average of three groups (<90%, 90% to 101%, and >101%, for instance), the anomaly would disappear, and we would see lower throughput on the ends and peak throughput in the middle. And THAT is why businesses and the C-Suite need effective Data Analytics. It compares the data at every point across the spectrum, stripping away (or at least identifying) the interaction of other variables, and shows the real meaning in the data. And it is rarely as simple as dividing into three groups instead of two...
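The effect is easy to reproduce with made-up numbers. The sketch below is an illustration only: the throughput curve, its slopes, and the spread of days are all assumptions chosen to mimic the pattern described here, not the actual warehouse data. It assumes throughput peaks at 100% of capacity and falls faster above that point than below it, with one synthetic day at each whole-percent utilization from 70% to 110%.

```python
from statistics import mean

def throughput(utilization):
    """Hypothetical curve: peak at 100% of capacity, steeper drop above it."""
    if utilization <= 100:
        return 5.0 - 0.05 * (100 - utilization)
    return 5.0 - 0.08 * (utilization - 100)

# One synthetic day per whole-percent utilization level, 70% through 110%.
# This gives roughly three times as many days below 99% as above 101%.
days = [(u, throughput(u)) for u in range(70, 111)]

def group_means(days, label_for):
    """Average throughput per group, where label_for maps utilization to a label."""
    groups = {}
    for utilization, value in days:
        groups.setdefault(label_for(utilization), []).append(value)
    return {label: round(mean(values), 3) for label, values in groups.items()}

def three_way(utilization):
    """Three bands matching the example split in the text."""
    if utilization < 90:
        return "<90%"
    if utilization <= 101:
        return "90-101%"
    return ">101%"

# Two groups: the naive over/under comparison.
two_groups = group_means(days, lambda u: "under" if u <= 100 else "over")

# Three groups: low, near capacity, and over capacity.
three_groups = group_means(days, three_way)

print(two_groups)
print(three_groups)
```

With these assumed numbers, the two-group split reports a higher average for the over-capacity days even though the curve falls faster above 100%, while the three-group split puts the peak in the middle band.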
