Data Analytics Literacy, a Lecture for Brazilian workers at a Truck manufacturer company
Photo credit: https://unsplash.com/@adeolueletu

I was invited to talk about Data Analytics Literacy to a group of Logistics workers at Scania Brazil, part of the major Swedish manufacturer of heavy commercial vehicles. This lecture series aims to explain new technologies and motivate workers to adopt them. Analytics is not new to this audience: they are already literate in creating reports and dashboards with Power BI, so the lecture was an excellent opportunity to show them the next steps.

Data Literacy is a comprehensive concept that includes reading, understanding, building, and communicating data as information. It is definitely not only Analytics, because it involves broader abilities and competencies for working with data. Still, Analytics can improve decision-making and be part of a literacy program: it is used to answer business questions progressively and is key for anyone pursuing data literacy.

There are four stages of business questions that can be enabled by Analytics: descriptive, diagnostic, predictive, and prescriptive.

The first two are entirely based on past data to explain what happened and why it happened. The last two are focused on the future. They need historical (past) data to identify a pattern, predict what probably will happen, and prescribe how to act to avoid undesirable situations or recommend actions to reinforce desirable results.

With that in mind, I built a simple Power BI report and a Python notebook to demonstrate how easy it is to put this concept into practice. My goal was to prove that it is not rocket science and does not require fancy Artificial Intelligence knowledge. They (and you) can start this literacy right now with the data you are currently using in any report or dashboard.

My first step was to find a dataset that would make sense for Logistics workers, and I found one on Kaggle. With minimal effort, I built a Power BI page to explain what happened, the first stage of Analytics literacy.

[Image: Power BI page showing delivery status, the descriptive stage]

Based on the analysis above, I found that the number of late deliveries is quite high and would require further investigation. To understand why it happened, I built a new visualization.

[Image: Power BI diagnostic view of late deliveries by Destination, Origin, Supplier, and Customer]

Based on this analysis, we can get some insights regarding the Destinations, Origins, Suppliers, and Customers with the highest number of late deliveries. I could decide to avoid some destinations, work with customers to understand whether the problem is on their side, or even change the origins to see if the problem lies in my Distribution Center locations.

This is what can be done when you are looking at past data. The idea is that we can go a step further and use a statistical algorithm to help us predict when a delivery will be late, or what will happen. At this moment, it is crucial to have a business question in mind. Mine is: which trips have a higher probability of experiencing delays, and how can I prevent them?

After I imported the spreadsheet and made some fundamental transformations, I had a dataset ready for building my predictive model.
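A minimal sketch of that preparation step in pandas, assuming hypothetical column names (the real dataset comes from Kaggle, so Origin, ScheduledDays, and ActualDays here are illustrative stand-ins):

```python
import pandas as pd

# Small hypothetical sample standing in for the Kaggle logistics dataset
df = pd.DataFrame({
    "Origin": ["SP", "RJ", "MG", "SP"],
    "Destination": ["RJ", "MG", "SP", "MG"],
    "ScheduledDays": [3, 2, 4, 3],
    "ActualDays": [4, 2, 6, 3],
})

# Fundamental transformation: derive the binary target the models will predict
df["Late"] = (df["ActualDays"] > df["ScheduledDays"]).astype(int)
```

The key point is that the target column (late or not) is derived from data you already have in your reports.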

[Image: the prepared dataset in the Python notebook]


Then come some fundamental statistical concepts: keep part of the dataset to train the model and another part to test its accuracy, and understand that, depending on your business problem, you will try a group of candidate algorithms. With that, I built five predictive models to see which one could tell me whether a specific trip would be late. For this particular goal, I used Classification algorithms, because the answer is binary: yes (1) or no (0).
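That workflow can be sketched with scikit-learn. The dataset here is synthetic (`make_classification` stands in for the logistics data), and XGBoost is omitted from the sketch because it is a separate library, but it follows the same fit/predict pattern:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the prepared logistics dataset (features X, late flag y)
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Hold out part of the data to measure accuracy on trips the model never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
```

Each candidate is trained and scored the same way, which is what makes comparing several algorithms so cheap.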

[Image: notebook cells training and testing the five classification models]


I used Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and XGBoost. As you can see above, it is easy to import the algorithm and fit the model with the training data (Xtrain and ytrain). Then I tested the model by predicting the test data (Xtest) and comparing it to the actual results (ytest). In the example above, XGBoost made correct predictions in 88% of the cases. Because I will use this model to predict new results, I re-trained it using all available data (train and test), performed the prediction on all data (X), and compared it to the actual results (y). Naturally, since the model has already seen this data, the result is better: now I got almost 93% of correct predictions. I added one column for each algorithm and saved them into a new spreadsheet to analyze and compare the algorithms' performance.
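A sketch of that final step, again on synthetic data: re-train on everything, then store one prediction column per algorithm for side-by-side comparison (only the Decision Tree is shown; the others would add columns the same way):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Re-train on all available data (train and test) before scoring new trips
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# One prediction column per algorithm makes the comparison easy to eyeball
results = pd.DataFrame({"actual": y, "DecisionTree": model.predict(X)})
accuracy = (results["actual"] == results["DecisionTree"]).mean()

# results.to_csv("predictions.csv", index=False)  # export for the comparison
```

Note that scoring a model on data it was trained on overstates its accuracy, which is exactly why the held-out test score came first.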

[Image: accuracy comparison of the five algorithms]

As you can see, all algorithms performed very well. From now on, I can select one of them and start predicting what will happen.

The last stage of Analytics will use the late trip predictions to recommend alternatives to avoid them. Recommender systems can be very complex, and my goal is not to explain how they work extensively, but they are easy to understand when compared to the recommendations we receive all the time and probably do not even notice. If you are a Netflix subscriber, you see recommended films and series. If you use any e-commerce platform, you receive recommendations. That is what we will do now, naturally limiting the scope of the recommendation.

I needed an algorithm to rank the possibility of getting different outcomes when the prediction indicates that the trip would be late. I selected the Nearest Neighbors algorithm. It is simple, straightforward, and does not require much computational power to produce good alternatives. Perfect for my goal of showing you how easy it is.

First, I built some fake data based on random numbers. It is far from ideal for testing, but I do not have any new data to use or compare. Remember that by using random numbers, I can get combinations of Origins, Destinations, Suppliers, and Customers that would never happen in reality. But even in this not-so-good environment, I hope the algorithm will give me good insights.
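Generating that kind of fake data takes a few lines of NumPy; the category values below (state codes, supplier and customer IDs) are illustrative assumptions, not the article's actual data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 20

# Random combinations of the categorical features;
# some origin/destination pairs may never occur in the real operation
fake = pd.DataFrame({
    "Origin": rng.choice(["SP", "RJ", "MG", "PR"], size=n),
    "Destination": rng.choice(["SP", "RJ", "MG", "PR"], size=n),
    "Supplier": rng.integers(1, 6, size=n),
    "Customer": rng.integers(1, 11, size=n),
})
```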

[Image: randomly generated test data with the model predictions]


As you can see above, the Decision Tree prediction flagged some rows as late deliveries, represented by 1 in the Predict column. Now I will prepare a matrix with the available data. I am far from saying this is the best approach, but remember, my goal is only to show you it is not rocket science; basic statistical concepts will do the trick. I built a matrix of Origins and Destinations (I assume the root cause of late deliveries can be explained by those columns) and set the distance as the recommendation measure. Returning to the Netflix example, this measure would be the number of likes each user gives each film (movie and user for Netflix play the role of origin and destination in this model).
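Building such a matrix is one pandas call. A sketch, assuming a hypothetical trip table with a binary Late column (here each cell holds the late rate for an origin/destination lane, a simplification of the article's measure):

```python
import pandas as pd

trips = pd.DataFrame({
    "Origin":      ["SP", "SP", "RJ", "MG", "MG"],
    "Destination": ["RJ", "MG", "MG", "SP", "RJ"],
    "Late":        [1, 0, 1, 0, 1],
})

# Origin x Destination matrix; each cell is the late rate for that lane,
# with 0 for lane combinations that never occurred
matrix = trips.pivot_table(
    index="Origin", columns="Destination", values="Late", aggfunc="mean"
).fillna(0)
```

Each row of this matrix becomes one origin's "profile", which is what the Nearest Neighbors step compares.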

[Image: the Origin x Destination matrix]


After building the matrix, I applied the Nearest Neighbors algorithm and asked the model to return the nearest neighbor for the predictions flagged as late.
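A minimal sketch of that query with scikit-learn's NearestNeighbors, using a tiny hand-made matrix (the values are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Rows = origins, columns = destinations; values = historical late rate per lane
matrix = np.array([
    [0.0, 0.8, 0.1],   # origin A
    [0.1, 0.7, 0.2],   # origin B
    [0.9, 0.1, 0.0],   # origin C
])

nn = NearestNeighbors(n_neighbors=2).fit(matrix)

# For a trip flagged late from origin A, find the origin with the most
# similar lane profile as a candidate replacement
distances, indices = nn.kneighbors(matrix[0].reshape(1, -1))
nearest_other = int(indices[0][1])  # index 0 is origin A itself
```

Here origin B is returned as the closest alternative to origin A, since their late-rate profiles differ the least.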

[Image: Nearest Neighbors recommendations for the trips flagged as late]


As you can see below, it was not perfect, but it changed 50% of the predictions! If this held in practice (and I am not claiming it does, since we are working with a public dataset and random data), who would not change the trip origin to avoid 50% of the late trips?

[Image: predictions before and after applying the recommended origins]


This last analysis shows that it is easy to get a recommendation to avoid undesirable outcomes, and I hope it shows that you can start using Analytics to its full potential. Do you need some help with statistics, Python, or self-service BI? That is precisely the goal of a Data Literacy program. Understanding people's needs, not just the technology, will make us comfortable using our data to unlock possibilities, improve business results, and turn the organization's Data Culture into a data-driven one.

You can find my Python notebook and Power BI here.
