Cleaning data in Power BI
Andreia da Concei??o Matias
Data Analyst | BI Consultant | Dashboard Developer | SQL | Python | Power BI | Assisting companies in performing through data analysis
In this article, I explain how I cleaned data in order to create a simple dashboard in Power BI.
The datasets are available at https://www.kaggle.com/datasets/maartenvandevelde/marine-litter-watch-19502021:
Even though the page in Kaggle states that it contains data from litter on beaches across Europe, there are data from other places as well.
First, I created a new column to change the event date from text into date:
Here is the final column:
Analyzing the other columns from the table, I realized that there were some inconsistencies:
These rows were kept because it was possible to identify the country and the regional sea. The question marks were replaced by “Unknown”.
There were some question marks in parts of the location names when it had an accent. Those locations were researched, and the values were corrected.
The BeachCountryCode column had an inconsistent code “Int”.
As it was possible to identify the countries in the BeachName column, a new column was created to replace or fill in the country code:
After cleaning data, I duplicated the table to separate the event data from the garbage collected and excluded the unnecessary columns of each new table.?
In the Event table, I removed the duplicates.
I also included a table to show the name of the countries instead of its code. It is available at https://datahub.io/core/country-list#resource-country-list_zip
In the garbage collected table, as each item is a column, I turned those columns into rows.
As I finished cleaning data, I created the visuals and here is the final version of the dashboard: