
Visualizing Nullity: How to Uncover Gaps in Your Marketing Data With Python

Björn Thomsen

Marketing Lead at meshcloud.io | Accelerating B2B Market Growth | Professional in Performance Marketing & Web Analytics

Published: February 18, 2024

Nullity is a significant issue in the analysis of marketing data, particularly when performance metrics from several marketing channels and ad networks are merged. Merging often leaves gaps or missing values, which can distort the attribution of a channel's or ad network's contribution, or even skew the entire statistical analysis.

Statistical theory offers several ways of dealing with these gaps during data wrangling. Common approaches include dropping rows with missing metrics, explicitly marking them as "null," or imputing them with a measure such as the arithmetic mean, the median, or a logarithmic mean, depending on the context. Imputation is a field of study in its own right, which we won't explore further in this article.

Our focus in this article is solely on visualizing nullity to easily identify the extent of missing data and any resulting interdependencies. For this purpose, we utilize the Python library "missingno" available at: https://github.com/ResidentMario/missingno . This tool allows marketers, even those without in-depth knowledge of data analysis, to gain a visual understanding of the nullity within their data.

By the way: Not only 0, null, "", and NaN pose problems in data interpretation. Outliers such as min and max values often distort the overall picture as well. A single massively underperforming ad with an extremely high CPA can skew the entire performance review. Therefore, always be on the lookout for outliers and gaps.
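As a minimal sketch of the point above: missingno builds on pandas' notion of missing values (NaN/None), so placeholders such as empty strings, or zeros that really mean "not reported," have to be converted first. The column names and values below are hypothetical, treating a CPA of 0 as missing is an assumption about this toy dataset, and the 1.5 × IQR rule is just one common outlier heuristic.

```python
import numpy as np
import pandas as pd

# Hypothetical ad-level data: "" and 0 stand in for values that were never reported.
ads = pd.DataFrame({
    "ad": ["a", "b", "c", "d", "e", "f"],
    "region": ["EU", "", "US", "EU", "", "APAC"],
    "cpa": [12.0, 15.0, 0.0, 14.0, 13.0, 480.0],
})

# Turn placeholders into real NaN so that isnull() (and missingno) can see them.
# Assumption: a CPA of 0 means "not reported" in this dataset.
ads["region"] = ads["region"].replace("", np.nan)
ads["cpa"] = ads["cpa"].replace(0.0, np.nan)

# Flag outliers with the 1.5 * IQR rule (NaN rows are never flagged).
q1, q3 = ads["cpa"].quantile([0.25, 0.75])
iqr = q3 - q1
ads["cpa_outlier"] = (ads["cpa"] < q1 - 1.5 * iqr) | (ads["cpa"] > q3 + 1.5 * iqr)

missing_per_column = ads.isnull().sum()
print(missing_per_column)
print(ads[ads["cpa_outlier"]])
```

In this toy data, the single ad with a CPA of 480 is the only row flagged as an outlier.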

Installing missingno

The first step is to locally open a new Jupyter Notebook. You can find how to install and run a Jupyter Notebook here: https://jupyter.org/install

We also need the missingno library for Python. We can install it by running the following command in our console:

pip install missingno

Now we open a new Jupyter Notebook and import our marketing performance data, for which I will use dummy values in CSV format for Google Ads, TikTok Ads, and LinkedIn Ads, as a DataFrame. You can replace the file path with your own CSV file. By the way, missingno makes use of Matplotlib, as we can see in the code.

import pandas as pd
import missingno as msno

# Replace the path with your own CSV file
df = pd.read_csv(r"C:\Users\BjörnThomsen\Desktop\your-csv-file.csv", low_memory=False)

# Render the Matplotlib figures that missingno draws inline in the notebook
%matplotlib inline
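If you have no CSV export at hand, a small synthetic DataFrame is enough to follow along. The sketch below is only a stand-in: the channel names mirror the article, but every column name, number, and gap pattern is made up for illustration. It also prints the share of missing values per column, a plain number to set beside the plots.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV export; all names and numbers are made up.
rng = np.random.default_rng(seed=42)
n = 60
df = pd.DataFrame({
    "channel": rng.choice(["Google Ads", "TikTok Ads", "LinkedIn Ads"], size=n),
    "impressions": rng.integers(1_000, 50_000, size=n).astype(float),
    "clicks": rng.integers(10, 900, size=n).astype(float),
    "video_views": rng.integers(100, 5_000, size=n).astype(float),
})

# Punch holes into the data: only TikTok reports video views in this toy setup,
# and roughly every tenth row loses its click count.
df.loc[df["channel"] != "TikTok Ads", "video_views"] = np.nan
df.loc[rng.random(n) < 0.1, "clicks"] = np.nan

# Share of missing values per column, sorted from most to least affected.
missing_share = df.isnull().mean().sort_values(ascending=False)
print(missing_share)
```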

The Nullity Matrix

Using the matrix() method, we produce our first visualization, which shows the extent of nullity within the dataset. White gaps mark missing values. The sparkline on the right summarizes the overall shape of data completeness and points out the rows with the maximum and minimum nullity in the dataset.

msno.matrix(df.sample(35))
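To put numbers to what the sparkline shows, the per-row completeness can be computed directly with pandas. This is a minimal sketch on hypothetical data; that the sparkline traces the count of filled cells per row is my reading of the library, not something the plot itself states. The last lines also guard against a practical pitfall: df.sample(35) raises a ValueError when the DataFrame has fewer than 35 rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps.
df = pd.DataFrame({
    "impressions": [1000, 2500, np.nan, 4000, 1200],
    "clicks": [50, np.nan, np.nan, 210, 60],
    "region": ["DE", None, None, "US", "DE"],
})

# Number of filled cells per row.
completeness = df.notnull().sum(axis=1)
print("most complete row:", completeness.idxmax(), "with", completeness.max(), "values")
print("least complete row:", completeness.idxmin(), "with", completeness.min(), "values")

# Cap the sample size at the number of rows available.
sample = df.sample(n=min(35, len(df)), random_state=0)
```

The resulting sample can then be passed to msno.matrix() in place of df.sample(35).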


The Nullity Dendrogram

Next, we generate a dendrogram with the dendrogram() method to illustrate the hierarchical clustering of nullity correlations between columns. While this may sound complex, it simply means that metrics whose nullity patterns are strongly correlated are grouped into pairs or clusters and drawn as a hierarchical tree.

Because performance data is missing for our LinkedIn remarketing campaigns in Asia, and region codes are missing for our Google Ads campaigns, comparability is obviously compromised.

msno.dendrogram(df)

The dendrogram method in missingno creates a correlation tree for nullity.
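The idea behind the tree can be sketched with plain NumPy: encode each column as a 0/1 vector (1 = missing) and measure how far apart those vectors are. Columns with identical gap patterns end up at distance 0, which is what a merge at height zero in the dendrogram expresses. This is a simplified illustration on hypothetical data that stops at the pairwise distances; it is not the library's implementation, which to my understanding hands a comparable nullity matrix to SciPy's hierarchical clustering.

```python
import numpy as np
import pandas as pd

# Hypothetical data: clicks and cost are always missing together.
df = pd.DataFrame({
    "impressions": [1000, 2500, 1800, 4000, 1200, 900],
    "clicks": [50, np.nan, 75, 210, np.nan, 30],
    "cost": [20.0, np.nan, 31.0, 99.0, np.nan, 12.0],
    "region": ["DE", "US", None, None, "DE", None],
})

# 1 = missing, 0 = present; one row per column of the original frame.
mask = df.isnull().astype(int).T

# Euclidean distance between the nullity patterns of every pair of columns.
values = mask.to_numpy()
diff = values[:, None, :] - values[None, :, :]
dist = pd.DataFrame(
    np.sqrt((diff ** 2).sum(axis=2)), index=mask.index, columns=mask.index
)
print(dist.round(2))
```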

The Nullity Correlation Heatmap

Lastly, I would like to create a nullity correlation heatmap using the heatmap() method. This is not a conventional heatmap that simply calculates the Pearson product-moment correlation coefficient of the values contained within. Instead, it focuses solely on the correlation of presence or absence of nullity.

This operates as follows: Nullity correlation ranges from -1 (indicating that if one variable appears, the other definitely does not) to 0 (suggesting that the appearance or absence of variables has no effect on each other) to 1 (indicating that if one variable appears, the other definitely does as well).

msno.heatmap(df)

Not the typical correlation heatmap you know: This one focuses solely on the correlation of nullity.
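The numbers behind such a heatmap can be reproduced with pandas alone: convert the data into missing/present flags and correlate those flags. The sketch below uses hypothetical columns. Dropping columns whose nullity never varies is my own choice here, since a correlation is undefined for them.

```python
import numpy as np
import pandas as pd

# Hypothetical data: clicks and cost go missing together,
# video_views is present exactly when they are absent.
df = pd.DataFrame({
    "impressions": [1000, 2500, 1800, 4000, 1200, 900],
    "clicks": [50, np.nan, 75, 210, np.nan, 30],
    "cost": [20.0, np.nan, 31.0, 99.0, np.nan, 12.0],
    "video_views": [np.nan, 300, np.nan, np.nan, 150, np.nan],
})

# 1 = missing, 0 = present.
null_mask = df.isnull().astype(int)

# Columns that are always filled (or always empty) have no variance in their
# nullity, so drop them before correlating.
varying = null_mask.loc[:, null_mask.nunique() > 1]

# Pearson correlation of the missing flags.
nullity_corr = varying.corr()
print(nullity_corr.round(2))
```

Here clicks and cost correlate at 1, while clicks and video_views correlate at -1; impressions is left out because it has no gaps.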

Conclusion

From a personal perspective, entries with missing data should be regarded with significant skepticism in the overall analysis. Addressing nullity through statistical imputation techniques is essential for maintaining data integrity. For a first visual assessment of the extent of nullity, especially with limited data analysis resources at hand, missingno in Python is my preferred choice.

Merged marketing data in particular, such as performance values from various marketing channels and ad networks, tend to have gaps. This is due to the nature of the matter: Video performance produces different metrics than email marketing or SEO.
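A minimal sketch of how such gaps arise: stacking per-channel exports with pandas aligns columns by name and fills every metric a channel does not report with NaN. The channels, metric names, and numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical per-channel exports; each network reports its own set of metrics.
google = pd.DataFrame({"channel": "Google Ads", "clicks": [120, 95], "cost": [80.0, 61.5]})
tiktok = pd.DataFrame({"channel": "TikTok Ads", "clicks": [300], "video_views": [4200]})
email = pd.DataFrame({"channel": "Email", "clicks": [45], "open_rate": [0.31]})

# Stacking the exports aligns columns by name; metrics a channel does not
# report are filled with NaN.
merged = pd.concat([google, tiktok, email], ignore_index=True)
print(merged.isnull().sum())
```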

By the way, rarely have I found an article to come together so effortlessly. With missingno, visualizing nullity is easy as pie.
