Visualizing Nullity: How to Uncover Gaps in Your Marketing Data With Python

Björn Thomsen

Marketing Lead at meshcloud.io | Driving B2B Market Growth for Platform Engineering Company | Performance Marketing, Marketing Automation, Data Analytics, Marketing Strategy

Published: February 18, 2024

Nullity, the presence of missing values, is a significant issue in the analysis of marketing data, particularly when integrating performance metrics from various marketing channels and ad networks. Combining these sources often leaves gaps, which can distort the attribution of a channel's or ad network's contribution, or even the entire statistical analysis.

Data wrangling offers several statistically grounded methods for dealing with these gaps. Common approaches include dropping rows with missing metrics, explicitly marking them as null, or imputing them with a measure such as the arithmetic mean, the median, or a logarithmic mean, depending on the context. This topic is a field of study in its own right, which we won't explore further in this article.
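For orientation only, here is a minimal sketch of two of the approaches mentioned above, using pandas. The DataFrame, its column names, and its values are made-up dummy data, and the median is just one possible imputation measure.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data: one CPA value is missing.
df_demo = pd.DataFrame({
    "network": ["Google Ads", "TikTok Ads", "LinkedIn Ads", "Google Ads"],
    "cpa": [12.0, np.nan, 30.0, 18.0],
})

# Option 1: drop every row that lacks a CPA value.
dropped = df_demo.dropna(subset=["cpa"])

# Option 2: impute the gap with the median of the observed CPA values.
imputed = df_demo.assign(cpa=df_demo["cpa"].fillna(df_demo["cpa"].median()))

print(dropped)
print(imputed)
```

Which option is appropriate depends on why the value is missing, which is exactly the question the visualizations in this article help to answer.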

Our focus in this article is solely on visualizing nullity to easily identify the extent of missing data and any resulting interdependencies. For this purpose, we utilize the Python library "missingno" available at: https://github.com/ResidentMario/missingno. This tool allows marketers, even those without in-depth knowledge of data analysis, to gain a visual understanding of the nullity within their data.

By the way: Not only 0, null, "", and NaN pose problems in data interpretation. Outliers such as min and max values often distort the overall picture as well. A single massively underperforming ad with an extremely high CPA can skew the entire performance review. Therefore, always be on the lookout for outliers and gaps.
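One practical consequence: the nullity checks in pandas only recognize NaN and None as missing, so placeholders such as "" or the text "null" may need to be converted first. The following is a minimal sketch with made-up column names and values. Treating 0 clicks as "not reported" is an assumption that will not hold for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw export with different kinds of placeholders.
raw = pd.DataFrame({
    "campaign": ["A", "B", "C", "D"],
    "region": ["EU", "", "null", "APAC"],
    "clicks": [120, 0, 45, 80],
})

# Treat empty strings and the literal text "null" as missing values.
cleaned = raw.replace(["", "null"], np.nan)

# Assumption: a click count of 0 means "not reported" in this export.
cleaned["clicks"] = cleaned["clicks"].replace(0, np.nan)

# Count the missing values per column after normalization.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)
```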

Installing missingno

The first step is to set up Jupyter Notebook locally. Installation instructions are available here: https://jupyter.org/install

We also need the missingno library for Python. We can install it with the following command in the console:

pip install missingno

Now we open a new Jupyter Notebook and import our marketing performance data as a DataFrame. I will use dummy values in CSV format for Google Ads, TikTok Ads, and LinkedIn Ads. You can replace the file path with the path to your own CSV file. By the way, missingno makes use of Matplotlib, as we can see in the code.

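import pandas as pd
import missingno as msno

# Render Matplotlib figures inline (Jupyter magic, only works inside a notebook)
%matplotlib inline

# Replace the path with the path to your own CSV file
df = pd.read_csv(r"C:\Users\BjörnThomsen\Desktop\your-csv-file.csv", delimiter=',', skiprows=0, low_memory=False)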
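If you have no CSV file at hand, a small in-memory stand-in is enough to follow along. The sketch below is purely illustrative: the name df_dummy, the column names, and all values are invented and are not the dummy data used for the figures in this article.

```python
import io

import pandas as pd

# Hypothetical dummy export; empty fields are read as NaN by pandas.
csv_text = """network,campaign,region,impressions,clicks,video_views
Google Ads,Search Brand,,1000,50,
Google Ads,Search Generic,,2500,90,
TikTok Ads,Video Launch,EU,8000,120,3000
LinkedIn Ads,Remarketing,APAC,,,
LinkedIn Ads,Lead Gen,EU,1500,30,
"""

df_dummy = pd.read_csv(io.StringIO(csv_text))

# Share of missing values per column, as a quick numeric check.
share_missing = df_dummy.isna().mean()
print(share_missing)
```

The numeric summary is a useful cross-check for the plots that follow: a column with a high share of missing values should show up as mostly white.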

The Nullity Matrix

Using the matrix() method, we produce our first visualization, which shows the extent of nullity within the dataset. White gaps mark missing values. The sparkline on the right summarizes the overall shape of data completeness and points out the rows with the maximum and minimum nullity in the dataset.

msno.matrix(df.sample(35))

White gaps mark missing values in the missingno nullity matrix.
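To get a feel for what the sparkline summarizes, we can count the populated fields per row ourselves. This is a rough sketch with invented data, not missingno's own code. It also shows a guard for small files, since df.sample(35) raises an error when the DataFrame has fewer than 35 rows.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data with gaps.
demo = pd.DataFrame({
    "impressions": [1000, 2500, np.nan, 1500],
    "clicks": [50, np.nan, np.nan, 30],
    "region": ["EU", None, None, "APAC"],
})

# Number of populated fields per row.
completeness = demo.notna().sum(axis=1)

# The most and the least complete rows define the extremes of the profile.
most_complete = int(completeness.max())
least_complete = int(completeness.min())

# Sample at most 35 rows, but never more than the DataFrame contains.
sample = demo.sample(min(35, len(demo)), random_state=0)

print(completeness.tolist(), most_complete, least_complete)
```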

The Nullity Dendrogram

Next, we generate a dendrogram with the dendrogram() method to illustrate the hierarchical clustering of columns by their nullity correlation. While this may sound complex, it simply means that metrics whose nullity is strongly correlated are grouped into pairs or clusters and displayed as a hierarchical tree.

Because performance data is missing for our LinkedIn remarketing campaigns in Asia, and region codes are missing for our Google Ads campaigns, comparability is clearly compromised.

msno.dendrogram(df)

The dendrogram method in missingno creates a correlation tree for nullity.
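The intuition behind the tree can be sketched without any clustering library: columns whose missing-value patterns match end up close together. The example below counts, for each pair of columns, the rows in which exactly one of the two is missing. This simple disagreement count is my own illustration with invented data. It is not how missingno computes its linkage.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data with gaps.
demo = pd.DataFrame({
    "region": ["EU", None, None, "APAC", "EU"],
    "impressions": [1000, 2500, np.nan, 1500, 900],
    "clicks": [50, 90, np.nan, 30, 40],
    "video_views": [np.nan, np.nan, np.nan, 3000, np.nan],
})

# One 0/1 nullity vector per column (1 = value is missing).
mask = demo.isna().astype(int)

# Rows in which two columns disagree about being missing.
# 0 means the two columns are always missing (or present) together.
cols = mask.columns
disagreement = pd.DataFrame(
    [[int((mask[a] != mask[b]).sum()) for b in cols] for a in cols],
    index=cols,
    columns=cols,
)
print(disagreement)
```

In this dummy data, impressions and clicks have a disagreement of 0, so they would be expected to join first in a tree built on nullity.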

The Nullity Correlation Heatmap

Lastly, I would like to create a nullity correlation heatmap using the heatmap() method. This is not a conventional heatmap that simply calculates the Pearson product-moment correlation coefficient of the values contained within. Instead, it focuses solely on the correlation of presence or absence of nullity.

It works as follows: nullity correlation ranges from -1 (if one variable appears, the other definitely does not) through 0 (the appearance or absence of one variable has no effect on the other) to 1 (if one variable appears, the other definitely does as well).

msno.heatmap(df)

Not the typical correlation heatmap you know: This one focuses solely on the correlation of nullity.
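The same idea can be reproduced numerically with pandas alone: correlate the missing-value indicators instead of the values themselves. This is a minimal sketch with invented data. Dropping columns without any variation in nullity is my own choice here, since their correlation is undefined.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data with gaps.
demo = pd.DataFrame({
    "impressions": [1000, np.nan, 2500, np.nan],
    "clicks": [50, np.nan, 90, np.nan],
    "video_views": [np.nan, 3000, np.nan, 1200],
    "campaign": ["A", "B", "C", "D"],
})

# 0/1 indicator per cell (1 = value is missing).
mask = demo.isna().astype(int)

# Columns that are always present (or always missing) have no variation.
mask = mask.loc[:, mask.nunique() > 1]

# Pearson correlation of the indicators, not of the metric values.
nullity_corr = mask.corr()
print(nullity_corr)
```

Here impressions and clicks are always missing together (correlation 1), while video_views is present exactly when they are missing (correlation -1).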

Conclusion

In my view, entries with missing data should be treated with considerable skepticism in the overall analysis. Addressing nullity through statistical imputation techniques is essential for maintaining data integrity. For an initial visual assessment of the extent of nullity, especially with limited data analysis resources at hand, missingno in Python is my preferred choice.

Merged marketing data in particular, such as performance values from various marketing channels and ad networks, tends to have gaps. This is in the nature of things: video performance produces different metrics than email marketing or SEO.

By the way, rarely have I found an article to come together so effortlessly. With missingno, visualizing nullity is easy as pie.
