Data Wrangling: Generating a U.S. Presidential Election Knowledge Graph from a Google Spreadsheet

Kingsley Uyi Idehen

Founder & CEO at OpenLink Software | Driving GenAI-Based AI Agents | Harmonizing Disparate Data Spaces (Databases, Knowledge Bases/Graphs, and File System Documents)

Published: November 3, 2020

Curious about the potential outcome of the 2020 U.S. Presidential Election, I’ve constructed a spreadsheet comprising data about eligible voters in that election, a breakdown of early votes cast by state, and the final tallies for the 2016 election.

I am interested in states that show very high early-vote counts as a percentage of the total vote count in the prior 2016 Presidential Election.
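The metric described above is simple arithmetic, and a minimal sketch may make it concrete. The function name, the dictionary keys, and the numbers below are all hypothetical placeholders; they are not the spreadsheet's column names or real election data.

```python
def early_vote_share(early_votes_2020: int, total_votes_2016: int) -> float:
    """Return 2020 early votes as a percentage of the 2016 total vote count."""
    if total_votes_2016 <= 0:
        raise ValueError("total_votes_2016 must be positive")
    return 100.0 * early_votes_2020 / total_votes_2016


# Made-up rows for illustration only (not real election figures).
sample_rows = [
    {"state": "State B", "early_2020": 250_000, "total_2016": 1_000_000},
    {"state": "State A", "early_2020": 900_000, "total_2016": 1_000_000},
]

# Rank states so the highest early-vote share comes first.
ranked = sorted(
    sample_rows,
    key=lambda row: early_vote_share(row["early_2020"], row["total_2016"]),
    reverse=True,
)

for row in ranked:
    share = early_vote_share(row["early_2020"], row["total_2016"])
    print(f"{row['state']}: {share:.1f}%")
# prints:
# State A: 90.0%
# State B: 25.0%
```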

Data Wrangling Goal?

I want to generate an explorable Knowledge Graph from my Google Spreadsheet, which itself merges data from various sources, including:

DBpedia
2016 Election Dataset published to GitHub by Kris Shaffer
2020 Early Voting Data from the U.S. Elections Project

My tool of choice is the Virtuoso instance behind the public URIBurner Service, which includes an enabled Sponger Transformation Middleware Module.

Steps?

[1] Create Google Spreadsheet. Here’s my example.

[2] Pass the Spreadsheet URL to URIBurner using options for returning CSV rather than HTML — https://linkeddata.uriburner.com/about/html/https/docs.google.com/spreadsheets/d/1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE/gviz/tq?tqx=out:csv&range=A2:O53&sheet=2020_Election_Analysis
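The URL in step 2 can be assembled programmatically. The sketch below only reproduces the pattern visible in that URL: the spreadsheet's CSV export address is appended to the URIBurner `/about/html/` path, with `://` after the scheme replaced by `/`. The function names are hypothetical, and the sketch assumes the range and sheet name need no percent-encoding (true for the values used here, not necessarily for sheet names containing spaces).

```python
from urllib.parse import urlsplit

# Base path taken from the URL shown in step 2.
URIBURNER_HTML_BASE = "https://linkeddata.uriburner.com/about/html/"


def gviz_csv_url(spreadsheet_id: str, cell_range: str, sheet: str) -> str:
    """Build the spreadsheet URL that asks for CSV output for one range of one sheet."""
    return (
        f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/gviz/tq"
        f"?tqx=out:csv&range={cell_range}&sheet={sheet}"
    )


def uriburner_description_url(source_url: str) -> str:
    """Rewrite 'scheme://rest' as '<base>scheme/rest', as in the step 2 URL."""
    scheme = urlsplit(source_url).scheme
    rest = source_url[len(scheme) + len("://"):]
    return f"{URIBURNER_HTML_BASE}{scheme}/{rest}"


csv_url = gviz_csv_url(
    "1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE",
    "A2:O53",
    "2020_Election_Analysis",
)
print(uriburner_description_url(csv_url))
```

Opening the printed URL in a browser is what triggers the transformation; the code itself makes no network request.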

That returns an HTML-based Entity Description Page, which is also an entry point into a newly generated Knowledge Graph. This page includes a listing of Entities derived from Rows in the original Spreadsheet.

For instance, clicking on the hyperlink “Record2” results in a lookup that returns data for the state of Texas.

[3] Faceted Browsing View — https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Flinkeddata.uriburner.com%2Fabout%2Fid%2Fentity%2Fhttps%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE%2Fgviz%2Ftq%3Ftqx%3Dout%3Acsv%26range%3DA2%3AO53%26sheet%3D2020_Election_Analysis&distinct=1

Alternatively, you can click on the “Browse using” drop-down or manually construct an alternative URL pattern to obtain a different HTML-based Entity Description page that includes powerful Faceted Browsing capability, i.e., the use of Entity Attributes to provide pivot-style exploration across various dimensions.
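For readers who prefer to construct the Faceted Browsing URL by hand, here is a minimal sketch that mirrors the pattern in the step 3 URL: the entity identifier (the `/about/id/entity/` form of the source URL) is percent-encoded in full and passed as the `url` parameter of `/describe/`, followed by `distinct=1`. The function names are hypothetical; only the URL pieces come from the example above.

```python
from urllib.parse import quote

# Both base paths are taken from the step 3 URL.
ENTITY_BASE = "https://linkeddata.uriburner.com/about/id/entity/"
DESCRIBE_BASE = "https://linkeddata.uriburner.com/describe/"

SOURCE_URL = (
    "https://docs.google.com/spreadsheets/d/"
    "1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE/gviz/tq"
    "?tqx=out:csv&range=A2:O53&sheet=2020_Election_Analysis"
)


def entity_iri(source_url: str) -> str:
    """Map 'scheme://rest' to the entity identifier '<base>scheme/rest'."""
    scheme, rest = source_url.split("://", 1)
    return f"{ENTITY_BASE}{scheme}/{rest}"


def faceted_view_url(source_url: str) -> str:
    """Percent-encode the whole entity identifier and wrap it in a /describe/ URL."""
    encoded = quote(entity_iri(source_url), safe="")  # safe="" also encodes '/' and ':'
    return f"{DESCRIBE_BASE}?url={encoded}&distinct=1"


print(faceted_view_url(SOURCE_URL))
```

Passing `safe=""` matters here: by default `quote` leaves `/` unencoded, which would not reproduce the `%2F` sequences seen in the step 3 URL.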

For instance, here is a list of instances of the Class "2020 Election Analysis" derived from the Google Spreadsheet.

Here's a view of a specific Record, a specific instance of the "2020 Election Analysis" Class derived from the Google Spreadsheet.

Here's the Ontology derived from the Google Spreadsheet.

A Little Reasoning & Inference

At this point, I have two views of my Knowledge Graph, but the labelling for each state could be clearer.

To solve the preferred labelling problem, I’ve applied a SPARQL INSERT statement to generate an additional attribute that will ultimately be used as my preferred label, courtesy of the skos:prefLabel term from the SKOS Ontology.

# Vocabulary and dataset namespace prefixes
PREFIX lod:                 <https://lod.openlinksw.com/>
PREFIX rdfs:                <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema:              <https://schema.org/>
PREFIX owl:                 <http://www.w3.org/2002/07/owl#>
PREFIX skos:                <http://www.w3.org/2004/02/skos/core#>
PREFIX election-2020-data:  <https://docs.google.com/spreadsheets/d/1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE/gviz/tq?tqx=out:csv&range=A2:O53&sheet=2020_Election_Analysis#>

# Copy each record's State_Name value into skos:prefLabel,
# writing the new triples to a separate cleanup graph
INSERT {
         GRAPH <urn:us:election:2020:data:cleanup>
           { ?record skos:prefLabel ?label }
       }
WHERE { ?record election-2020-data:State_Name ?label }
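To check what the INSERT produced, one option is to read the labels back from the cleanup graph with a SELECT query sent as a SPARQL Protocol GET request. The sketch below only builds the query text and the request URL. The endpoint address `https://linkeddata.uriburner.com/sparql` is an assumption (the article does not name an endpoint), the function names are hypothetical, and the `skos_ns` argument must match whatever namespace IRI the INSERT statement declared for `skos:`.

```python
from urllib.parse import urlencode

# Assumed endpoint; not stated in the article.
ASSUMED_ENDPOINT = "https://linkeddata.uriburner.com/sparql"
CLEANUP_GRAPH = "urn:us:election:2020:data:cleanup"  # graph IRI from the INSERT


def label_check_query(
    graph_iri: str,
    skos_ns: str = "http://www.w3.org/2004/02/skos/core#",
) -> str:
    """Return a SELECT query listing every record and its preferred label."""
    return (
        f"PREFIX skos: <{skos_ns}>\n"
        "SELECT ?record ?label\n"
        f"WHERE {{ GRAPH <{graph_iri}> {{ ?record skos:prefLabel ?label }} }}\n"
        "ORDER BY ?label"
    )


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode the query in the standard 'query' parameter of a GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = label_check_query(CLEANUP_GRAPH)
print(query)
print(sparql_get_url(ASSUMED_ENDPOINT, query))
```

If the INSERT worked as intended, each row returned for this query should pair a record with a state name such as "Texas".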

The following screenshots depict the effects of the INSERT statement above, i.e., how the underlying Faceted Browsing Engine has selected the preferred Entity Attribute for the display label used in the “About:” heading.

And here's a revamped view of "Texas" rather than "Record2".

Conclusion

Courtesy of the data management and transformation power of Virtuoso, I’ve successfully used a single mouse-click to generate a Knowledge Graph, deployed using Linked Data principles, from my Google Spreadsheet, which comprises data about the U.S. Presidential Election collated from a variety of sources.

Related

Data Wrangling Definition from the OpenLink Technology Glossary
US 2020 Presidential Elections Google Spreadsheet
Ontology Generated from the Google Spreadsheet -- RDF Turtle Format
Ontology Generated from the Google Spreadsheet -- JSON-LD Format
What is the Virtuoso Sponger, and Why is it Important?
Virtuoso Home Page

Jeff Jockisch

Partner @ ObscureIQ | Data Broker Expert | Privacy Recovery for VIPs

4 years ago

Thanks for this!

Kingsley Uyi Idehen

Founder & CEO at OpenLink Software | Driving GenAI-Based AI Agents | Harmonizing Disparate Data Spaces (Databases, Knowledge Bases/Graphs, and File System Documents)

4 years ago

I've also added a new column to the spreadsheet for integrating interactive elections results data from The New York Times.

Aleksandr Blekh, Ph.D.

Software Engineering | Cloud | ML/AI | Solution Architecture | IT Strategy

4 years ago

Kingsley Uyi Idehen You forgot to include referenced screenshots ("The following screenshots depict effects ..."). But, more importantly ... I couldn't figure out how to access the autogenerated ontology. I have found a way to get to the corresponding ontology page [https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE%2Fgviz%2Ftq%3Ftqx%3Dout%3Acsv%26range%3DA2%3AO53%26sheet%3D2020_Election_Analysis%23ontology] (though it was a weird path - e.g.: container of -> (select state, e.g., Texas) -> (select any attribute, e.g., 2020 Early Vote Count) -> isDefinedBy -> 2020 Election Analysis Ontology - I think that such an important entity as ontology should be referenced at the main page of a resource), but I don't see a way to export a relevant Turtle (.ttl) file. Neither do I see a relevant SPARQL endpoint. Would you like to clarify?
