Data Wrangling: Generating a U.S. Presidential Election Knowledge Graph from a Google Spreadsheet

Data Wrangling: Generating a U.S. Presidential Election Knowledge Graph from a Google Spreadsheet

Arising from curiosity about the potential outcome of the 2020 US Presidential Election, I’ve constructed a spreadsheet comprising data about eligible voters in said election, a breakdown of early-votes cast by state, and the final tallies for the 2016 election.

I am interested in states that show very high early-vote counts as a percentage of the total vote count in the prior 2016 Presidential Election.

Data Wrangling Goal?

I want to generate an explorable Knowledge Graph from my Google Spreadsheet which itself comprises a merge of data from various sources that include:

I will use the Virtuoso instance behind the public URIBurner Service (which includes an enabled Sponger Transformation Middleware Module), as my tool of choice. 

Steps?

[1] Create Google Spreadsheet. Here’s my example. 

No alt text provided for this image

[2] Pass the Spreadsheet URL to URIBurner using options for returning CSV rather than HTML — https://linkeddata.uriburner.com/about/html/https/docs.google.com/spreadsheets/d/1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE/gviz/tq?tqx=out:csv&range=A2:O53&sheet=2020_Election_Analysis

That returns an HTML-Entity Description Page which is also an entry point into a newly generated Knowledge Graph. This pages includes a listing of Entities derived from Rows in the original Spreadsheet. 

No alt text provided for this image

For instance, clicking on the hyperlink “Record2” results in a lookup that returns data for the state of Texas.

No alt text provided for this image

[3] Faceted Browsing View — https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Flinkeddata.uriburner.com%2Fabout%2Fid%2Fentity%2Fhttps%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE%2Fgviz%2Ftq%3Ftqx%3Dout%3Acsv%26range%3DA2%3AO53%26sheet%3D2020_Election_Analysis&distinct=1

Alternatively, you can click on the “Browse using” drop-down or manually construct an alternative URL-pattern to obtain a different HTML-based Entity Description page that includes powerful Faceted Browsing capability i.e., the use of Entity Attributes to provide pivot-style exploration across various dimensions.

For instance, here is a list of instances of the Class "2020 Election Analysis" derived from the Google Spreadsheet.

No alt text provided for this image

Here's a view of a specific Record, a specific instance of the "2020 Election Analysis" Class derived from the Google Spreadsheet.

No alt text provided for this image

Here's the Ontology derived from the Google Spreadsheet.

No alt text provided for this image

A Little Reasoning & Inference

At this point, I have two views of my Knowledge Graph, but the labelling for each state could be clearer. 

To solve the preferred labelling problem, I’ve applied a SPARQL INSERT statement to generate an additional attribute that will ultimately be used as my preferred label, courtesy of the skos:prefLabel term from the SKOS Ontology.

PREFIX lod:                 <https://lod.openlinksw.com/>
PREFIX rdfs:                <https://www.w3.org/2000/01/rdf-schema#>
PREFIX schema:              <https://schema.org/>
PREFIX owl:                 <https://www.w3.org/2002/07/owl#>
PREFIX skos:                <https://www.w3.org/2004/02/skos/core#>
PREFIX election-2020-data:  <https://docs.google.com/spreadsheets/d/1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE/gviz/tq?tqx=out:csv&range=A2:O53&sheet=2020_Election_Analysis#>
INSERT {
         GRAPH <urn:us:election:2020:data:cleanup>
           { ?record skos:prefLabel ?label }
       }
WHERE { ?record election-2020-data:State_Name ?label }

The following screenshots depict effects of the INSERT statement above i.e., how the underlying Faceted Browsing Engine has selected the preferred Entity Attribute for Display Labeling used in the “About:” heading. 

No alt text provided for this image

And here's a revamped view of "Texas" rather than "Record2" .

No alt text provided for this image

Conclusion

Courtesy of the data management and transformation power of Virtuoso, I’ve successfully used a single mouse-click to generate a Knowledge Graph deployed using Linked Data principles from my Google Spreadsheet — comprising data about the U.S. Presidential Election collated from a variety of sources.

Related

Jeff Jockisch

Partner @ ObscureIQ??Data Broker Expert??Privacy Recovery for VIPs

4 年

Thanks for this!

回复
Kingsley Uyi Idehen

Founder & CEO at OpenLink Software | Driving GenAI-Based AI Agents | Harmonizing Disparate Data Spaces (Databases, Knowledge Bases/Graphs, and File System Documents)

4 年

I've also added a new column to the spreadsheet for integrating interactive elections results data from The New York Times.

回复
Aleksandr Blekh, Ph.D.

Software Engineering | Cloud | ML/AI | Solution Architecture | IT Strategy

4 年

Kingsley Uyi Idehen You forgot to include referenced screenshots ("The following screenshots depict effects ..."). But, more importantly ... I couldn't figure out how to access the autogenerated ontology. I have found a way to get to the corresponding ontology page [https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE%2Fgviz%2Ftq%3Ftqx%3Dout%3Acsv%26range%3DA2%3AO53%26sheet%3D2020_Election_Analysis%23ontology] (though it was a weird path - e.g.: container of -> (select state, e.g., Texas) -> (select any attribute, e.g., 2020 Early Vote Count) -> isDefinedBy -> 2020 Election Analysis Ontology - I think that such an important entity as ontology should be referenced at the main page of a resource), but I don't see a way to export a relevant Turtle (.ttl) file. Neither do I see a relevant SPARQL endpoint. Would you like to clarify?

要查看或添加评论,请登录

Kingsley Uyi Idehen的更多文章

社区洞察

其他会员也浏览了