How graph theory helped us discover a 500 year old English king (Networks 5)
Keith McNulty
Leader in Technology, Science and Analytics | Mathematician, Statistician and Psychometrician | Author and Teacher | Coder, Engineer, Architect
One morning in August 2012 in a car park in Leicester in the English midlands, a mechanical digger was starting to cut into the concrete surface. A number of interested spectators were present, hoping against hope that something amazing but highly improbable might occur.
Some rugged detective work had led a group of amateur and professional historians to believe that this may be the area where Richard III - the stooped, crooked-backed English king who met a brutal end at the Battle of Bosworth Field in 1485 - had been unceremoniously dumped over 500 years ago. Bolstered by donations and crowdfunding, they raised enough money and gained approval for a limited archaeological dig in the car park of the local council building. They were excited beyond measure. Everyone else, including the academic and professional archaeologists present, were sceptical.
The first cut was made in an area of the car park marked with a mysterious-looking letter R. Presumably this indicated that the parking space was reserved for someone important, but in truth, nobody could explain why that R was there and what it meant.
Within hours, immediately below that letter R, they found a skeleton with a crooked back.
It was to be the beginning of one of the most amazing archaeological discoveries of modern times. But how could they prove that this skeleton, which clearly showed a lifetime suffering from the debilitating spinal condition scoliosis – something long associated with historical descriptions of King Richard III – was without a doubt King Richard III?
Subsequent analysis of the skeleton showed it to date from the late 1400s and to have enjoyed a rich diet of meat and fish, all of which increased the chance that this was the King. But they needed one more thing to complete the proof: DNA.
To use DNA to prove that this was Richard III was going to be a major challenge. Only mitochondrial DNA or mtDNA, which is passed down through maternal lines in families, remains unchanged from generation to generation. Therefore in order to find proof, a living descendent of the sister of Richard III who could only be traced through a 500 year old line of females needed to be found. Talk about a needle in a haystack!
That needle turned out to be Michael Ibsen, a Canadian who was working as a carpenter and furniture restorer in North London. A swab was taken from Michael's mouth and the mtDNA was extracted and compared to that extracted from the skeleton. It was a perfect match.
On March 26th 2015 the exhumed body of King Richard III was taken to Leicester Cathedral where where it was re-interred with full Church of England rites and with great ceremony...over 500 years after he died!
Graph databases
Michael Ibsen's royal ancestry, and countless other mysteries of family lineage, are now greatly more discoverable because of the emergence of graph databases.
Although they have existed in highly specialized circles since the 1960s, graph databases started entering the enterprise technology space around 2005. Unlike more common relational databases, which store information in linked tables, the principles behind graph databases are exactly the same as in mathematical graph theory (see previous articles in this series for a briefing on graph theory). In fields related to the study of people, graph databases can be really useful.
Each ‘vertex’ or 'node' can store information about a person. Each ‘edge’ stores information about the relationships between the two people it connects. This allows much more flexible query. By querying nodes and edges you can answer questions like ‘show me the all people who have published papers with this person’. On the Mathematics Genealogy Project website, which queries against a graph database, I can trace my mathematics lineage to such giants as Dirichlet, Poisson and Euler using edges. I can also find out about mine or their PhD theses titles by querying the nodes.
Genealogy is a great example of the use of a graph database. Ancestry.com stores biographical information about an individual in its nodes (birth date, marriage date, photos, documents) and it stores relationship data in its edges (mother, father, brother, sister). When you view the family tree of someone, you are querying the graph database of that individual (albeit a particular type of graph called a tree - see here for a briefing on graph types).
Building a graph database
Graph databases are relatively straightforward to set up and configure, and there are some pretty established graph database products on the market right now. Neo4j is the most popular graph database currently, and works for most of the use cases that would be needed in the people analytics space, but there are plenty of others too. For any use case where relationships are the major focus of the analytics, a graph database will prove superior to traditional relational databases. It will process queries quicker and will be more easy to configure and adjust. You can ask more complex questions with more efficient querying.
It makes particular sense to set up a graph database engine for people analytics when you have existing data which can be reconfigured to represent connections between people. Examples of this may include:
- Hours worked together on projects from timesheet or finance data
- Email or calendar connections from email metadata
- Joint participation in events from logs or attendance records
- Document collaborations from publishing records
- Declared connections such as mentorship or coaching relationships
If you haven't played with graph databases, try to find an opportunity to test them out. Hands on experience will give you a real sense of their power. If graph databases can find all your 5th cousins in a few seconds, imagine what they could do in your organization.
In the next and final post of this series on network analytics, I will look at how networks, and particularly social networks, offer incredible opportunities for the improvement of humankind through enabling collective intelligence, but how this is also a double edged sword..
I lead McKinsey's internal People Analytics and Measurement function. Originally I was a Pure Mathematician, then I became a Psychometrician. I am passionate about applying the rigor of both those disciplines to complex people questions. I'm also a coding geek and a massive fan of Japanese RPGs.
All opinions expressed are my own and not to be associated with my employer or any other organization I am connected with.
Supporting SaMD Professionals & Companies via Consulting and the MDSW Network | Click below ?? ↓
8 个月Hi Keith, was opening up an old text book on graph theory to get deeper into neo4j and came across this post while googling... cool post. Thanks
CPCM, CSDP, CPP, CCAS, Government Contracting Assistance
6 年I am amazed. Thank you for posting this.