The Graphs of Kevin Bacon

Who's your "Kevin Bacon"?

The celebrated actor gained a rather odd additional notoriety in 1994, when three Albright College students, Craig Fass, Brian Turtle, and Mike Ginelli, watched the movie "Footloose" followed by "The Air Up There", both featuring Kevin Bacon in prominent roles. Doing some research, they came to the startling conclusion that everyone in the US could be linked to the actor through a chain of intermediaries of no more than six steps. While this started out as something of a party game, it in time laid the foundation for Facebook, Twitter and, of course, LinkedIn.

This web of interconnections does not imply that Kevin Bacon is all that extraordinary: the whole world does not in fact revolve around Kevin Bacon, though it may seem to from this. Instead, what it says is that any given person will have friends, acquaintances, co-workers or family that they know fairly well, who in turn all have their own "friends" circle, and so on. Move beyond that first layer of connections and the likelihood that you know someone directly drops significantly; move beyond the second and it drops further, but at no point in this expanding ring does it drop to zero.

This means that as you move outward, the likelihood that you have a continuous chain of connection to someone specific grows, until it approaches 100%.

This web of connections is known as a graph or network. A graph is a way of showing the entities in a system as well as the relationships that connect them. The simplest graphs, unlabeled graphs, usually identify the nodes (or things) that are connected but assume there's only one relevant relationship (the friend of a friend, or FOAF, graph is mostly like this). In more complex graphs, the relationships are both labeled and directional: each "edge" (the line between two nodes) has a distinct direction and a label.
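As a rough illustration of the difference, here is a minimal Python sketch. The names, the relationship labels and the helper functions (`add_friendship`, `outgoing`) are all made up for this example; it is one plausible way to hold the two kinds of graph in memory, not the only one.

```python
from collections import defaultdict

# Unlabeled, undirected "friend of a friend" graph: a single implicit relationship.
# All names here are hypothetical illustration data.
friends = defaultdict(set)

def add_friendship(a, b):
    """Record an undirected edge by storing it in both directions."""
    friends[a].add(b)
    friends[b].add(a)

add_friendship("Alice", "Bob")
add_friendship("Bob", "Carol")

# Labeled, directed graph: each edge is a (subject, label, object) triple.
triples = {
    ("Alice", "manages", "Bob"),
    ("Bob", "mentors", "Carol"),
    ("Carol", "reports_to", "Alice"),
}

def outgoing(node):
    """Return the (label, target) pairs for the edges that leave `node`."""
    return sorted((label, obj) for subj, label, obj in triples if subj == node)

print(sorted(friends["Bob"]))
print(outgoing("Alice"))
```

In the first structure the only question you can ask is "who is connected to whom"; in the second, every edge also says how and in which direction.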

Graphs are also characterized by whether they are cyclic or acyclic. In a cyclic graph, you may have a relationship such as "A knows B who knows C who knows D who knows A". Most social graphs are at least moderately cyclic, though introducing a direction to each edge can reduce that somewhat. Cycles add to the complexity of navigating such graphs: at each point in a path you need to track whether you have already encountered a given person, or you risk getting caught in a loop.
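The bookkeeping involved can be sketched with a breadth-first search that keeps a set of people already seen. The `knows` graph and the function name `degrees_of_separation` are hypothetical, chosen to mirror the A, B, C, D loop described above.

```python
from collections import deque

# Hypothetical acquaintance graph containing the cycle A -> B -> C -> D -> A.
knows = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": ["A", "E"],
    "E": [],
}

def degrees_of_separation(graph, start, target):
    """Return the fewest edges from start to target, or None if unreachable."""
    visited = {start}              # people already queued, so a cycle cannot trap us
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == target:
            return distance
        for neighbor in graph.get(person, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None

print(degrees_of_separation(knows, "A", "E"))
```

Without the `visited` set, the search from A would keep circling through B, C and D; with it, each person is examined at most once.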

Most data structures tend to fall into one of two categories: labeled directed cyclic graphs (LDCGs) or labeled directed acyclic graphs (LDAGs). Relational databases can fall into the former (which is one reason why most SQL databases in particular check assiduously to prevent such loops from forming), while XML and JSON documents (both hierarchies) are usually described by the latter, although XML has a limited facility for creating referential links.
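Telling the two categories apart comes down to checking a directed graph for loops. The sketch below uses a standard depth-first search with three node states; the function name `has_cycle` and the two sample graphs are assumptions made for illustration, and the recursion is only suitable for small graphs.

```python
def has_cycle(graph):
    """Return True if the directed graph (node -> list of targets) contains a loop."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:      # an edge back onto the current path closes a loop
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in list(graph))

# Hypothetical examples: a document-like hierarchy and a three-node ring.
hierarchy = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
ring = {"A": ["B"], "B": ["C"], "C": ["A"]}

print(has_cycle(hierarchy))
print(has_cycle(ring))
```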

For processing purposes, most hierarchies are considered LDAGs: by following a simple algorithm (if you can descend, do so; otherwise go right), it is possible to traverse the graph without ever encountering the same node twice. This property makes such hierarchies very useful for marking up narrative content, as traversing a hierarchy in this manner will "unspool" the tree into a cohesive narrative structure.
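The "descend if you can, otherwise go right" rule is what is usually called a depth-first, pre-order traversal. A minimal sketch follows, assuming a hypothetical `Node` class and a made-up document tree; the recursion handles the implicit "climb back up when there is nothing further to the right" step.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A hypothetical tree node: a label plus an ordered list of child nodes."""
    label: str
    children: list = field(default_factory=list)

def unspool(node):
    """Yield labels in document order: visit the node, descend, then move right."""
    yield node.label
    for child in node.children:    # children are visited left to right
        yield from unspool(child)

# A made-up document hierarchy, roughly what an XML or JSON tree might hold.
doc = Node("article", [
    Node("title"),
    Node("section", [Node("para 1"), Node("para 2")]),
    Node("footer"),
])

print(list(unspool(doc)))
```

Each node is produced exactly once, and the order matches the order in which a reader would meet the content on the page.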

Labeled directed cyclic graphs by themselves are comparatively rare, showing up primarily in things like token ring network topologies. However, general labeled directed graphs (LDGs), which mix cyclic and acyclic structures, are very common and can effectively describe (almost) all data structures. The semantic web is an LDG, as, for that matter, is the Internet itself. LDGs are a superset of LDAGs: you can create a hierarchy using RDF assertions just fine, but XML structures generally have trouble representing LDGs unless you add in some kind of linking convention, and even then it's usually very awkward.

The web is a labeled directed graph, in which each web address is a node (representing content) and each hyperlink is an edge directed to another node address. What made the World Wide Web so powerful was Tim Berners-Lee's insistence that links were important. Ironically, most people thereafter, including many people who were otherwise fundamental in building HTML, lost sight of this fact, and concentrated much more on the "display" part of the browser. 

Part of the problem with links is that you do not know what is on the other side of a link until you traverse it. Not surprisingly, this experience mirrors the nature of both the web and knowledge systems in general, and also provides another way of thinking about the open world assumption (OWA): you do not in general know what you do not know. Anyone who has spent time on Wikipedia (or other wiki-like sites such as TVTropes) knows full well that a visit that began with looking up the population of China can end, several hours later, with reading about wave-particle duality or the differences between ancient Brythonic tongues.

Indeed, this novelty is a powerful facet of semantic systems. In a rich semantic system, there are multiple potential types of relationships, each of which allows for exploration of an information space in different ways. You navigate across this space on a sea of related links. In a medical knowledge system, for instance, you may move through anatomical systems of the hand, then take a brief dive into pathology before focusing on injuries, which in turn lead to treatments and pharmacology. In a health insurance realm, you might go from people to claims to doctors to hospitals, from there delving into legal requirements and billing practices. In entertainment, you can skip from actors to films to roles and tropes, and from there to script-writing or thematic elements.

In other words, graph-based networks allow for exploration in ways that most other data systems don't. One way of thinking about this is that search and semantics (graphs) do almost diametrically opposite things. Search works by identifying a particular piece of information (usually textually) from within a corpus of content. Search doesn't necessarily provide much context: you can make some assumptions about concepts such as relevance based upon probabilistic characteristics of words within documents, but you also end up with a lot of false positives.

Semantics is more like following a trail of citations, where one book will point to a passage in a second book (with enough metadata to provide context for the link), which in turn will have more content and more citations that you can follow. While you may eventually drift away from your original goal, you can always backtrack to a point where the content was still relevant and try a different direction.

As powerful as each is separately, the two together are almost unbeatable. In a hybrid search-semantic system, you can use search to establish potential starting points, then use semantics to navigate outward from there to get a sense of the overall context of the information (and its richness or relevance). You can bookmark critical points in your explorations and even pass these paths to others as a collection of assertions. 
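To make the division of labor concrete, here is a toy sketch of such a hybrid: a plain substring match stands in for search, and a handful of triples stand in for the semantic graph. The identifiers, the `acted_in` label and the functions `search`, `neighbors` and `explore` are all assumptions made for illustration, and a real system would use a proper text index and a triple store instead.

```python
# Illustration data only: two tiny "documents" and a few labeled edges.
documents = {
    "footloose": "Footloose is a 1984 film about a town that bans dancing",
    "apollo13": "Apollo 13 is a 1995 film about an aborted lunar mission",
}
triples = [
    ("kevin_bacon", "acted_in", "footloose"),
    ("kevin_bacon", "acted_in", "apollo13"),
    ("tom_hanks", "acted_in", "apollo13"),
]

def search(term):
    """Search step: return ids of documents whose text contains the term."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in documents.items() if term in text.lower())

def neighbors(node):
    """Semantic step: return (label, other_node) pairs for edges touching node."""
    outgoing = [(label, obj) for subj, label, obj in triples if subj == node]
    incoming = [("inverse:" + label, subj) for subj, label, obj in triples if obj == node]
    return outgoing + incoming

def explore(term, hops=1):
    """Use search to find starting points, then walk outward `hops` edges."""
    frontier = search(term)
    seen = set(frontier)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for _label, other in neighbors(node):
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return seen

print(sorted(explore("dancing", hops=2)))
```

Search alone finds only the one document that mentions dancing; two hops through the graph surface the actor and a second film that the text query would never have matched.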

It is for this reason, among others, that graphs are increasingly becoming a fixture of big data systems.  Smart data (whether "big" or "small") has context, information both about itself and about its place in the world. It provides multiple avenues for developing insight about the data, allows for different categorization schemes to be attached to the same entities to provide alternative (and in many cases complementary) ways of understanding this information, and is self-describing to the extent of being "discoverable" without necessarily being tied to a specific platform (graphs are remarkably portable).

For more information on graphs and semantics, I'd recommend checking out Semantic Web for the Working Ontologist, by James Hendler and Dean Allemang. It's approachable and still informative, and was one of my own bibles in getting involved in the Semantics space.

Kurt Cagle is a working ontologist. 

Thomas Bradford

Transformational CTO

9 years ago

Kurt, you are incorrect. The world, in fact, *does* revolve around Kevin Bacon.

incidentally, you, Kurt Cagle, and I are a 3rd degree link in LinkedIn. I know someone who knows someone who knows you :-)

Rick Marshall BSc BE

Unibase database, language, AI and semantic data models. Data modeller, cyber security, custom applications

9 years ago

I used this principle when I designed Unibase 30 years ago. It's amazing the data problems that a semantically aware database can solve. And how fast that knowledge can make it.

Michael Malgeri

Owner, kids4biz.com

9 years ago

I beg to differ Kurt. If you saw Apollo 13, you know the world does indeed revolve around Kevin Bacon :) I like your observation that Smart data has information both about itself and about its place in the world. We see this in major film studio efforts to semantically tag their content while continually measuring the pulse of the consumer in response to that content. The combination helps determine what characters, actors, story lines, themes, etc. resonate with consumers, which in turn drives new projects and other opportunities among other things. Nice article!
