Superpowers of Knowledge Graphs, part 2: Data Aggregation
Knowledge graphs have superpowers above and beyond those of ontologies, taxonomies, and mere databases. They can't leap tall buildings in a single bound, but they do have something akin to x-ray vision and other powers. We talked about one such superpower – data integration across many sources – in a previous post (and again in a later post about taming LLMs).
Another key superpower of knowledge graphs is data aggregation – across many instances.
Imagine you have millions or billions (or more!) of rows of data, each of which has many thousands of columns. Just as we can't see x-rays or infrared energy, hear ultra-high frequency sounds, or visualize thousand-dimensional spaces, humans simply cannot identify or compare patterns in such gigantic data sets. We have to aggregate this data to make it useful and valuable – which is why the dark arts of Data Science are so essential today.
Takeaways for the hurried and harried
Choose your knowledge architecture carefully to enable the data aggregations that you need.
Foundations of Data Aggregation
Data aggregation is based on the fundamental distinction between individual instances (like Jack Nicholson or the Eiffel Tower) and collections of these instances (variously called categories, classes, kinds, types, etc.) like people, actors, teams, etc. This distinction is so important for human understanding that it shows up in many different forms across basically all domains: item vs group, specimen vs species, reference vs sense, episodic memory vs semantic memory, token vs type, member vs team, particulars vs universals, and (in math) scalar value vs distribution, tuple vs relation, and element vs set.
Data aggregation boils down to summarizing the features of a collection or category based on the features of the individuals in it.
Simply stated, an individual instance is unique: its cardinality is 1 and only 1. There's only one Jack Nicholson and only one Eiffel Tower; they are unique instances of categories like actors or buildings. Measurement theory tells us that a single data point is the unique result of measuring some predicate (property, attribute, characteristic) of some unique instance at a specific time and place, with a specific method – recorded as a fact. So "data", technically, is a collection of facts with values for specific predicates about individual instances – a collection of unique measurements, the input for data aggregation.

Ontologies downplay or ignore instances entirely, so they are most useful in getting ready for data aggregation – by defining the target categories that can guide the process. Not all taxonomies include instances (the ones without instances are sometimes called typologies), but when they do, they provide an important though very limited and lossy method for data aggregation – by relating each instance to a single category. Knowledge graphs can be seen as a significant extension of both ontologies and taxonomies: knowledge graphs often emphasize instances and diversify data aggregation by enabling each instance to relate to a wide range of categories through a range of different relations.
A collection or category (I'll call them categories here), on the other hand, can have any cardinality, including 0 (we can even imagine and define a category that has no instances, like Martian actors). The output of data aggregation is values for features of categories. But we can't actually measure values for features of categories – we can only compute, predict, or infer them, based on the measured values for the instances in the category. In fact, basic statistics tells us that features of categories take different kinds of values than features of instances: because they describe whole collections, the values of category features need to be descriptors of distributions, such as measures of central tendency, dispersion, skewness, etc. Describing a distribution for a category feature with a default value or only its mean is a dramatic oversimplification that unintentionally increases the noise in the data – it's essentially self-inflicted error. It's also a simplification that is entirely inconsistent with logic, which assumes that an assertion like "adult male humans are 170 cm tall" is true for all adult male humans, even though means are, by definition, not true values for most of the instances in a given category.
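To make the contrast concrete, here is a minimal Python sketch of describing a category feature as a distribution rather than a single number. The height values are invented for illustration:

```python
import statistics

# Hypothetical height measurements (cm) for instances of the category
# "adult male humans" -- the values are invented for illustration.
heights = [162.0, 168.5, 169.9, 170.0, 170.1, 171.2, 173.4, 175.0, 178.8, 181.3]

mean = statistics.mean(heights)      # central tendency
stdev = statistics.stdev(heights)    # dispersion
n = len(heights)
# A simple moment-based estimate of skewness (asymmetry of the distribution).
skew = sum((x - mean) ** 3 for x in heights) / (n * stdev ** 3)

print(f"mean={mean:.1f} cm, stdev={stdev:.1f} cm, skewness={skew:.2f}")
# Asserting only "adult male humans are 170 cm tall" keeps the mean and
# silently discards the dispersion and skewness -- the self-inflicted
# error described above.
```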
Methods of Data Aggregation
It's helpful to think of two distinct families of data aggregation methods: quantitative aggregation and conceptual aggregation.
The most familiar quantitative methods for data aggregation are the descriptive statistical methods for comparing multiple instance feature values in terms of a single dimension, by computing a mean, standard deviation, kurtosis, etc. (i.e., category features) for those values – these are kinds of aggregation by computation. When we measure temperature, we map numbers to the feature, as in 32 degrees Celsius. If we take the temperature of 100 instances of a category, we can describe the temperature of the category by saying that the category feature's mean is 31.2 degrees Celsius and the category feature's standard deviation is 1.7 degrees. Describing categories with distributions gives us additional information about how typical the values for instance features are – which is useful for spotting exceptions, anomalies, and errors.
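As a rough Python sketch of this – the readings below are simulated with a random generator, not real measurements – quantitative aggregation and the resulting anomaly-spotting might look like:

```python
import random
import statistics

random.seed(42)
# Hypothetical temperature readings (degrees Celsius) for 100 instances.
readings = [random.gauss(31.2, 1.7) for _ in range(100)]

mu = statistics.mean(readings)
sigma = statistics.stdev(readings)
print(f"category feature: mean = {mu:.1f} C, stdev = {sigma:.1f} C")

# The distribution tells us how typical an instance value is, so we can
# flag likely exceptions, anomalies, and errors.
anomalies = [t for t in readings if abs(t - mu) > 3 * sigma]
print(f"{len(anomalies)} reading(s) more than 3 standard deviations from the mean")
```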
We often generalize this approach to multiple dimensions by computing n means, standard deviations, etc. for the same category or cluster of instances – using multidimensional techniques of machine learning such as clustering and classifier algorithms: a centroid in clustering is essentially an array of means of many features and it describes the "center" of a category of instances. Classifiers aggregate values by trying to determine the optimal boundaries (rather than the center) of the category.
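A minimal sketch of the centroid idea, with made-up feature values:

```python
# Each instance is a point in feature space: (temperature, mass, height).
# The values below are invented for illustration.
instances = [
    (31.0, 2.1, 10.5),
    (32.5, 1.9, 11.0),
    (30.8, 2.3, 10.1),
]

def centroid(points):
    """An array of per-feature means: the 'center' of a category of instances."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))

print(centroid(instances))  # one mean per feature, i.e. n means at once
```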
Less familiar are the conceptual methods for aggregating instance features with non-numeric values, such as terms. When we "measure" or provide values for a feature like color, we use strings as values for the feature, like "green". When we "measure" the color of 100 similar objects, say apples, how can we find something like a mean? How can we describe the distribution of their values? One common way is to convert these non-numeric values into numbers and to tally the frequencies of the distinct values – regardless of how similar (or dissimilar) those values are in qualitative terms – and aggregate them statistically, as described above. (This ignores the conceptual difficulty of deciding which values we consider to be "the same" when we count them, since counting is categorization in disguise.)
Another way is to aggregate them conceptually – in terms of what they mean: group and count the greens, the reds, and the other color values, for example. In a very useful sense, a parent term like "green" represents something like a "mean" concept that captures the commonalities of chartreuse, bottle green, kelly green, forest green, lime green, and many other shades of green.
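A minimal sketch of this kind of conceptual roll-up; the shade-to-parent mapping below is hypothetical, and in practice would come from a controlled vocabulary, thesaurus, or the graph itself:

```python
from collections import Counter

# Hypothetical shade-to-parent-term mapping (toy data for illustration).
parent_term = {
    "chartreuse": "green", "kelly green": "green", "forest green": "green",
    "lime green": "green", "bottle green": "green",
    "crimson": "red", "scarlet": "red",
}

observed = ["chartreuse", "forest green", "crimson", "lime green", "kelly green"]

# Roll each specific value up to its parent term, then count the groups.
counts = Counter(parent_term.get(shade, "other") for shade in observed)
print(counts)  # Counter({'green': 4, 'red': 1})
```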
We've seen that ontologies usually don't include enough information about instances for this kind of aggregation by lookup, in which we store instance/category pairs as facts in lists, tables, trees, graphs, or other formats.
Taxonomies often do include instances and are the workhorses of conceptual data aggregation; in fact, they seem to have been invented for the specific purpose of aggregating specimens (instances) into species (categories) – aka classification – a clear example of aggregation by lookup. One key constraint, however, is that in taxonomies instances can be aggregated with only one relation (instanceOf) into only one parent category (per the messy MECE principle). In practice, though, taxonomy work is very difficult because instances can be aggregated in many, many possible ways, no single one of which is "best" for all use cases or can capture different perspectives or points of view. So forcing a single aggregation creates a single simplified representation that is very lossy and very difficult to build.
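As a sketch, a taxonomy's aggregation by lookup amounts to little more than a single-valued dictionary – the instance/category pairs below are toy data:

```python
# In a strict taxonomy each instance has exactly one parent category,
# so a plain dict of instance/category pairs is enough -- which is
# precisely the limitation.
instance_of = {
    "Jack Nicholson": "actor",
    "Meryl Streep": "actor",
    "Eiffel Tower": "building",
}

def aggregate_by_lookup(instances):
    """Group instances under their single parent category."""
    groups = {}
    for inst in instances:
        groups.setdefault(instance_of[inst], []).append(inst)
    return groups

print(aggregate_by_lookup(["Jack Nicholson", "Meryl Streep", "Eiffel Tower"]))
# {'actor': ['Jack Nicholson', 'Meryl Streep'], 'building': ['Eiffel Tower']}
```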
Knowledge graphs take conceptual data aggregation to the next level.
Knowledge graphs are not hobbled by the single-parent constraint common in taxonomies and some ontologies, so the same instance can be aggregated in many different ways. Even the much rarer poly-hierarchical taxonomies, in which multiple parent categories are permitted, still aggregate instances using only one type of relation (instanceOf). Knowledge graphs, on the other hand, can and often do include hundreds of instance-category relations, making it possible to aggregate the same instances in many, many ways – rather than just one. This dramatically reduces the lossiness that characterizes taxonomic representations because knowledge graphs can capture more and more varied information, adapt to more use cases, and encode a wider range of user perspectives.
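A minimal sketch of the difference, with a toy triple list and hypothetical relation names – the same instances can be grouped along any relation, not just instanceOf:

```python
from collections import defaultdict

# A toy knowledge graph as (subject, relation, object) triples. The same
# instance participates in several instance-category relations, so it can
# be aggregated along any of them.
triples = [
    ("Jack Nicholson", "instanceOf", "actor"),
    ("Jack Nicholson", "wonAward", "Academy Award"),
    ("Jack Nicholson", "bornIn", "New Jersey"),
    ("Meryl Streep", "instanceOf", "actor"),
    ("Meryl Streep", "wonAward", "Academy Award"),
    ("Meryl Streep", "bornIn", "New Jersey"),
]

def aggregate_by(relation):
    """Group instances by the object of the chosen relation."""
    groups = defaultdict(list)
    for s, r, o in triples:
        if r == relation:
            groups[o].append(s)
    return dict(groups)

print(aggregate_by("instanceOf"))  # one aggregation...
print(aggregate_by("bornIn"))      # ...and a different, equally valid one
```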
There are dramatic differences, too, in the information that data aggregation yields. When we aggregate by lookup in a taxonomy, for each instance the lookup returns a parent category label, sometimes with a human-readable definition from an external data dictionary, plus a label for the grandparent category. Category labels indicate that instances of a category are similar but do not document why (so we can't validate them at scale) – I call this similarity by decree. All the downstream algorithms have to work with is labels.
When we aggregate by lookup in a knowledge graph, for each instance the lookup returns a category label that is a pointer to a subgraph: i.e., to all of the triples in the graph that have this category as a subject. Category labels again indicate that instances of a category are similar, and the linked subgraph documents why they are considered similar – I call this similarity by description. This means much, much more information for the downstream algorithms to work with: rich subgraphs that can be expanded for more details as necessary.
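A sketch of similarity by description, again with invented triples: the lookup returns the describing subgraph rather than a bare label:

```python
# Toy triples describing the category itself (invented for illustration).
graph = [
    ("actor", "subclassOf", "person"),
    ("actor", "performsIn", "films"),
    ("actor", "hasSkill", "acting"),
    ("Jack Nicholson", "instanceOf", "actor"),
]

def describe(category):
    """Return every triple whose subject is the category -- the subgraph
    that documents *why* its instances count as similar."""
    return [t for t in graph if t[0] == category]

for triple in describe("actor"):
    print(triple)
```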
Which specific characteristics give knowledge graphs their superpower for data aggregation?
Knowledge graphs have other superpowers, too. Which of them have you seen in practice?