
What’s in a Line?

Christopher Westphal


Published: July 16, 2022

A line is very simple, basic, bounded, and easy to understand in the context of a link-chart or network diagram. It’s designed to connect two nodes together based on some condition contained in the underlying data. This got me thinking: how many dimensions are in a line? What data features can a line represent? What line options will convey more information?

If you look up the dimensions of a line, in mathematical terms, it has an official value of “one” because only one coordinate is needed to specify a point on it. Clearly we could augment this with other dimensions such as color and thickness, but what other options can exploit a line?

Early in my career (over 30 years ago), I worked for a company that developed an analytical platform called NETMAP – a program that created a lot of lines. One day, on a business trip to England with my colleague David Snocken (https://www.dhirubhai.net/in/david-snocken-1275753/), we got around to discussing the relative merits of lines or links in the context of a network diagram and how much information could be transmitted using a line. And, could we adapt NETMAP to include some of these revelations?

The NETMAP system was initially designed to perform organizational analysis (ORGMAP) to compare the formal hierarchy of a company or division against the way it actually operates. The data was acquired from employee questionnaires that defined how often people communicated, how important their communications were, and what topics were discussed, such as marketing, budgets, technology, etc. The platform eventually evolved to handle any type of data mappings, different display layouts, and a flexible API to customize the behavior.

The fundamental NETMAP functionality converted data into a numerical matrix of nodes and links and then overlaid a visual interface to show the network interconnections in what was often referred to as “wagon wheel” diagrams – a sample shown below.

Figure: NETMAP Chart

The NETMAP displays were somewhat unconventional because all content was treated the same way. All data was mapped to qualifiers (attributes) and there was no concept such as “types” to differentiate a person or address or vehicle; there were merely different values in the same qualifier, which could represent a “type” or any other detail. The underlying matrix was the same for each node; it was the values assigned to these qualifiers that made them unique, and the results came from how a user configured the various display parameters on these values.
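As a rough illustration of a single node-and-link matrix built from untyped qualifiers, here is a minimal Python sketch. It is not NETMAP's actual data model or API; the record layout, the `build_matrix` helper, and the sample values are all assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical records: every node carries the same qualifiers, and the
# "type" is just another qualifier value, not a separate class.
nodes: List[Dict[str, str]] = [
    {"key": "Alice", "type": "name", "city": "Springfield"},
    {"key": "Bob", "type": "name", "city": "Riverton"},
    {"key": "12 Elm St", "type": "address", "city": "Springfield"},
]

# Hypothetical links, given as pairs of node keys.
links: List[Tuple[str, str]] = [
    ("Alice", "12 Elm St"),
    ("Bob", "12 Elm St"),
    ("Alice", "Bob"),
]


def build_matrix(nodes, links):
    """Return a symmetric matrix of link counts indexed by node position."""
    index = {node["key"]: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for a, b in links:
        i, j = index[a], index[b]
        matrix[i][j] += 1
        matrix[j][i] += 1
    return matrix


matrix = build_matrix(nodes, links)
# A system variable such as link-count can be derived from the matrix rows.
link_count = {node["key"]: sum(row) for node, row in zip(nodes, matrix)}
```

Because the "type" lives in an ordinary qualifier, the same matrix serves people, addresses, or anything else; only the qualifier values differ.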

There were several graph-display settings, besides the labels, that determined how the nodes and links would appear:

Nodes: colors, width, order, group
Links: colors, thickness, style

Nodes linked outside of their group setting (e.g., people->addresses) are INTER-group relationships. Nodes linked inside of the same group (e.g., people->people) are INTRA-group relationships and are shown using a satellite graph. The satellite represents linkages between nodes in the same group. Satellites are a good example of how NETMAP shows multiple sets of relationships in one context.

Figure: Example of NETMAP diagram with inter/intra-group connections
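The inter/intra distinction comes down to comparing the group value at each end of a link. The sketch below shows one way to express that in Python; the `classify_links` helper and the sample data are hypothetical and are not taken from NETMAP itself.

```python
from collections import defaultdict

# Hypothetical group assignment for each node key.
group_of = {
    "Alice": "people",
    "Bob": "people",
    "12 Elm St": "addresses",
}

links = [("Alice", "12 Elm St"), ("Bob", "12 Elm St"), ("Alice", "Bob")]


def classify_links(links, group_of):
    """Split links into inter-group and intra-group lists."""
    result = defaultdict(list)
    for a, b in links:
        kind = "intra" if group_of[a] == group_of[b] else "inter"
        result[kind].append((a, b))
    return dict(result)


classified = classify_links(links, group_of)
# The "intra" links are the ones a satellite would display.
```

Changing the group setting (say, from type to city) changes which links count as intra-group, which is why the same data can produce very different satellites.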

To understand why lines are important, let’s say in the underlying data there were 7 qualifiers that include name, age, gender, address, city, state, and zipcode. When loading the data to construct the matrix, you’d naturally create a “type” qualifier where each node contained a unique key for their respective “name” or “address” values. Additionally, NETMAP automatically established a few system variables such as the link-count and a node-count. In this example there are a total of 10 (7+1+2) data-qualifiers to control and manipulate the displays. If we assume the labels are fixed once the node is created, that leaves 4 display controls for each node: group, order, width, and color.

The combination of these controls produces 10*10*10*10 = 10,000 possible arrangements of values for this particular set. Obviously, some combinations are more logical and desirable than others, such as:

Group = type
Order = link-count
Width = link-count
Color = type

But other combinations can show more specific results for targeted analyses, such as:

Group = city
Order = age
Width = link-count
Color = gender
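One way to picture these settings is as a mapping from each display control to the qualifier that drives it. The Python sketch below encodes the two configurations listed above and resolves them against a single node; the `resolve` helper and the node's values are assumptions for illustration only.

```python
# The four node display controls named in the text.
CONTROLS = ("group", "order", "width", "color")

# The two example configurations from the text: control -> qualifier.
general_view = {"group": "type", "order": "link-count",
                "width": "link-count", "color": "type"}
targeted_view = {"group": "city", "order": "age",
                 "width": "link-count", "color": "gender"}

# Hypothetical node record holding a value for each qualifier used above.
node = {"name": "Alice", "type": "name", "city": "Springfield", "age": 42,
        "gender": "F", "link-count": 3}


def resolve(config, node):
    """Look up the qualifier value that drives each display control."""
    return {control: node[config[control]] for control in CONTROLS}


general = resolve(general_view, node)
targeted = resolve(targeted_view, node)
```

The node itself never changes; only the configuration decides which of its values end up as group, order, width, and color.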

There were also controls for how the links are displayed among/between the nodes. Forgoing the label, each had a color, thickness, and style (dotted, dashed, solid). This provided 10*10*10=1,000 combinations with some more logical and useful than others. Additional line controls would increase the number of display options and provide more resolution for the analytics. This formed the basis for determining what other line characteristics could be considered.
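The 10,000 and 1,000 figures quoted above can be reproduced by enumerating every assignment of one qualifier to each control. This is a small Python sketch under the article's assumption that all 10 qualifiers are available to every control; the `assignments` helper is a hypothetical name.

```python
from itertools import product

# The 10 data-qualifiers from the example: 7 loaded, 1 "type", 2 system variables.
qualifiers = ["name", "age", "gender", "address", "city", "state", "zipcode",
              "type", "link-count", "node-count"]

node_controls = ("group", "order", "width", "color")
link_controls = ("color", "thickness", "style")


def assignments(controls, qualifiers):
    """Enumerate every way to assign one qualifier to each control."""
    return [dict(zip(controls, combo))
            for combo in product(qualifiers, repeat=len(controls))]


node_options = assignments(node_controls, qualifiers)
link_options = assignments(link_controls, qualifiers)
print(len(node_options), len(link_options))  # prints "10000 1000"
```

Each extra control multiplies the count by the number of qualifiers, so one more line control would take the link options from 1,000 to 10,000.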

Therefore, using NETMAP, a user had to think about what they wanted to see in their data and define the proper combinations of parameters to present the content in a way that made sense. Furthermore, interpreting the results then took additional effort to see the dependencies, correlations, and other interesting dynamics. It was this combination of factors that helped me master my understanding of data. NETMAP was very powerful, very logical, and very precise.

The canonical foundations for representing a line were formed during my time working with NETMAP. They were further enhanced when David O'Connor (https://www.dhirubhai.net/in/david-o-connor-ba55841/) and I created the VisuaLinks platform and included additional methods to convey content via linkages. Since then, other systems like KeyLines, Tom Sawyer, yFiles, GraphViz, and Gephi have streamlined and standardized a lot of these capabilities.

The following points give an overview of the most common line/link dimensions that are actively used in most commercial “link-chart” products or libraries. This is by no means a complete list, but one that answers my original postulate regarding the dimensionality of a line for analytics.

Color – is one of the most fundamental dimensions to help distinguish values assigned to a link. Often the type or category of a link (e.g., owner, witness, purchaser, filer, etc.) is assigned as its color and does not change too often once set. Usually, color is effective when there are a finite number of values, keeping the cardinality fairly low (e.g., fewer than 5 or 6 distinct values) so it is easy to differentiate them.
Thickness – is an effective dimension to show the count or frequency of a link that is sized relative to all the other links. Thicker links normally indicate the connected nodes have multiple linkages from different roles, multiple transaction counts, or repeated records. Additionally, they can reflect the aggregate for numerical values such as call-duration or dollar-amounts. The thickness of any line is relative to the values of the other lines, thereby always proportional and normalized across the entire range of values.
Style – helps to differentiate specific types of conditions for a link. The default is usually a solid line with options to include dashed, dotted, stylized (parentheses, tildes, hashes, stars, etc.), jagged, or zigzag to emphasize a specific trait or condition associated with a link. For example, show dotted connections for implied linkages, dashed for unreliable data, or a specific style for links that are outdated, invalid, or flagged for deletion. Styles are most effective for low-variance situations and are often specifically selected/assigned to particular value types.
Directionality – there is no up or down for a line, but for connections that have an implied direction or flow, there are different ways to indicate this with arrow heads, markers, bars, spaces, breaks, dots, icons, or spheres. The distance of the marker from the end of the line can also convey discrete category data like dates (recent/old), time (early/late), or distances (close/far). Multi-directional links (at both ends) are common; however, it can be difficult to discern the relative number of from/to links (is it 1:10 or 5:5?) that are supported in the data without drawing multiple lines or using other dimensions.
Taper – is a form of directionality combined with a thickness aspect. It is more pronounced than an arrow head and easier to see, especially on diagrams that have more content. It makes viewing network diagrams from an extended perspective more understandable.
Animation – helps convey directionality to show a flow in a specific direction like packets between servers in a network, money transfers between accounts, or flights between airports. The speed of the dots moving across a line can show specific frequencies or time periods. Although the animation is “cool,” it can sometimes become distracting, especially if it only applies to certain types of connections, and it does not transfer well to hardcopy or static output (Section 508 compliance is also a consideration).
Shape – a sphere, box, triangle, glyph, or an icon at the center-point of a link can represent a particular feature in the data like a note/comment, a connection from a specific source, an alias (e.g., similar entity connection), a user-defined relationship, or even a count of the number of links represented. Furthermore, the size of this shape could reflect a dollar amount, call duration, link-count, or some other value.
Multiples – often pairs of nodes can have multiple connections, which are frequently drawn using a single thick line. However, drawing individual lines is useful where each link is represented separately in a stacked fashion between the nodes. It gets busy when there are more than, say, 10 links between any pair of nodes. Also, separating lines for from/to directionality delivers a cleaner diagram and could also be shown using thickness.
Geometry – incorporates any type of curve, straight, bend, angle, or self-connect linkage. This is done to help simplify the complexity of the network layout and minimize line crossings to make it easier to read. Although not typically used for conveying link-data, you could use a curved line to show that a hidden node is bypassed, like in a flow.
Length/Placement – the end-points for a line always connect to a node, and each “xyz” placement tuple can be manipulated or the length of a line set using a selected data-value. However, most graph-drawing algorithms directly control the node-placement and line-length. In “transactional” sources there are often one-to-many connections where this could apply, but again, the transaction-node would most likely dictate these factors.
Labels – are a key dimension for representing a link and help convey a lot of information about why two nodes are connected. The font and font-style can be individualized based on data-settings or set globally to distinguish certain values. Also, link-labels can sometimes “clutter” a diagram, and most systems have options to turn them on/off or have a balloon-help popup to show the value.
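Several of the dimensions above (color, thickness, style, and labels) can be combined on one link. The sketch below maps hypothetical link records to Graphviz DOT text using the standard edge attributes `color`, `penwidth`, `style`, and `label`. The record fields, the palette, the width range, and the choice of min-max scaling for thickness are all assumptions for illustration, not how any particular product does it.

```python
# Hypothetical link records; the field names are assumptions for this sketch.
links = [
    {"src": "A", "dst": "B", "type": "owner", "count": 1, "implied": False},
    {"src": "A", "dst": "C", "type": "witness", "count": 5, "implied": True},
    {"src": "B", "dst": "C", "type": "owner", "count": 9, "implied": False},
]

# Color by link type; a small palette keeps the cardinality low.
PALETTE = {"owner": "blue", "witness": "red"}
MIN_WIDTH, MAX_WIDTH = 1.0, 6.0


def scale_width(count, lo, hi):
    """Min-max normalize a count into the allowed pen-width range."""
    if hi == lo:
        return MIN_WIDTH
    return MIN_WIDTH + (count - lo) / (hi - lo) * (MAX_WIDTH - MIN_WIDTH)


def to_dot(links):
    """Render the links as Graphviz DOT text with per-edge attributes."""
    counts = [link["count"] for link in links]
    lo, hi = min(counts), max(counts)
    lines = ["digraph links {"]
    for link in links:
        attrs = {
            "color": PALETTE[link["type"]],
            "penwidth": f"{scale_width(link['count'], lo, hi):.1f}",
            # Dotted for implied linkages, solid otherwise.
            "style": "dotted" if link["implied"] else "solid",
            "label": link["type"],
        }
        attr_text = ", ".join(f'{k}="{v}"' for k, v in attrs.items())
        lines.append(f'  "{link["src"]}" -> "{link["dst"]}" [{attr_text}];')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(links)
print(dot_text)
```

Because the widths are scaled against the minimum and maximum counts in the current set of links, the thickest and thinnest lines always span the same visual range, whatever the raw values are.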

Graph analytics will continue to evolve with the innovation of new placement and display techniques; some useful and some impractical. Although 30 years have gone by, I still think about the best way to extract value from link charts, especially how the connections among the nodes are defined and presented. A single diagram might have a dozen (or more) dimensions shown, and the goal is to present the most effective combination of values so the user can quickly interpret the results. All display settings should have intent with a specific purpose to promote usability. Therefore, I will continue to evaluate the line and figure out new ways to make it better.

David Titzer, RHCSA

Linux Systems Administrator at Johns Hopkins University Applied Physics Lab

2 years ago

Looks familiar!


A. Steven A.

Jack-of-all-trades and master-of-some with 34+ years of mostly back-end software development/integration and DevOps; currently interested in *mostly* Rust, Go, Python, and Kubernetes related opportunities..

2 years ago

NETMAP!? What's next? COBOL?


Oded K.

CEO / Owner at Pacific West Academy

2 years ago

Old news, but always good to bring up again to remind us all to always stay straight, honest, and do it right!!! Good guys DO end up first!!!

