What’s in a Line?
Christopher Westphal
A line is simple, bounded, and easy to understand in the context of a link-chart or network diagram. It’s designed to connect two nodes based on some condition contained in the underlying data. This got me thinking: how many dimensions are in a line? What data features can a line represent? What line options will convey more information?
If you look up the dimension of a line in mathematical terms, it has an official value of “one” because only one coordinate is needed to specify a point on it. Clearly we could augment this with other dimensions such as color and thickness, but what other options can a line exploit?
Early in my career (over 30 years ago), I worked for a company that developed an analytical platform called NETMAP, a program that created a lot of lines. One day, on a business trip to England with my colleague David Snocken (https://www.dhirubhai.net/in/david-snocken-1275753/), we got around to discussing the relative merits of lines or links in the context of a network diagram, how much information could be transmitted using a line, and whether we could adapt NETMAP to include some of these revelations.
The NETMAP system was initially designed to perform organizational analysis (ORGMAP), comparing the formal hierarchy of a company or division against the way it actually operates. The data was acquired from employee questionnaires that defined how often people communicated, how important those communications were, and what topics were discussed, such as marketing, budgets, and technology. The platform eventually evolved to handle any type of data mapping, support different display layouts, and offer a flexible API to customize its behavior.
The fundamental NETMAP functionality converted data into a numerical matrix of nodes and links and then overlaid a visual interface to show the network interconnections in what were often referred to as “wagon wheel” diagrams; a sample is shown below.
Figure: NETMAP Chart
The NETMAP displays were somewhat unconventional because the content was treated uniformly. All data was mapped to qualifiers (attributes), and there was no concept such as “types” to differentiate a person, an address, or a vehicle; there were merely different values in the same qualifier that could represent a “type” or any other detail. The underlying matrix was the same for each node; it was the values assigned to these qualifiers that made the nodes unique, and the results came from how a user configured the various display parameters on these values.
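To make the matrix idea concrete, here is a minimal sketch in Python. It is not NETMAP's actual implementation or API; the records, the `build_matrix` helper, and the choice of a symmetric count matrix are all assumptions made for illustration. It shows nodes that share one structure and differ only in their qualifier values, with "type" being just another qualifier.

```python
from itertools import combinations

# Hypothetical records: every field is simply a qualifier value.
records = [
    {"name": "Ann", "address": "12 Oak St"},
    {"name": "Bob", "address": "12 Oak St"},
    {"name": "Cy", "address": "9 Elm Rd"},
]

def build_matrix(records, qualifiers):
    """Return (nodes, matrix); matrix[i][j] counts links between nodes i and j."""
    nodes = []   # every node has the same shape: a dict of qualifier values
    index = {}   # (type, key) -> row/column position in the matrix
    for rec in records:
        for q in qualifiers:
            ident = (q, rec[q])
            if ident not in index:
                index[ident] = len(nodes)
                # "type" is an ordinary qualifier, not a separate class
                nodes.append({"type": q, "key": rec[q]})
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for rec in records:
        ids = [index[(q, rec[q])] for q in qualifiers]
        # values that appear in the same record are linked to each other
        for i, j in combinations(ids, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return nodes, matrix

nodes, matrix = build_matrix(records, ["name", "address"])
# A derived, system-style variable: the number of links touching each node.
link_counts = [sum(row) for row in matrix]
```

In this sketch the shared address ends up with a link-count of 2, which is the kind of derived value a display parameter could then be driven by.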
There were several graph-display settings, besides the labels, that determined how the nodes and links would appear:
Nodes linked outside of their group setting (e.g., people->addresses) are INTER-group relationships. Nodes linked inside of the same group (e.g., people->people) are INTRA-group relationships and are shown using a satellite graph. The satellite represents linkages between nodes in the same group. Satellites are a good example of how NETMAP shows multiple sets of relationships in one context.
Figure: Example of NETMAP diagram with inter/intra-group connections
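The inter/intra distinction can be sketched in a few lines of Python. The `classify_links` function, the `group_of` mapping, and the sample data are hypothetical and only illustrate the rule described above: a link is intra-group when both endpoints share a group value, and inter-group otherwise.

```python
def classify_links(links, group_of):
    """Split links into (inter, intra) lists based on each endpoint's group."""
    inter, intra = [], []
    for a, b in links:
        if group_of[a] == group_of[b]:
            intra.append((a, b))   # same group: candidate for a satellite
        else:
            inter.append((a, b))   # different groups: drawn across the chart
    return inter, intra

# Hypothetical group assignments and links.
group_of = {"Ann": "people", "Bob": "people", "12 Oak St": "addresses"}
links = [("Ann", "12 Oak St"), ("Bob", "12 Oak St"), ("Ann", "Bob")]

inter, intra = classify_links(links, group_of)
```

Because the group is just a display setting on a qualifier, regrouping the same nodes by a different qualifier (say, city) would reclassify the same links without touching the underlying data.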
To understand why lines are important, let’s say in the underlying data there were 7 qualifiers that include name, age, gender, address, city, state, and zipcode. When loading the data to construct the matrix, you’d naturally create a “type” qualifier where each node contained a unique key for their respective “name” or “address” values. Additionally, NETMAP automatically established a few system variables such as the link-count and a node-count. In this example there are a total of 10 (7+1+2) data-qualifiers to control and manipulate the displays. If we assume the labels are fixed once the node is created, that leaves 4 display controls for each node: group, order, width, and color.
The combination of these controls produces 10*10*10*10 = 10,000 possible arrangements of values for this particular set. Now, there will obviously be logical combinations that are more desirable than others, such as:
But other combinations can show more specific results for targeted analyses, such as:
There were also controls for how the links are displayed among/between the nodes. Forgoing the label, each had a color, thickness, and style (dotted, dashed, solid). This provided 10*10*10=1,000 combinations with some more logical and useful than others. Additional line controls would increase the number of display options and provide more resolution for the analytics. This formed the basis for determining what other line characteristics could be considered.
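The arithmetic for the node and link controls can be checked with a short Python sketch. It assumes, as the counts in the text imply, that each display control can be driven by any one of the 10 data-qualifiers; the qualifier spellings, the `count_assignments` helper, and the extra "arrowhead" control are hypothetical.

```python
from itertools import product

# The 7 data qualifiers, the added "type", and the 2 system variables.
qualifiers = [
    "name", "age", "gender", "address", "city", "state", "zipcode",
    "type", "link_count", "node_count",
]
node_controls = ["group", "order", "width", "color"]
link_controls = ["color", "thickness", "style"]

def count_assignments(controls, qualifiers):
    """Each control independently picks one qualifier: q ** c choices."""
    return len(qualifiers) ** len(controls)

node_combos = count_assignments(node_controls, qualifiers)   # 10 ** 4
link_combos = count_assignments(link_controls, qualifiers)   # 10 ** 3

# Enumerating the link assignments explicitly gives the same count.
link_assignments = list(product(qualifiers, repeat=len(link_controls)))

# One more (hypothetical) line control multiplies the options by 10 again.
with_extra = count_assignments(link_controls + ["arrowhead"], qualifiers)
```

The growth is exponential in the number of controls, which is why each additional line characteristic adds so much display resolution and also why only a small fraction of the combinations are worth using.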
Therefore, using NETMAP, a user had to think about what they wanted to see in their data and define the proper combinations of parameters to present the content in a way that made sense. Furthermore, interpreting the results then took additional effort to see the dependencies, correlations, and other interesting dynamics. It was this combination of factors that helped me master my understanding of data. NETMAP was very powerful, very logical, and very precise.
The canonical foundations for representing a line were formed during my time working with NETMAP. They were further enhanced when David O'Connor (https://www.dhirubhai.net/in/david-o-connor-ba55841/) and I created the VisuaLinks platform and included additional methods to convey content via linkages. Since then, other systems like KeyLines, Tom Sawyer, yFiles, Graphviz, and Gephi have streamlined and standardized a lot of these capabilities.
The following points give an overview of the most common line/link dimensions that are actively used in most commercial “link-chart” products or libraries. This is by no means a complete list, but one that answers my original question regarding the dimensionality of a line for analytics.
Graph analytics will continue to evolve with the innovation of new placement and display techniques, some useful and some impractical. Although 30 years have gone by, I still think about the best way to extract value from link charts, especially how the connections among the nodes are defined and presented. A single diagram might have a dozen (or more) dimensions shown, and the goal is to present the most effective combination of values so the user can quickly interpret the results. Every display setting should have a specific purpose that promotes usability. Therefore, I will continue to evaluate the line and figure out new ways to make it better.