Hard Graph: Understanding Information Systems Today
Visualising and describing my high-level perception and understanding of the complex IT systems of today.
Understanding the world of Information Systems in the current era is a stretch for anyone, particularly when the engineers who create these systems often can't themselves comprehend how their own systems produce their results.
It's incredibly difficult even for the 'most intelligent of the geeks' to comprehend the complexity and intricacy of multi-faceted IT systems these days.
Before you read through, I have found the following video one of the better ones at explaining the high-level categories of A.I.:
5. Abstract or Conclusion
The human mind works almost entirely on the basis of abstraction or derivation. For instance, we don't 'see' most of what we think we see; most of our vision is probably an abstraction. What we see is assembled from many things, including memory, preconceived ideas, past experience, what we are focused on, and what we actually see.
Computing itself has always been based on abstraction. Abstraction is, in my opinion, the very reason we see such exponential change. Binary 1s and 0s are able to represent mathematics; mathematics, in turn, is just an abstract representation of the 'real world' to aid with problem solving.
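(If you like, here is a trivial Python sketch of that stacking of abstractions: a handful of 1s and 0s read as a number, and that number then doing 'mathematics'. It is purely illustrative.)

```python
# A trivial illustration of abstraction stacking: bits -> integer -> arithmetic.
bits = "101101"                      # just six 1s and 0s
number = int(bits, 2)                # interpreted as binary, they 'become' 45
print(number)                        # 45
print(number * 2, bin(number * 2))   # and that abstraction supports arithmetic: 90 '0b1011010'
```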
We have developed a vast array of abstract notions to arrive at the 'narrow AI' capabilities that we have at our disposal today.
The pyramid represents my immediate recall (in a vague chronological order), and I may have missed out a few categories. If you can think of any, do let me know in the comments.
Of course a pyramid is a terrible representation of the complexities involved. Each area continues to develop daily and each discovery often has an exponential impact on each category above it. But I hope you get the picture.
The levels of removal from the 'real world' are immense. Yet somehow we perceive a 'like' on the screen in just as real a sense as we perceive a 'cheers' at the pub. It feels 'tangible'. It may even mean something.
If modern IT systems are an abstraction of data and information, much like our own visual cortex is an abstraction of reality, what do they look like?
Here is a very, very simple approximation, from memory, of a single 'narrow' Artificially Intelligent system: a system designed for a specific, limited task such as grouping textually similar contract clauses or aiding the review of documents in a legal matter. Without turning this into an article about 'technology assisted review', the data sources are, at a basic level, the legal document reviewer and the document.
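Purely for flavour, here is a minimal sketch of what 'grouping textually similar contract clauses' can mean in practice, using scikit-learn's TF-IDF vectoriser and cosine similarity. The clause text is invented and real technology-assisted-review systems are far more involved, but the basic abstraction (text becomes numbers, numbers become similarity scores) is the same.

```python
# A toy sketch of grouping textually similar clauses (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

clauses = [
    "The Supplier shall indemnify the Customer against all losses.",
    "Supplier will indemnify Customer for any and all losses incurred.",
    "This Agreement is governed by the laws of England and Wales.",
]

vectors = TfidfVectorizer().fit_transform(clauses)   # text -> numeric abstraction
similarity = cosine_similarity(vectors)              # pairwise similarity matrix

# Clauses 0 and 1 score much closer to each other than either does to clause 2.
print(similarity.round(2))
```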
4. Expanding Universe
Let's say that our system manages the millions of 'likes' that people post on LinkedIn or Facebook. The machine learning branch of AI is perhaps a simpler concept than Neural Networks. Machine learning systems can feed off multiple inputs and iteratively feed into further analytical and machine learning systems.
The orchestration layer 'conducts' or 'controls' things. It might have the rules and parameters, the ratios and statistics, so it can help the system take the right actions at the right time.
To aid the reader: imagine this article you are reading. To get to the article, you might have liked something I liked or you 'followed' me. The machine learning in this scenario perhaps correctly assumed you'd therefore want to see more of my ramblings.
The orchestration layer is told this by the 'like' and 'follow' system. The 'your feed' system is fed this information. The orchestration layer of the 'your feed' system schedules a bunch of updates, including my article, on your feed. So this popped up in your feed the next day. Of course, adverts and videos might take precedence in the 'your feed' system as they are more commercially attractive. So you might never see my article. *sadface*
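If it helps, a hand-wavy sketch of that scheduling decision might look like the below. The item types, weights and the 'follows the author' bonus are entirely my own invention for illustration; this is not how LinkedIn's feed actually works.

```python
# A made-up orchestration sketch: score candidate feed items and schedule the best.
candidate_items = [
    {"type": "advert",  "title": "Buy our eDiscovery platform"},
    {"type": "video",   "title": "Cat falls off desk"},
    {"type": "article", "title": "Hard Graph: Understanding Information Systems Today"},
]

# Invented commercial weights: adverts and videos 'take precedence'.
type_weight = {"advert": 0.9, "video": 0.7, "article": 0.4}

def score(item, follows_author=True):
    # A follow or a shared 'like' nudges the article up... but maybe not enough.
    bonus = 0.3 if (item["type"] == "article" and follows_author) else 0.0
    return type_weight[item["type"]] + bonus

feed = sorted(candidate_items, key=score, reverse=True)
for item in feed:
    print(round(score(item), 2), item["title"])
```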
Within each action there are multiple systems and technologies at play, each one based on statistical assumptions and learned attributes, plus cold, hard data. To get to this point there are multiple models built on further nested models, and layer upon layer of statistics derived from statistics.
3. Hard Graph
Let's take this one level further (up/down). The processes in social media around likes, comments, shares, connections, follows, posts, articles and messages are not quite as 'serial' as we might think. Clicking 'like' on this article doesn't just record something in a huge database on a computer under Jeff Weiner's desk.
To be able to effectively manage and map, report and analyse, systems like LinkedIn go a few levels further with the abstraction. Typically using 'Graphs'. And I don't mean this:
Representations of social networks, or other networks of information, are a mathematical masterpiece of data representation. They are commonly held as 'in-memory' data structures (persisted to storage behind the scenes), allowing the 'best-most-optimised' path to a result or conclusion to be found quickly.
In computer science, graphs are used to represent networks of communication, data organization, computational devices, the flow of computation, etc. For instance, the link structure of a website can be represented by a directed graph, in which the vertices represent web pages and directed edges represent links from one page to another. A similar approach can be taken to problems in social media, travel, biology, computer chip design, mapping the progression of neuro-degenerative diseases, and many other fields. The development of algorithms to handle graphs is therefore of major interest in computer science.
It's a bit wordy (and cheating in my challenge) to refer to the Wikipedia definition. But then kids can take phones into their exams these days, I'm told.
The bits of information stored about you: let's call that your Account. If you look at all of the other Accounts that your Account has interacted with (through likes, shares, comments, connections, etc.) you can conceptually build up a model a little like this, but on a much larger scale.
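In code, such a graph can be as simple as a dictionary mapping each Account to the Accounts it has interacted with. A plain-Python sketch with invented names (the real thing is distributed across thousands of machines):

```python
# A toy 'social graph': each Account maps to the Accounts it has interacted with.
interactions = {
    "martin":  {"liked": ["nikolai", "ana"], "connected": ["ana", "priya"]},
    "ana":     {"liked": ["martin"],         "connected": ["martin", "nikolai"]},
    "nikolai": {"liked": [],                 "connected": ["ana"]},
    "priya":   {"liked": ["martin"],         "connected": ["martin"]},
}

# Each Account is a vertex; each like or connection is a labelled edge.
for account, edges in interactions.items():
    for relation, targets in edges.items():
        for target in targets:
            print(f"{account} --{relation}--> {target}")
```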
All of this abstraction provides immense value for the purposes of advertising on social networks. You can predict trends, preferences, even the likely sex of an individual (as LinkedIn does for all its accounts). These graphs also have vast additional applications in many other fields.
Social sciences, biology, mathematics, physics and so on. In fact, your typical GPS is a graph-based system behind the scenes, with weights such as distance, travel time and cost applied to the links (edges) in the graph. Many of these graphs would be at a higher conceptual level and nested, with three or more dimensions, and typically stored in fast-access storage (computer-based physical memory or super-fast disk-based technology).
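To see why a GPS is 'a graph behind the scenes', here is a compact sketch of Dijkstra's shortest-path algorithm over a small weighted graph. The place names and travel times are invented, and a real routing engine layers far more on top (live traffic, turn restrictions, nested graphs), but the principle is the same.

```python
import heapq

# Invented road network: edge weights are travel times in minutes.
roads = {
    "Home":        [("Station", 10), ("Supermarket", 5)],
    "Supermarket": [("Station", 7), ("Office", 25)],
    "Station":     [("Office", 12)],
    "Office":      [],
}

def shortest_time(graph, start, goal):
    # Classic Dijkstra: always expand the cheapest known route first.
    queue = [(0, start)]
    best = {start: 0}
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return None

print(shortest_time(roads, "Home", "Office"))  # 22 (Home -> Station -> Office)
```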
Imagine multiple levels of machine learning sitting under these ‘graphs’, running in near-real-time or scheduled.
Let's see what my abstraction looks like now.
2. Virtual Insanity
Still with me? I'm not getting into the complexities of the sub-categories of deep learning and reinforcement learning, such as AlphaGo, as they get more into the modelling of the human brain and visual cortex to provide autonomous or unattended learning. Their data structures become even more abstract and layered.
The basic principle is just like neurons in the brain. Reinforcement is learned through experience, with each path becoming stronger and more certain with each outcome.
Each of the 'nodes' is helping to reinforce or refute a decision.
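As a toy illustration of that reinforcement, here is a single artificial neuron nudging its weights after every example until it learns a trivial rule. It is deliberately minimal and entirely invented; real networks stack thousands of such units across many layers.

```python
# A single artificial neuron learning a trivial rule: output 1 only if both inputs are 1.
import random

random.seed(42)
weights = [random.random(), random.random()]
bias = random.random()
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

for _ in range(20):                      # each pass reinforces or weakens the 'paths'
    for (x1, x2), target in examples:
        output = 1 if (weights[0] * x1 + weights[1] * x2 + bias) > 1 else 0
        error = target - output
        weights[0] += 0.1 * error * x1   # strengthen/weaken the connection from input 1
        weights[1] += 0.1 * error * x2   # strengthen/weaken the connection from input 2
        bias += 0.1 * error

for (x1, x2), target in examples:
    output = 1 if (weights[0] * x1 + weights[1] * x2 + bias) > 1 else 0
    print((x1, x2), "->", output, "(expected", str(target) + ")")
```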
Commonly this type of learning is used for categorisation and classification, for example in facial and handwriting recognition. Such a system relies heavily on the speed of computing, and in particular on Graphics Processing Units (GPUs), because of the need for extreme speed to master the potential complexity. Let's add a visual representation of that at a higher level in our system anyway.
1. Conclusion or Abstract
Something that is converging hugely and developing at pace, yet isn't too abstract, is the physical stuff. Phew.
Hard drives, processors, GPUs, memory and so on. They all have one thing in common: they store or manipulate data that can be found via a directory or address book of some kind, and they perform mathematical manipulation of that data.
Let's keep it there. It's a little more complex these days. But put very simply, that's all they are.
As an idea of scale, we create 16.7 Zettabytes of data each year. We'll soon be producing hundreds of Zettabytes in a year. It's difficult to fathom: first because it's only ever an estimate at best, and second because the standard notation trips over binary, as a decimal zettabyte (10^21 bytes) and a binary zebibyte (2^70 bytes) are not quite the same thing.
But personally I use: 1 Zettabyte is equal to 1 trillion Gigabytes.
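For the pedants (myself included), that conversion checks out under decimal (SI) prefixes; a quick sketch:

```python
# Decimal (SI) prefixes: a zettabyte really is a trillion gigabytes.
gigabyte = 10 ** 9                    # 1 GB
zettabyte = 10 ** 21                  # 1 ZB
print(zettabyte // gigabyte)          # 1000000000000  (one trillion GB per ZB)
print(16.7 * zettabyte / gigabyte)    # ~16.7 trillion GB created each year
```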
The technologies are all experiencing exponential progress, to keep pace with this acceleration in data and content creation.
As they develop they are converging into ever more efficient storage and processing, while at the same time becoming more distributed and less 'tangible' as they disperse and multiply around the globe.
Unfortunately for our diagram, that's where the simplicity ends. Distributed file systems and cloud architectures have accelerated, and will continue to accelerate, the possibilities. The scale is insane.
Stepping up from my 10 MEGABYTE Winchester HDD in the late 80s, to memory scales for a standard public user of 4 TERABYTES and 160-plus CPUs per machine... and yes, that's a super-fast 4 terabytes of memory, not disk... and that's just in a commercially available public cloud-based system.
Today's stand-alone super-computers can have many PETABYTES of addressable super-fast memory. And many hundreds or thousands of PETABYTES of 'slower' more permanent storage.
These storage systems are usually highly available distributed file systems, working in parallel for redundancy and performance.
Storage can now even optimise itself to respond to patterns in how unstructured data is accessed at random. It can distribute itself over various geographical locations, much like the often waffled-about 'blockchain' decentralised ledger systems (Bitcoin). The application data might even be stored by more than one 'cloud based' storage provider.
Cloud is another (ridiculous) abstraction that simply means ‘lots of computers and storage’, typically in very secure data-centre locations.
The 'data' on which we wish to operate could come from, or be stored, ANYWHERE.
Alternatively, data can simply be transient. It might exist for a specific purpose, such as displaying the graphics in a video game, but it's not stored (perhaps MySpace was holding all my karaoke MP3s in memory; that's why they're all now fortunately lost).
The representation is possibly a single large system, or even one simple distributed application developed on a cloud-based architecture. The sphere is supposed to represent two things: the global nature of distributed systems, and their increasingly near-infinite scale.
These technologies are paving the way to quantum computing where storage and processing become simply a quantum level concept. Imagine the neural representations you could create then...
To understand this all a little better, you can (sort of) read the sections in the opposite order. That might make it more complex, or easier... but here you have it: my mind's-eye abstraction of A.I. and information systems as I perceive them today. Au naturel.
Disclaimer: This is all my own opinion and experience and isn't necessarily reflective of the views of my current or previous employers. It may have been written by a bot. You tell me?
About Martin: Over the past 17 years I've worked with Chief Legal Officers, General Counsel, Compliance Professionals and ‘Big Law’ firms globally, to create and implement systems and processes that reduce the likelihood of failure during a crisis.