Index me this
Sherlock Holmes, Granada Television

Let's play a game. I'll offer a word and you reply with the first word that comes to your mind when you hear my word. Ready?

Hot.

Cold.

King.

Index.

What was the first word that came to your mind when you saw the word "Index"? Was it "finger"? I hope it was, because that lets me continue with my story. The Latin word index means "one who points out", and it is very natural for us to point at things using our index finger. The word index is used to point to the location of things in space. Just as a map helps us navigate to a physical location, an index helps us navigate to a location in a text, book or corpus. When we think of indexes, the picture that comes to mind is a series of alphabetically ordered entries at the back of a book. However, it took centuries before the index appeared at the end of books.

The first problem preventing indexes from appearing in books or manuscripts was structural. Early books (or texts or manuscripts) were scrolls. Scrolls have no page numbers, which takes away one half of the information that an index is supposed to contain - the location of the word in the text.

In time, scrolls gave way to codices, and codices began to have numbered pages. Even then, because codices were copied by hand, the page number in one copy differed from the page number in another, which made an index difficult to maintain across copies.

Printed books standardized page numbers and paved the way for indexes that could remain correct across the many copies of a book.

The earliest indexes were what we would call a table of contents today. They lit up the path that lay ahead for the reader, listing topics and concepts in the order in which they would be encountered.

At some point in this journey, concordances began appearing in books. Concordances are not indexes. They list every (non-trivial) word that appears in the text.

Finally, somewhere in the 1500s, we see indexes appear: lists of words, phrases and concepts in alphabetical order, with the locations where these occur in the text. Indexing became a profession, and the Society of Indexers was founded to frame and maintain rules for indexing. Indexes became rich with meaning, serving as a guide and companion to the text. They even appeared in popular culture. The famous fictional detective Sherlock Holmes is often described as spending his down time revising and updating his massive indexes of famous criminals, adding information and cross-referencing entries.

Fast forward to modern times. We now use software to write books, and there is of course software that generates an index from a body of text. However, the result is a less rich, lower quality index. It is mechanical and misses nuances. To illustrate the difference, Dennis Duncan, in his book "Index, A History of the", provides two indexes. One is software generated. The other is compiled by a human indexer; it is warm and opinionated, and it serves as a guide to the book.

Indexes in books inspired the creation of database indexes in computing. A database index speeds up retrieval by maintaining an additional data structure whose entries hold the keys of the data in sorted order, each with a value pointing to the location where the record for that key can be found. In some cases, the value is the database record itself, in which case the index is called a clustered index.
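The key-to-location idea can be sketched in a few lines. This is a minimal illustration, not how any particular database implements it; the `records` list, the `id` key and the `lookup` helper are all made up for the example, and a list position stands in for a location on disk.

```python
import bisect

# Hypothetical table: records sit in arrival order, and the list position
# stands in for the on-disk location of each record.
records = [
    {"id": 42, "name": "Holmes"},
    {"id": 7, "name": "Watson"},
    {"id": 19, "name": "Moriarty"},
]

# The index: (key, location) pairs kept sorted by key.
index = sorted((rec["id"], pos) for pos, rec in enumerate(records))
index_keys = [key for key, _ in index]


def lookup(key):
    """Binary-search the ordered keys, then follow the pointer to the record."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        _, pos = index[i]
        return records[pos]
    return None
```

Calling `lookup(19)` finds the key by binary search over the sorted keys and then jumps straight to the record, instead of scanning every record in turn. A clustered index would store the record itself in place of `pos`.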

The database storage engine dictates the type of index data structure used. Broadly, there are two types of storage engines: page-based and log-structured.

Most relational databases use page-based storage engines, where the database is divided into fixed-size pages or blocks that mirror the blocks on the disk. These storage engines rely on B-tree or B+ tree indexes, where each node of the tree is a page and can contain pointers to data stored on other pages.
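To make the "each node is a page" idea concrete, here is a toy two-level tree in the B+ tree style. It is only a sketch under simplifying assumptions: the `pages` dictionary, the page numbers and the `row@...` location strings are invented, and real engines also handle page splits, concurrency and disk I/O, none of which appear here.

```python
import bisect

# Hypothetical pages keyed by page number. An internal page holds separator
# keys and child page numbers; a leaf page holds keys and row locations.
pages = {
    0: {"leaf": False, "keys": [20, 40], "children": [1, 2, 3]},
    1: {"leaf": True, "keys": [5, 12], "values": ["row@A", "row@B"]},
    2: {"leaf": True, "keys": [20, 31], "values": ["row@C", "row@D"]},
    3: {"leaf": True, "keys": [40, 57], "values": ["row@E", "row@F"]},
}


def search(key, root=0):
    """Walk from the root page to a leaf; return (location, pages_read)."""
    page = pages[root]
    pages_read = 1
    while not page["leaf"]:
        # Pick the child whose key range contains the search key.
        child = bisect.bisect_right(page["keys"], key)
        page = pages[page["children"][child]]
        pages_read += 1
    i = bisect.bisect_left(page["keys"], key)
    if i < len(page["keys"]) and page["keys"][i] == key:
        return page["values"][i], pages_read
    return None, pages_read
```

Every lookup in this toy tree reads exactly two pages, the root and one leaf. Because each page holds many keys, a real tree stays shallow even for very large tables, which is what keeps reads cheap.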

Many NoSQL databases use log-structured storage engines, where the database is divided into variable-sized segments. Some engines use a specific type of segment called a Sorted String Table, or SSTable. These engines buffer recent writes in a sorted in-memory structure called a memtable, and use a sparse in-memory index to find keys within each SSTable.
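A rough sketch of the log-structured idea follows: writes land in an in-memory table, which is flushed to a sorted, immutable segment when it fills up, and a sparse index remembers the position of only every few keys in that segment. The class name `TinyLSM` and its parameters are hypothetical, segments are kept in memory rather than on disk, and compaction is left out entirely.

```python
import bisect


class TinyLSM:
    """Toy log-structured store: an in-memory table plus sorted segments."""

    def __init__(self, memtable_limit=4, sparse_every=2):
        self.memtable = {}            # recent writes, held in memory
        self.segments = []            # flushed segments, oldest first
        self.memtable_limit = memtable_limit
        self.sparse_every = sparse_every

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out as one sorted, immutable segment.
        rows = sorted(self.memtable.items())
        # Sparse index: remember the offset of every Nth key only.
        sparse = [(rows[i][0], i) for i in range(0, len(rows), self.sparse_every)]
        self.segments.append((rows, sparse))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Check the newest segment first so later writes win.
        for rows, sparse in reversed(self.segments):
            sparse_keys = [k for k, _ in sparse]
            slot = bisect.bisect_right(sparse_keys, key) - 1
            if slot < 0:
                continue  # key sorts before everything in this segment
            start = sparse[slot][1]
            # Scan only the short run of rows the sparse entry covers.
            for k, v in rows[start:start + self.sparse_every]:
                if k == key:
                    return v
        return None
```

A `put` touches only the in-memory table until a flush, and a flush writes one whole segment in a single pass. A `get` may have to consult several segments, which hints at why such engines favour writes over reads.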

Each storage engine and its companion index bring tradeoffs in read and write performance. The log-structured engine, with its in-memory index, supports fast writes. The page-based engine, with its disk-based B-tree index, needs time to update the index on each write, so writes are slower but reads are efficient.

Indexes are also present outside the database world. Search indexes point to the documents in which a word or phrase occurs, and they power your favourite search engines. The search index is a slightly different beast, and deserves a deep dive of its own.
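The word-to-documents mapping described above is commonly called an inverted index. Below is a minimal sketch with three made-up documents; the `docs` contents and the `search` helper are illustrative only, and real search engines add tokenization, ranking and much more.

```python
from collections import defaultdict

# Hypothetical document collection keyed by document id.
docs = {
    1: "the index points to a page",
    2: "a search engine builds an index of documents",
    3: "holmes kept an index of criminals",
}

# Inverted index: each word maps to the set of documents containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted[word].add(doc_id)


def search(query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(inverted.get(words[0], set()))
    for word in words[1:]:
        result &= inverted.get(word, set())
    return result
```

Here `search("index of")` intersects the document sets for the two words, so only documents containing both are returned.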

The world of search indexes has a new entrant: indexes that use word embeddings to return more semantic results, rather than the purely frequency-based answers of the older approaches. Five hundred years after they first appeared, indexes are still alive and well!
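As a rough illustration of the embedding idea, each document can be represented by a vector and a query answered by finding the vector closest to it. The three-dimensional vectors below are invented for the example; real embeddings come from a trained model and have hundreds of dimensions, and production systems use approximate nearest-neighbour indexes instead of the full scan shown here.

```python
import math

# Made-up vectors for illustration only; not the output of any real model.
doc_vectors = {
    "how to speed up database queries": [0.9, 0.1, 0.0],
    "a history of the printed book": [0.1, 0.9, 0.1],
    "recipes for a winter soup": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity: near 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query_vector, k=1):
    """Rank documents by similarity to the query vector and keep the top k."""
    ranked = sorted(
        doc_vectors,
        key=lambda doc: cosine(query_vector, doc_vectors[doc]),
        reverse=True,
    )
    return ranked[:k]
```

A query such as "make my SQL faster" shares no words with the first document, yet if a model placed it at a vector like `[0.8, 0.2, 0.1]` (an assumed value), `nearest` would still return that document, because the match is made on direction in vector space and not on shared words.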


Shamit Bagchi

Marketing & Strategy, AI/ML/Generative AI | MBA - IIMB and MSc Applied Physics, Chalmers Sweden

2 weeks ago

Good one! I remember Oxford Classics -Return of Sherlock Holmes - it had an index quite fabulous and detailed as it gets, including anecdotes and excerpts from references etc!

Manimaran Manivannan

Trusted Tech Leader | Delivering From Vision to Execution | Data, Cloud & AI Strategist | Driving Impact Without Micromanagement | Teacher

1 month ago

There has not been a single job role I have done in my career where I've not had to deal with indexes. It's more like the unsung hero of digital world we live in. I also remember ?? teaching me about B Trees when we studied together.

More articles by Rick Banerjee