Explaining NoSQL to Normal People


Today, I had the chance to sit down with a tech-savvy executive from a highly successful Dallas-based company. As our conversation veered into a flurry of technical jargon, they paused and asked, "So how can you have NoSQL?" It's a question I've encountered before, and depending on who I'm speaking with, I have a variety of answers.

One of my go-to examples is LDAP (Lightweight Directory Access Protocol). While not the first NoSQL system, those of us who were around during the Y2K panic will recognize it as an early form of NoSQL. LDAP organizes data hierarchically, using a schema-less structure that’s characteristic of NoSQL databases. This was the example I used during our conversation today, but as is often the case, I thought of better examples afterward that might resonate more widely.

Take DNS (Domain Name System), for example. To me, DNS represents one of the earliest forms of NoSQL databases. When you enter a URL like www.domainname.suffix, the DNS server retrieves the associated record to route your request, whether it's for a website, email, FTP, or another service. The fact that DNS is still fundamental to the internet today is a testament to the durability and effectiveness of NoSQL systems when used in the right context. I once worked on an innovative project where we used DNS not just for name resolution but as a backend database. By embedding custom data into DNS entries, we bypassed traditional databases altogether, distributing our application data globally via DNS infrastructure.
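That DNS trick can be sketched as a toy key-value lookup. This is not a real resolver; the names, addresses, and the embedded TXT payload below are all invented for illustration, but the shape is the point: a name maps to typed records, and a lookup is just a key access, no query language required.

```python
# A toy sketch of DNS as a key-value store (illustrative data only).
DNS_RECORDS = {
    "www.example.com": {"A": "93.184.216.34"},
    # Custom application data embedded in a TXT record, the way the
    # article describes using DNS entries as a backend datastore.
    "example.com": {"TXT": "app-config=v2;region=us"},
}

def resolve(name, record_type):
    """Return the record for a name/type pair, or None if absent."""
    return DNS_RECORDS.get(name, {}).get(record_type)

website_ip = resolve("www.example.com", "A")
app_config = resolve("example.com", "TXT")
```

A real deployment would answer these lookups from globally distributed DNS servers, which is exactly what made it attractive as a free worldwide replication layer.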

Another example lies within your computer's file system: NTFS, FAT, FAT32. These systems function as NoSQL storage structures. When you retrieve a file from your hard drive, you're using a unique identifier, the file's path, rather than a structured query language to locate and access the data.
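In the same spirit, a file read is really a key-value "get", with the path acting as the key. A minimal sketch (the directory and file name below are hypothetical):

```python
import os
import tempfile

# The path acts as the unique key; the file contents are the value.
tmpdir = tempfile.mkdtemp()
key = os.path.join(tmpdir, "user-42.json")  # hypothetical file name

with open(key, "w") as f:   # "put": store a value under the key
    f.write('{"name": "David"}')

with open(key) as f:        # "get": retrieve the value by its key
    value = f.read()
```

No schema, no joins, no query planner: just a direct lookup by identifier, which is the essence of a key-value store.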

So, if NoSQL is so ubiquitous and has been around for so long, what's the big deal? Why is it showing up everywhere as if it's the next great thing? To understand why NoSQL is so hot, you have to understand SQL. SQL and its underlying relational database technology served a very specific purpose for many years: making large amounts of data available as fast as possible while using as few resources as possible. To accomplish this, you paid highly skilled people a lot of money to "normalize" the data. We followed explicit rules to store large amounts of data in the smallest possible space. We did this because, while the people doing the normalization were expensive, the hardware, memory, and bandwidth you would burn without it were far more expensive.
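A minimal sketch of what that normalization looks like in practice, using an in-memory SQLite database (the tables and values are invented for illustration): the repeated state name is factored out into its own table so it is stored only once, and the price is that reading the data back requires a join.

```python
import sqlite3

# Illustrative normalized schema: 'Texas' is stored once in 'states'
# and referenced by id from 'people', instead of being repeated.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE states (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT,
                         state_id INTEGER REFERENCES states(id));
""")
con.execute("INSERT INTO states (id, name) VALUES (1, 'Texas')")
con.executemany("INSERT INTO people (name, state_id) VALUES (?, 1)",
                [("David",), ("Maria",)])

# Reading it back requires a join: the cost of storing 'Texas' once.
rows = con.execute("""
    SELECT people.name, states.name
    FROM people JOIN states ON people.state_id = states.id
    ORDER BY people.id
""").fetchall()
```

Multiply that pattern across dozens of tables and you can see both why it saved space and why it took expensive specialists to design and maintain.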

Things have changed in the last few years, though, and that change is long overdue, which is why it feels so much like we have all discovered a new land and are rushing to it. The first driver, and the one that should have caused the shift a decade or more ago, is the declining cost of storage, memory, and bandwidth. To some extent, this has already taken effect. In the mid-90s, I can remember having conversations about the "Name" field and whether having multiple "Davids" justified decomposing "Name" into "First" and "Last" with a foreign key to a names table, or whether "Address" should be decomposed into "Number" and "Street" so we could normalize the address information. Fortunately, that is a thing of the past: we just save the "Address" as "Address" and "Name" as "Name." If we bother to build a "States" table at all, we're overachieving.

But the true shift came with mobile applications and the need to store data closer to the device. The biggest issue with the relational databases SQL relies on is the presumption that there is only one massive datastore, and therefore only one copy of the data, which is what saves the memory and space. The moment you start making copies of the database, you undermine the whole premise normalization was built on. Do it a couple of times and you start wondering whether there isn't a better way. At that point, duplicating some of the information lower down starts to look better than duplicating entire databases. Patterns and processes emerge, and servers and technology are created to support them.
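The "duplicate the information lower down" idea is exactly what document-style NoSQL records do. A sketch (the field names and values below are illustrative, not from any particular product): everything a read needs travels inside one record, with values like the state name duplicated across documents instead of joined from shared tables.

```python
# A denormalized "document": customer and item data are embedded in
# the order record, duplicated per document rather than normalized
# into separate tables. (All field names here are illustrative.)
order_doc = {
    "order_id": "A-1001",
    "customer": {"name": "David", "state": "Texas"},
    "items": [{"sku": "WIDGET-1", "qty": 2}],
}

# A read needs no join: everything arrives in one key lookup, which is
# what makes copies of the data cheap to serve close to the device.
customer_state = order_doc["customer"]["state"]
item_count = sum(item["qty"] for item in order_doc["items"])
```

The trade-off is deliberate: you spend storage on duplicated values to buy fast, self-contained reads that replicate cleanly.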

Finally, the last driver is the sheer amount of data we are dealing with. Ten years ago, the useful data threshold was based on the ability of a human to draw value from it. Granted, that can still be a huge amount of data, but it doesn't compare to the amount of data AI can draw value from. As we move forward, we need data systems that can scale beyond what we can begin to imagine.

With SQL and RDBMS (Relational Database Management Systems), we came up with the answer. With NoSQL, we are preparing to store the data to figure out the question.

Anton Gerbracht

Principal/Solutions Architect/Technology Advocate at CGInfinity

1 month ago

The question of which database to use is an implementation detail best left to the enterprise architect. Executives should set the overall strategy and let the team make the tactical decisions. An officer in the military shouldn't tell the NCO how to erect a flagpole; they should just tell the NCO what needs to be done and let them do their job.

Seth Horowitz

Product Management and leader of cross functional teams. Travel Product and Technology Strategy Advisor. Non Profit Board member, Adaptive Sports facilitator. My core values are Connection, Compassion and Adventure

1 month ago

Brilliant David, even I was able to track with that explanation. Thank you.
