Big Data Puts DBAs in a Vice: A New Approach Emerges
Shortly after Y2K (remember that?), the industry's focus shifted to the challenge of explosive growth in dataset sizes. The slipping latency of relational systems, combined with demands for faster turnaround, brought the second crisis. The problem was initially treated as a file system issue: find a way to handle very large files across multiple disk volumes. That approach, a re-branded version of the file management ideas of the 1950s, has turned out to be a temporary fix. Managing these files and their related external indexes has become impractical, leaving data scientists to spend 80% of their time on data prep rather than analysis. At the same time, demand for faster turnaround on the analytics compounds the problem. One estimate suggests this will require 30,000 new data scientists in the next five years, at least an order of magnitude more than the industry is likely to produce.
Handling large datasets raises several distinct challenges:
1. How to stream new data in
2. How to purge old data
3. How to secure the data with backups
4. How to make new data available for query
In the current technology stack, these data-admin activities demand long periods of downtime. Traditional systems assume that some modest downtime is acceptable, but in large databases the time needed for these tasks expands rapidly and soon becomes the primary bottleneck on the path to analysis. The key metric has become “time to insight.”
The pressure intensifies when the requirements also include high availability and large numbers of concurrent users in addition to quick turnaround. DBAs are increasingly caught in a vice. Ancelus has emerged to change the debate.
Ancelus integrates indexing into the core of the database kernel operators: indexing happens on every transaction, and the impressive Ancelus streaming rate includes the time to index the new data. The point is not how fast you can stream, but how soon you can query. With Ancelus, new data is available for query in nanoseconds.
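To make the "index on write" idea concrete, here is a minimal, hypothetical Python sketch; the class and method names are invented for illustration and are not the Ancelus API. The key point it shows is that the insert call updates the index before it returns, so the new record is visible to queries immediately, with no separate re-indexing step.

```python
# Illustrative in-memory model of index-on-write, not Ancelus code.
from collections import defaultdict

class IndexedStore:
    def __init__(self):
        self._rows = {}                      # row_id -> record dict
        self._index = defaultdict(set)       # (field, value) -> {row_id, ...}
        self._next_id = 0

    def insert(self, record: dict) -> int:
        """Store the record and update the index in the same operation."""
        row_id = self._next_id
        self._next_id += 1
        self._rows[row_id] = record
        for field, value in record.items():
            self._index[(field, value)].add(row_id)
        return row_id                        # queryable as soon as this returns

    def query(self, field, value):
        """Index lookup; no scan of the raw rows is needed."""
        return [self._rows[r] for r in self._index[(field, value)]]

store = IndexedStore()
store.insert({"sensor": "A7", "temp": 21.4})
print(store.query("sensor", "A7"))           # new row visible immediately
```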
Removing or archiving old data from the live dataset is equally straightforward. A simple delete command removes a record from the database and instantly re-indexes the dataset. No downtime required. The combination of these two functions supports the concept of a pipeline database. Stream in, stream out at the same rate. It can be configured to hold a constant size (one transaction in, one out), or to retain a fixed time window (remove transactions older than xx days).
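As a rough illustration of the two retention policies just described, constant size versus fixed time window, here is a small Python sketch. It is a generic model of the pipeline idea, not Ancelus code; in the real engine the delete-and-reindex happens inside the kernel.

```python
# Generic sketch of a "pipeline database" retention policy, for illustration only.
import time
from collections import deque

class PipelineStore:
    def __init__(self, max_rows=None, max_age_seconds=None):
        self.max_rows = max_rows
        self.max_age_seconds = max_age_seconds
        self._rows = deque()                 # (timestamp, record), oldest first

    def insert(self, record):
        self._rows.append((time.time(), record))
        self._trim()

    def _trim(self):
        # Constant-size policy: one transaction in, one out.
        if self.max_rows is not None:
            while len(self._rows) > self.max_rows:
                self._rows.popleft()
        # Time-window policy: drop anything older than the cutoff.
        if self.max_age_seconds is not None:
            cutoff = time.time() - self.max_age_seconds
            while self._rows and self._rows[0][0] < cutoff:
                self._rows.popleft()

store = PipelineStore(max_rows=1000)         # or max_age_seconds=30*24*3600
store.insert({"event": "reading", "value": 42})
```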
Backups to secure the data can be configured for real-time operation by activating the live journal function. Starting from the primary backup, each subsequent transaction is appended to a journal file, allowing recovery to the last transaction should it ever be needed. The primary backup itself can be streamed at high rate: a memory copy (briefly locking the database) followed by a background stream to disk. The journal file captures any transactions that arrive during these brief lock events.
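The snapshot-plus-journal pattern described above can be sketched generically as follows. This is a simplified, hypothetical Python model (the file names and class names are invented), not the Ancelus journal implementation: every write is appended to a journal before it is applied, a snapshot captures the full state and restarts the journal, and recovery replays the journal on top of the last snapshot.

```python
# Simplified snapshot + journal recovery model, for illustration only.
import json

class JournaledStore:
    def __init__(self, journal_path="journal.log"):
        self.state = {}
        self.journal_path = journal_path

    def write(self, key, value):
        # Append to the journal first, then apply to the live state.
        with open(self.journal_path, "a") as j:
            j.write(json.dumps({"key": key, "value": value}) + "\n")
        self.state[key] = value

    def snapshot(self, path="snapshot.json"):
        # Point-in-time copy of the live state (the "memory copy" step), then
        # restart the journal so it only holds transactions newer than the snapshot.
        with open(path, "w") as f:
            json.dump(self.state, f)
        open(self.journal_path, "w").close()

    @classmethod
    def recover(cls, snapshot_path="snapshot.json", journal_path="journal.log"):
        store = cls(journal_path)
        try:
            with open(snapshot_path) as f:
                store.state = json.load(f)
        except FileNotFoundError:
            pass                             # no snapshot yet; replay journal only
        with open(journal_path) as j:
            for line in j:                   # replay transactions since the snapshot
                entry = json.loads(line)
                store.state[entry["key"]] = entry["value"]
        return store

store = JournaledStore()
store.write("sensor:A7", 21.4)               # journaled, then applied
store.snapshot()                              # full backup; journal restarts
store.write("sensor:A7", 22.1)               # captured only in the journal
restored = JournaledStore.recover()           # snapshot + replay = latest state
```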
These features were designed to support the development of real-time applications: extreme transactional performance with no downtime. But even if that isn't the goal of your Big Data system, these tools can dramatically reduce the time spent on data prep and shrink the "time to insight."
To learn more about real-time and big data (rarely used in the same sentence), contact us at www.ancelus.com.
Freelance z/OS sysprog & more - the power of experience
5y
Well organized, indexed, repeatedly accessed (cache) data does not care much when accessed randomly. So organize, index, and maybe maintain multiple, synchronized COPIES for different uses.
Craig Mullins, President & Principal Consultant at Mullins Consulting, Inc. IBM Gold Consultant and IBM Champion for Data and AI
5y
DBAs are, indeed, faced with many challenges when it comes to managing and providing real-time access to big data... and there are many aspects to these challenges. Minimizing downtime is important, but so is maintaining high performance, as well as the ability to make any needed changes quickly and without impacting production workloads. Do you have any customer testimonials as to how Ancelus has helped them to overcome these challenges?