A World Without Oracle? An Escape Plan...

As many of you know, I am an OpenStack guy. I live, breathe and sweat OpenStack, and you can't shut me up about it. Open source makes all the sense in the world when it fits your business and use case. But sometimes you need a commercial solution to get the functionality you need. It's just a fact of life.

And up until recently, Oracle was one of your few choices for an enterprise-ready database. Things are changing, and just maybe the way we treat and think about data needs to be rethought, along with our choices for storing it.

We live in interesting times when it comes to the world of information technology. The changes are coming so fast that it is hard to keep up. The constant is that company management continues to ask us to do more with less. As IT executives look across the various line items of their budgets, their eyes are drawn to the big checks they write to companies like VMware, Cisco and the dreaded Oracle. Sorry, my Oracle friends, everyone hates you, and you know it.

Oracle is deeply entrenched in many organizations, and it takes a Herculean effort to replace it, even when you have a viable alternative. In fact, in some instances it just can't be done. Additionally, those viable alternatives have historically been hard to justify because the cost of the transition, netted against the licensing savings, comes out about the same as the status quo.

So if there is no improvement in performance (or, worse yet, it performs slower) and the migration ends up cost neutral, why bother trying to replace it? Why take the risk? Oracle knows this and ensures their pricing “leaves no money on the table”, or in your budget. And the Oracle Empire lives on…

That is, unless of course you have a new, viable solution with a killer business case that is a potential replacement for Oracle. Just like all of the solutions I get excited about, it was a forward-thinking client who clued me in to this up-and-coming technology.

It’s More Than the Cost of Oracle

While it is easy to pick on Oracle and make them the target, the fact is the way we handle data today is very inefficient. If you step back and look at most companies' entire data environment (databases, data warehouse, business intelligence, analytics and data search infrastructure) you realize that these segregated environments are all intertwined. Data from Oracle is ETL'd to your data warehouse and then copied to analytics, BI and search environments for further analysis.

Moving and keeping copies of data is expensive! You end up with tons of copies of the same data, all sitting on your most expensive IT infrastructure. On top of that, you have many different database products for different database needs. It's not unusual for large companies to run several versions of many flavors of databases: Oracle, MS SQL, MySQL, Sybase, DB2, Hadoop, etc.

When combined, the cost of the databases, the tools, the infrastructure and all of the support personnel for all of the different products adds up to a significant portion of the overall IT budget. And on top of that, your business is most likely complaining that access to data and analytics is still too slow!

How Did I Get Here?

"And you may ask yourself  How do I work this?  And you may ask yourself  Where is that large automobile?  And you may tell yourself  This is not my beautiful house  And you may tell yourself  This is not my beautiful wife"

We got here because of performance, not because of the Talking Heads. Traditional RDBMSs have evolved in a way that we cannot run BI, analytics or search against our production databases; the performance impact on production would be unacceptable. Hence the need for ETL (extract, transform, load) systems to move copies of the data into other environments.
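
As a concrete illustration of what these pipelines boil down to, here is a minimal ETL sketch in Python. Every name in it is a hypothetical placeholder, and it assumes the source and warehouse tables already exist:

```python
# A minimal ETL sketch using only the standard library.
# Assumes the source "orders" table and warehouse "fact_orders" table
# already exist; all names here are hypothetical placeholders.
import sqlite3

def etl(source_db: str, warehouse_db: str) -> None:
    src = sqlite3.connect(source_db)
    dst = sqlite3.connect(warehouse_db)

    # Extract: pull the last day's orders from the production copy.
    rows = src.execute(
        "SELECT id, customer, amount_cents FROM orders "
        "WHERE created_at >= date('now', '-1 day')"
    ).fetchall()

    # Transform: convert cents to dollars for the reporting schema.
    transformed = [(oid, cust, cents / 100.0) for oid, cust, cents in rows]

    # Load: upsert into the warehouse fact table.
    dst.executemany(
        "INSERT OR REPLACE INTO fact_orders (id, customer, amount_usd) "
        "VALUES (?, ?, ?)",
        transformed,
    )
    dst.commit()
    src.close()
    dst.close()

if __name__ == "__main__":
    etl("production_mirror.db", "warehouse.db")
```

Multiply that by every table, every target system and every schedule window, and the operational cost of the pattern becomes clear.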

Our ETL jobs usually run at off-peak times, as they can tax system performance too. In some instances we mirror production to a second system (creating one more copy of the data and doubling our infrastructure costs) and ETL from the copy to our other systems. Either way, we are always working with stale data: we have to wait for a transfer window, plus the time it takes to actually copy the data.

It would be an interesting exercise for any company to perform a cost analysis of all data-related systems, software and labor for the above environments. Even in a small organization it can get to be a large number very fast.

What If?

What if there were a way to get fast database access for all data types and at the same time run BI, analytics and search in the same environment? By running in the same environment you could eliminate ETL tools, BI systems, analytics systems, search tools and the data warehouse.

While getting rid of Oracle and your other databases, how about also getting rid of Teradata, Exadata, Informatica, DataStage, Greenplum, SAS, DB2, etc.?

First off, this solution will have to have incredible performance and be built to scale linearly. It should leverage affordable, enterprise-ready commodity hardware: more server nodes mean more database storage and more performance. It has to be fault tolerant; losing a server node should not impact functionality or performance. And it has to have incredible functionality to handle all of the use cases for accessing data.

There is such a product, and it's called MarkLogic, a NoSQL ("Not Only SQL") database. It is a commercial product, and it differentiates itself from open source solutions in that it is more robust and designed for database consistency, unlike eventually consistent NoSQL solutions such as Cassandra.

MarkLogic uses documents, graphs and in-memory maps (like a key-value store). This allows MarkLogic to scale out. It creates "views" to simulate relational tables and allow SQL API access. But you can't do "joins" at scale on MarkLogic, or on any database distributed across nodes, since joins require data to be moved between nodes. There are methods to get to data quickly without the use of "joins", but that is a detailed discussion best reserved for each use case.

Some thought has to go into which pattern is appropriate for which data. The objective is to find the one that allows the fastest access to the data; the sketch below shows the typical trade-off.
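
Here is a hedged sketch of that trade-off in plain Python. It is purely conceptual (it does not use MarkLogic's API): the relational shape needs a join at read time, while the denormalized document answers the same question with a single lookup.

```python
# Relational shape: answering "what did Acme spend?" needs a join
# between customers and orders at query time.
customers = {101: {"name": "Acme Corp", "region": "EMEA"}}
orders = [
    {"id": 1, "customer_id": 101, "amount_usd": 250.0},
    {"id": 2, "customer_id": 101, "amount_usd": 75.0},
]

# Document shape: everything read together is stored together, so one
# key lookup (or index hit) returns it all without crossing nodes.
document = {
    "customer": {"id": 101, "name": "Acme Corp", "region": "EMEA"},
    "orders": [
        {"id": 1, "amount_usd": 250.0},
        {"id": 2, "amount_usd": 75.0},
    ],
}

def total_spend(doc: dict) -> float:
    """Answer the question from the one document, join-free."""
    return sum(o["amount_usd"] for o in doc["orders"])

print(total_spend(document))  # 325.0
```

The cost of the document pattern is some duplication on write; the payoff is that reads never have to move data between nodes.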

Using MarkLogic helps eliminate many highly specialized databases, each with their own strengths and weaknesses. The expensive ETL (batch and real-time) needed to keep databases in sync goes away. Today there is little understanding of the cost of moving data between systems; with no viable alternative, the analysis would have been moot anyway.

With a scale-out, high-performance option it makes sense to go back and examine the total cost of your data environment:

Total cost = RDBMS + ETL + DW + BI + Analytics + Search + Infrastructure + Labor

Vs.

Total cost = Marklogic + Infrastructure + Labor

Most likely you'll find some gold in "them thar hills"! The analyses we have performed show staggering cost savings. With MarkLogic your environment becomes more streamlined, data is immediately accessible in real time, and costs are slashed. You are doing more with less.

In addition, MarkLogic leverages enterprise-ready commodity hardware that scales horizontally across many server nodes. We recommend automated deployment through an open source OpenStack private cloud; new server nodes can be brought online in less than 30 minutes. MarkLogic on enterprise-ready commodity hardware eliminates expensive proprietary hardware, expensive SAN storage, Fibre Channel networks, backup software (backup is built into MarkLogic itself) and many other infrastructure components.
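
As a rough sketch of what that automation can look like, here is the openstacksdk Python client booting a new node. The cloud, image, flavor and network names are hypothetical placeholders and assume a matching clouds.yaml entry:

```python
# A minimal sketch of booting one new cluster node with openstacksdk.
# The cloud/image/flavor/network names are hypothetical placeholders.
import openstack

conn = openstack.connect(cloud="private-cloud")

server = conn.create_server(
    name="ml-node-07",     # next node in the database cluster
    image="ubuntu-22.04",
    flavor="db.large",
    network="cluster-net",
    wait=True,             # block until the node is ACTIVE
    auto_ip=False,         # keep it on the private cluster network
)
print(server.name, server.status)
```

Wrap a call like this in your configuration management of choice and a new node really can be serving in well under 30 minutes.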

Database, ETL, BI and analytics staff are some of the most expensive resources in your organization. A MarkLogic deployment would allow you to consolidate and most likely reduce the headcount in your data-related environments. Not everyone will be happy to hear this, but if your mantra is "do more with less", here is your opportunity.

MarkLogic is a single database for:

  • All transactions
  • All search
  • All analytics

MarkLogic uses a scale-out architecture with ACID (Atomicity, Consistency, Isolation, Durability) transactions. MarkLogic indexes everything on ingest for fast queries, search and deep analytics. You avoid the time penalty, resources and costs of moving data around from system to system.
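
To show the idea behind "index on ingest" (and only the idea; this toy inverted index is nothing like MarkLogic's actual universal index), consider:

```python
from collections import defaultdict

# Toy inverted index: built as each document is ingested, so later
# searches are hash lookups instead of full scans.
index: dict[str, set[str]] = defaultdict(set)
docs: dict[str, str] = {}

def ingest(doc_id: str, text: str) -> None:
    """Store the document and index every word immediately."""
    docs[doc_id] = text
    for word in text.lower().split():
        index[word].add(doc_id)

def search(word: str) -> list[str]:
    """Query time is a single dictionary lookup, paid for at ingest."""
    return sorted(index.get(word.lower(), set()))

ingest("po-1001", "Purchase order for widgets")
ingest("inv-2002", "Invoice covering the widgets order")
print(search("widgets"))  # ['inv-2002', 'po-1001']
```

The write path does a little more work so that queries, search and analytics can all run against the same live data.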

Each MarkLogic server node adds CPU, memory, network connectivity and database storage to the MarkLogic cluster. MarkLogic writes copies of data to separate "availability zones" on separate physical nodes. For example, your MarkLogic cluster may span several racks, with each rack being a separate "availability zone". Two copies of the same data are never stored in the same availability zone, which ensures you are protected from both a server failure and a rack failure.

You have the option to create as many copies of the data as you deem necessary. Most clients pick two or three.
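
Here is a hedged sketch of the placement rule described above. The zone and node names are hypothetical, and real cluster software balances load far more carefully than this:

```python
# Zone-aware replica placement: no two copies of the same data may
# land in the same availability zone (rack).
ZONES = {
    "rack-a": ["node-1", "node-2"],
    "rack-b": ["node-3", "node-4"],
    "rack-c": ["node-5", "node-6"],
}

def place_replicas(zones: dict[str, list[str]], copies: int) -> list[tuple[str, str]]:
    """Pick one node from each of `copies` distinct zones, so losing any
    single server, or any whole rack, still leaves other copies intact."""
    if copies > len(zones):
        raise ValueError("cannot place more copies than availability zones")
    return [(zone, nodes[0]) for zone, nodes in list(zones.items())[:copies]]

print(place_replicas(ZONES, copies=3))
# [('rack-a', 'node-1'), ('rack-b', 'node-3'), ('rack-c', 'node-5')]
```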

High-speed networking for low latency between nodes is highly recommended. I am not afraid to plug solutions I like: we are huge Arista Networks fans because of their low-latency "Wall Street" switches with big buffers. These are the same switches designed for and used in program trading on Wall Street. Keeping node-to-node communication fast is critical; traditional legacy switching often struggles and becomes the bottleneck for east-west traffic between server nodes.


Server storage should be either SAS SSD or NVMe (non-volatile memory express). The edge goes to NVMe: the incremental price difference is worth its more-than-2x performance advantage.


More About NoSQL

If you'd like to learn more about NoSQL and MarkLogic, I'd recommend the book "Making Sense of NoSQL: A guide for managers and the rest of us" by Dan McCreary and Ann Kelly.

If you have questions, feel free to reach out to me. Want to learn more about MarkLogic NoSQL? The MarkLogic Server whitepaper is a good place to start.


