Effective Digital Transformation with the New "Kafcongo" Tech Stack: Kafka, Confluent & MongoDB
Britton LaRoche
Award Winning Enterprise Staff Solutions Engineer @ Confluent | MongoDB | Oracle
What is behind the rapidly growing excitement over this new tech stack that has taken the industry by storm? The answer is real-world results: it delivers digital transformation and legacy modernization in record time. The new tech stack is based on widely adopted, battle-tested, cutting-edge open source technology that led to two new companies launching their public IPOs in the last 5 years. I am speaking of the enormous synergies between MongoDB (MDB) and Confluent (CFLT).
Specifically, I am speaking about Apache Kafka running in the Confluent Cloud and MongoDB Atlas. Confluent Cloud is a fully managed service for Apache Kafka and runs in AWS, GCP and Azure. MongoDB Atlas is the cloud-native fully managed service for MongoDB, also running in AWS, GCP and Azure. These two technologies working together have transformed numerous companies across many industries. In this article I will take a deep dive into a few publicly announced use cases that I personally worked with. I will take you through what I have personally learned and seen firsthand since 2017, backed with facts from other articles.
Digital transformation is the integration of digital technology into all areas of a business, fundamentally changing how you operate and deliver value to customers. It is also a cultural change that requires organizations to continually challenge the status quo, experiment, and get comfortable with making mistakes in order to learn and adopt new ways of doing business. This leads naturally into a discussion of breaking down monolithic applications and data silos. It also leads into discussions of legacy modernization and, in some cases, cloud migration.
The value driver behind all of these discussions is data. What drives modern application development is data. Data that describes who bought what, when and where they bought it, or what they are looking for, literally drives the new modern economy. One cannot speak about the importance of data without considering how to access the data across the business, how to move it about, and where to store it. Storing data for analysis naturally leads to a discussion of database technology.
Why do we need a new Tech Stack for Digital Transformation?
What about something traditional like the Oracle Relational Database Management System (RDBMS) and its data integration tool GoldenGate? The Oracle relational database was and still is king of a vast empire of on-premises relational database systems powering many legacy applications. Today it still holds 30% of the market share of all on-premises database applications.
Times have changed, and technology has evolved since the first relational database, Oracle, was released by Larry Ellison in 1979. An ever-growing number of people in technology believe that the time for developing new applications against relational databases, and specifically the Oracle RDBMS, has come to an end. In large part this is true, as the overwhelming velocity and volume of data required to operate in today's real-time, "instant gratification" economy are far beyond the capacity of relational databases.
Oracle, much like the mainframe, will not die. It is so pervasive that it will be around in one form or another for decades to come. It will just slowly fade into the background and lose relevance as developers select databases for new mission-critical projects where either cloud or massive scale is important. Relational databases will be relegated to where they belong: old on-premises legacy applications.
For more details on why relational does not work at scale, now or in the future, check out my article on why I made the move from Oracle to MongoDB. It is a good read, but I don't want to clutter this article by discussing the old tech stack.
It is important to see the big picture. Merv Adrian wrote an excellent Gartner blog article showing that Oracle missed the cloud, and that this reduced its total database market share from 36% in 2017 to 20% in 2021. Again, this is the total database market (NoSQL, columnar, key-value store, etc., as well as RDBMS). Oracle lost 16 percentage points of total database market share over that 5-year period. I am assuming the steady trend will continue, and at the end of 2022 they will drop around another 4 points, for a loss of 20 percentage points of total market share in 6 years.
Why did Oracle lose so much market share? It is partly because relational databases are not capable of scaling to the demands of the cloud. To quote a 2010 meme, relational systems are not "web scale." To make matters worse, Oracle missed the opportunity to properly license its database software in all of the major cloud providers. Instead they focused on moving customers to their own "Oracle Cloud." Over that same period, the DBMS market added roughly $40B on top of 2017's $38.6B, doubling in size by 2021.
The biggest database market story continues to be the enormous impact of revenue shifting to the cloud. In 2021, revenue for managed cloud services (dbPaaS) rose to $39.2B — at the end of 2021 it represented over 49% of all DBMS revenue. The growth has been stunning:
We can see that very close to half of all database revenue is in the cloud as of 2021, and we know that the cloud has doubled the size of the overall database market. We will know soon enough the trend for the end of 2022, but estimates of the current cloud database growth rate have it growing from $39 billion to about $49 billion, another 25% of growth. I think we will see that cloud databases at the end of 2022 represent around 62% of the total database market.
Oracle kept the majority of its on-premises market share but missed the phenomenal database growth offered by the three major cloud providers (AWS, Azure, GCP). Perhaps research will soon show its total database market share dropped another 4 points to about 16% at the end of 2022. That's a stunning loss of over 50% of Oracle's total database market share since 2017. This is the slow fade into the obscure, on-premises-only, zombie-like, undead oblivion I am speaking about. It's not that Oracle lost existing customers; they missed the lion's share of the new cloud market.
End of life reached for Relational Databases
Relational going forward in this new digital world is... well, pretty much dead. If you have any doubts as to the end of life of the relational database management system (Oracle, MySQL, SQL Server, Postgres, etc.), then you should really listen to Rick Houlihan's NoSQL presentation at AWS re:Invent 2022. He was the worldwide leader of NoSQL databases at AWS and is now the director of developer relations for strategic accounts at MongoDB. He knows what he is talking about, and he explains where the AWS growth we see in the charts above came from. The selection of NoSQL over relational at AWS was in large part his decision. In short, he states that relational databases have reached the limits of Moore's law on modern silicon chipsets and can no longer be computationally effective. The size of modern data sets combined with the number of computationally expensive joins required to retrieve the data makes relational databases so slow that they are now facing real obsolescence.
Rick backs every statement up with compelling evidence in his presentation. He was part of the group responsible for project "Rolling Stone" which migrated AWS core services off of all Oracle databases. Other companies are doing the same thing today with the Confluent Cloud and MongoDB Atlas.
The talk below is worth the time (if you have about an hour). After watching it you will have a comprehensive understanding of why relational databases are at end of life from a technical perspective and why the Oracle relational database is rapidly losing total database market share.
So what is the big deal with NoSQL? Why such growth in the cloud? The answer is time to value and scalability. When it comes to scalability, the growth of data now collected is astounding. According to the IDC Global DataSphere, we are projected to reach 175 zettabytes of data by 2025; we were just under 20 zettabytes in 2016. The scale is 1,000 times larger for each step in the naming convention: kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte. A single zettabyte is therefore 1 billion terabytes.
Most of this new data growth is generated by web applications, mobile apps and IoT devices with semi-structured or unstructured data that does not fit in the rigid structure imposed by relational databases. Before a single line of data from all these new apps can be inserted into a relational database, a developer or DBA has to design the schema. A relational database simply cannot keep up with both the volume of data and all of the change from the new applications that have to be modeled. What is needed is a modern NoSQL database.
Data has a value life cycle. What is happening right now is tremendously valuable, what happened in the distant past, less so. Time to market with new applications is also a big factor.
There is a “gold rush” of sorts to move to the cloud especially when it comes to valuable data. This has pushed the need for a scalable cloud database like never before. Moreover it has pushed the need for a real-time connective tissue, an intelligent “central nervous system” of communication between systems like never before.
Configuring a large database at scale and a Kafka cluster for real-time communication often takes months on premises. In the cloud, even with the fumbling about to get the right admins in the room for network permissions, it takes less than a day. Moreover, the cloud offers database as a service, where the infrastructure, backups, failover and patching are all done for you. The quick time to value, scalability and high availability, typically for less than what you spend on premises today, is what is behind the immense draw to the cloud.
Slow Batch Processing
Another key factor in time to value is the ability to move from traditional slow batch processing to real-time data movement. The typical Extract, Transform and Load (ETL) solution begins by extracting large datasets from one data source, transforming them into a new format and loading the large dataset into a destination system. A simple example is moving data from a transactional database that handles customer orders to a data warehouse that stores this data for many years for analytics.
This process is usually done at night to make sure the transactional database is not impacted during peak production workloads, and it typically takes hours. The data is usually not available until the next business day.
Having that data in real time allows for predictive and actionable insights, letting the business make impactful decisions as things occur. Imagine using Confluent Kafka to move that data from the operational data store to the data warehouse in real time. Confluent Kafka does this at scale. The distributed compute nature of the Kafka brokers enables massive scale and throughput, all in real time. For example, detecting fraud in real time is far more valuable if you can stop the transaction as it is happening, rather than finding out the next day that your company lost money to a fraudulent purchase.
The key point here is that Kafka and the Confluent Cloud allow all this data to flow in real time, at scale, from one system to another, putting "data in motion" and giving all systems the access they need to make time-critical decisions as events occur. I will show how moving from slow batch Extract, Transform and Load (ETL) processes to a central nervous system of real-time communication, coupled with a massive document database, has transformed, and is continuing to transform, companies in ways never before possible. Welcome to the era of the new Kafka / MongoDB tech stack.
I feel like the new tech stack needs a catchy, yet at the same time whimsical, name. Should we call it Kafkango? Hmm, not bad, but I feel it has the wrong inflection on the "ongo" in Mongo. Perhaps Kafkongo? Add a hashtag #kafkongo. The key here is the fact that Kafka has a lot of complex moving parts and we need a fully managed service like the Confluent Cloud. It should be Kafcongo, spelled with a "C" for Confluent, which slides nicely into the "ongo" of Mongo. The proper name and hashtag is therefore #kafcongo.
Say Hello to the New Tech Stack
The new tech stack consists of two modern cloud technologies: The Confluent Cloud and MongoDB Atlas. In the next few sections I will introduce the key concepts of each technology then show you how they work together. Finally I will give you real world examples of digital transformation using these technologies.
Apache Kafka and the Confluent Cloud
Data communication between systems has been paramount since the advent of networking, and even more so with the public internet. Typically this has been done through point-to-point communication between systems for lightweight, small amounts of data, and with batch ETL processes for heavy, large amounts of data. The idea is that one system communicates with another directly. That works until one system must communicate with several. Then you have a spaghetti mess, bottlenecks, and performance problems. Worse, there is the fear of making a change to your data model because you have no idea what impact a change in one system will have on another. For example, one automobile manufacturer could not sell a car for more than $100,000 for over six months because it had to alter the size of a numeric field, capped at $99,999, across several interconnected systems.
Point to point communication begins to look like the unwieldy mess pictured below as applications and databases become intertwined, mixed with data warehouses and SaaS applications that in turn feed other applications.
Wouldn't it be great to have a way for one system to publish data as a message or an event and have other systems subscribe to a topic? Message queues were invented for this purpose and they work relatively well. But they have one fatal drawback: they typically do not persist the data. If you want to know the number of messages related to "customer orders," for example, you have to query a database for that... the message queue doesn't keep the data... whoops, we are back in the point-to-point communication world again.
In 2010, LinkedIn developers like Jun Rao and Jay Kreps were trying to solve this data mess, and the problem that the message queues of that time did not persist the message data. They created the open source Apache Kafka project to put event data into topics and persist those events in logs. Because the event data is stored in the logs on the Apache Kafka brokers, any application that needs the current count of "customer orders" for the day gets it directly from the log for that topic. It's a one-stop shop for the entire enterprise.
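To make the pattern concrete, here is a minimal sketch in Python using the confluent-kafka client: one application publishes an order event to a topic, and another application independently subscribes and reads it back from the persisted log. The broker address, credentials and the "customer-orders" topic name are placeholders, not anyone's production setup.

```python
# A minimal publish/subscribe sketch with the confluent-kafka Python client.
# Broker address, API key/secret and topic name are placeholders.
import json
from confluent_kafka import Producer, Consumer

conf = {
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",  # placeholder
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API_KEY>",      # placeholder Confluent Cloud API key
    "sasl.password": "<API_SECRET>",   # placeholder Confluent Cloud API secret
}

# Publish an order event to a topic; Kafka persists it in the topic's log.
producer = Producer(conf)
order = {"orderId": 1001, "customerId": 42, "amount": 99.95}
producer.produce("customer-orders", key=str(order["orderId"]), value=json.dumps(order))
producer.flush()

# Any number of applications can independently subscribe and replay the log.
consumer = Consumer({**conf, "group.id": "order-count-app", "auto.offset.reset": "earliest"})
consumer.subscribe(["customer-orders"])
msg = consumer.poll(10.0)  # wait up to 10 seconds for an event
if msg is not None and msg.error() is None:
    print("received:", json.loads(msg.value()))
consumer.close()
```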
One look at the beautiful architecture diagram above lends itself to an obvious question: does Confluent Kafka become a bottleneck? No. Kafka is a highly distributed system with multiple nodes that share and balance workloads; it can easily scale to handle hundreds of gigabytes a second or more. If you need more bandwidth, just add more brokers. And if Kafka is running in the Confluent Cloud, it can scale up to handle bursts of traffic and then scale back down, so you only pay for what you need.
Apache Kafka took off as one of the most successful open source projects to date. It is in use by over 80% of the Fortune 500 today. These companies went from the spaghetti mess above to a beautiful architecture where Apache Kafka operates as a central nervous system across all of the applications and databases that need real-time access to event data. Applications publish and subscribe to topics. Database connectors receive change data capture (CDC) events from a database and put them into a topic. That topic can be read by other applications, or by another database connector that moves the data from one database to another in real time. Gone are the days of slow batch ETL processes that move data nightly from one database to another. With Apache Kafka, data can now flow freely across the enterprise in real time.
The only downside is that Apache Kafka has a lot of complex moving parts. It consists of many highly available brokers, database connectors, a schema registry, perhaps some ksqlDB instances and more. It takes a highly skilled, well-paid staff to implement and maintain open source Apache Kafka. They are responsible for keeping it up and running and adding nodes to handle new requirements for new workloads at scale. Maintaining open source Apache Kafka in production is like working on a car while it is racing around the track. The double-edged sword of "real-time data" also means "no downtime." Many mission-critical applications run on Apache Kafka event data, and a maintenance window lasting a few hours can cost millions in lost revenue.
Jay Kreps recognized this as a business opportunity and, together with the others from LinkedIn who created Apache Kafka, started a new company called Confluent, which had its initial public offering on the NASDAQ in June of 2021. The answer to the problem of implementing and maintaining a highly complex, mission-critical Kafka deployment is provided in large part by the fully managed "Kafka as a Service" offered in the Confluent Cloud. All of the patching and monitoring is done for you by Confluent. Additionally, the Confluent Cloud automatically scales up and down to match your workloads, so you only pay for what you need. The Confluent Cloud runs in all of the major cloud providers (AWS, Azure and GCP) in just about all of the regions around the world. With Cluster Linking, Confluent Cloud Apache Kafka clusters can replicate topic data across regions, and even across cloud providers.
I joined Confluent when I recognized that all of my enterprise customers had Apache Kafka and Confluent on their future-state architecture. Until then I had never seen one technology adopted by all my customers in my entire career as a pre-sales engineer. Being in the industry gave me the insight that Apache Kafka and the Confluent Cloud are one of those once-in-a-generation game changers that virtually every company needs.
It is not just the spaghetti mess of point-to-point integrations that needs to be solved for. That spaghetti mess was at least real-time. Those point-to-point integrations came about to eliminate slow and complex batch Extract, Transform and Load (ETL) processing. Slow batch ETL processes are the hallmark of legacy applications; batch ETL was developed to perform heavy lifting of large amounts of data, and it could take hours.
Above all else, slow batch ETL is what Kafka in the Confluent Cloud solves. Not only does data flow in real time, it flows at scale: extremely large volumes of data can flow in real time at hundreds of gigabytes a second. Again, the Confluent Cloud scales up automatically and scales back down, so you only pay for what you need when you need it.
Additionally, Confluent Cloud has a fully managed Schema Registry which can be used by all producers and consumers of data that communicate with the Kafka brokers. Remember that data model schema change across all those systems that took 6 months before a car over $100k could be sold? Imagine how quickly that change could have been made if they had used the Schema Registry in the Confluent Cloud. They would have had to make the change in only one place.
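As an illustration, here is a hedged sketch of what "making the change in one place" can look like with Confluent's Schema Registry Python client. The registry URL, credentials, subject name and the Avro schema itself are all placeholders I made up for this example; the point is that widening the price field is a single, centrally governed schema change rather than an edit in every connected system.

```python
# A minimal sketch (placeholders throughout) of registering and evolving a schema
# in Confluent Schema Registry so a change is made in one place, not in every system.
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

sr = SchemaRegistryClient({
    "url": "https://psrc-xxxxx.us-east-2.aws.confluent.cloud",   # placeholder
    "basic.auth.user.info": "<SR_API_KEY>:<SR_API_SECRET>",      # placeholder
})

# Version 1 of the schema: "price" is a plain int (think of the $99,999 limit).
order_schema_v1 = """
{
  "type": "record",
  "name": "VehicleOrder",
  "fields": [
    {"name": "orderId", "type": "long"},
    {"name": "price", "type": "int"}
  ]
}
"""
sr.register_schema("vehicle-orders-value", Schema(order_schema_v1, "AVRO"))

# Version 2 widens price from int to long, a backward-compatible Avro change.
# Compatibility rules are enforced centrally, and every producer and consumer
# picks up the new version from this one registry.
order_schema_v2 = order_schema_v1.replace('"type": "int"', '"type": "long"')
sr.register_schema("vehicle-orders-value", Schema(order_schema_v2, "AVRO"))
```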
Take a moment (6 Minutes) to listen to Jay Kreps explain what Confluent is and why it is so popular.
MongoDB Atlas
MongoDB was purpose-built to handle large workloads at scale. The name "Mongo" is short for "humongous." The creators of MongoDB wanted to address the limitations that relational databases suffer from, and they did so with relative ease. They decided to use a document structure to nest relationships without having to do complex joins. They selected JavaScript Object Notation, more commonly known by the acronym JSON, for the document model. JSON is a popular open data interchange format that is both human- and machine-readable. Most developers are familiar with REST and GraphQL and are quite capable of creating and using JSON documents.
Relational databases don't handle JSON very well. You have a couple of options: store the JSON as a variable-length character string or BLOB, or break the object down into tables. It's a lot of work for developers and for the database, and it doesn't scale. It's not just JSON documents that relational databases have trouble with. Every relationship in that JSON document has to be modeled, and it quickly becomes very complex. Pulling data out of a relational database requires joins across tables based on key values, and that is the real computational challenge. A normalized relational database cannot handle 100 billion rows, for example.
MongoDB stores the JSON document as it is after converting it to a binary form called BSON, and creates indexes on fields in the document for extremely fast queries. Put a JSON document into MongoDB, query for a document using an indexed field and get the document back extremely fast. There is no computational overhead with this method. Take a moment (5 Minutes) to learn why MongoDB is so different by watching the video below.
This gives MongoDB the ability to really outperform a relational database on large datasets. Add in the ability to shard data across multiple servers and MongoDB really lives up to its name, "Humongous." MongoDB is a distributed database by nature, running across multiple servers to divide the workload.
MongoDB stores its relationships in the document. It's the natural and intuitive way developers think. It addresses all of the concerns that are causing relational databases to reach end of life when faced with modern data velocity and data set sizes.
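Here is a minimal sketch of that document workflow using pymongo. The connection string, database and field names are placeholders; the nested customer and line-item relationships live in one document, so a single indexed lookup returns the whole order with no joins.

```python
# A minimal sketch, assuming a MongoDB Atlas cluster (connection string is a placeholder):
# store a JSON-style document as-is, index a field, and query it back without joins.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")  # placeholder
orders = client["shop"]["orders"]

# The nested relationships (customer, line items) live inside one document.
orders.insert_one({
    "orderId": 1001,
    "customer": {"id": 42, "name": "Ada"},
    "items": [{"sku": "IPA-6PK", "qty": 2, "price": 12.99}],
    "status": "SHIPPED",
})

# An index on an embedded field keeps lookups fast at scale.
orders.create_index("customer.id")

# One query returns the whole aggregate; no multi-table join required.
print(orders.find_one({"customer.id": 42}))
```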
But that's just MongoDB. Let's talk about MongoDB Atlas. MongoDB Atlas is a fully managed cloud solution for MongoDB. It runs in all three major cloud providers (AWS, Azure and GCP) in just about every region in the world. It has a number of really great innovative features built in, like auto-scaling, full-text search, Atlas SQL, infinite storage, charts and graphs, and more.
SQL Reporting with MongoDB Atlas is easy!
I want to clear up a misconception about creating standard reports with data from MongoDB. The misconception is that NoSQL means you cannot do SQL reporting against MongoDB. With Atlas SQL you can use a JDBC driver that allows full, rich SQL reporting directly against a MongoDB Atlas instance.
MongoDB Atlas provides a fully managed solution that guarantees availability and scalability, with backup and recovery, requiring little to no effort from the devops team to keep it running.
Using The New "Kafcongo" Tech Stack
What makes this particular tech stack so powerful is the interplay between Kafka in the Confluent Cloud and MongoDB Atlas. On the one hand we have a central nervous system with real-time communication through a connector architecture, and on the other hand we have a humongous, scalable database. Confluent Cloud has over 70 fully managed connectors, and among them are connectors for MongoDB. The MongoDB source connector reads data as it changes in a MongoDB collection and places the documents into the Kafka broker. The MongoDB sink connector writes data from a Kafka topic to a MongoDB collection. The connectors work with open source Kafka and the community version of MongoDB as well as with the fully managed cloud versions of the software.
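To give a feel for how little glue is involved, here is a hedged sketch of registering the self-managed MongoDB sink connector with a Kafka Connect worker's REST API. The worker URL, topic, connection string, database and collection names are placeholders; the fully managed Confluent Cloud connector takes an equivalent configuration through the Confluent Cloud UI or CLI instead.

```python
# A hedged sketch: register a MongoDB sink connector with a self-managed
# Kafka Connect worker. Host names, credentials, topic and collection names
# are placeholders, not a production configuration.
import requests

sink_config = {
    "name": "mongo-sink-customer-orders",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "topics": "customer-orders",
        "connection.uri": "mongodb+srv://<user>:<password>@cluster0.example.mongodb.net",  # placeholder
        "database": "shop",
        "collection": "orders",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    },
}

# POST the connector definition to the Connect worker; no application code is
# needed to move every event on the topic into the MongoDB collection.
resp = requests.post("http://localhost:8083/connectors", json=sink_config)  # placeholder worker URL
resp.raise_for_status()
print(resp.json())
```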
The power here is the enormous synergy between the two products. Confluent Kafka puts data in motion: it makes decisions, detects anomalies, and notifies other systems as events occur in real time. MongoDB acts as the perfect landing platform for data at rest, where you can see what happened and study the whole history. Between these two technologies you capture the entire data life cycle of an event. With the complete history we can understand the event and how it relates to other past events. It's worth highlighting, as this is the real benefit:
Confluent Kafka puts data in motion across the enterprise and makes decisions as events occur. MongoDB is the perfect landing platform for data at rest to see what happened over time and study the whole history of generated events.
Between these two technologies, the entire data life cycle of an event is captured.
Now that we understand why these two technologies working together are so powerful, let's dive into some examples of how this technology stack can be put into practice.
Legacy Modernization
Many times it is not possible for an organization to move off of legacy applications. These applications are entrenched in such a way across organizations, third parties, and business partners that it is not possible to rip and replace them. What I am seeing at many customers is an offloading of work from legacy systems while at the same time modernizing the business offerings.
In the diagram above we can see that the Confluent Cloud Kafka brokers are receiving data from an IBM mainframe with regard to customer payments. The mainframe applications apply payments to the DB2 database. The IBM DB2i CDC agent reads the DB2 logs and produces these payment events to the Kafka brokers running in the Confluent Cloud. Once in Confluent Cloud, two things happen in real time. The customer payment is sent to Salesforce so the customer can see the payment has been applied to their account. It is important to note that Salesforce is updated through Confluent's fully managed connector, which requires no coding. At the same time, the MongoDB payment collection is updated by the MongoDB Atlas sink connector.
Once in MongoDB Atlas, the entire set of payments is available for analysis and reporting. Custom reports can be run against the lifespan of a loan. The customer base can be analyzed for a history of late payments by demographic. This opens the door for a microservice, available to third-party systems, that queries the customer payment data to establish creditworthiness for a particular customer. This new cloud-based customer payment score feature is now a potential new source of revenue for the company.
Although this is an example architecture, it is based on several real-world examples in use today at banks and financial institutions. The cost savings is realized by offloading most of the reads from the mainframe into the Confluent Cloud and MongoDB Atlas. This offload is significant; in my experience it is equivalent to a MIPS reduction of around 60% to 70%, as the majority of MIPS on the mainframe are typically related to application reads.
In this example we used Confluent Cloud to migrate data from DB2 to MongoDB, but there are many more examples. For more information please take a look at our partner page highlighting resources directly related to this article: Modernize Data Architectures with Apache Kafka® and MongoDB.
In the long run offloading reads from the mainframe is just the first step to digital transformation. But, it is an enormous first step and enables multiple new use cases immediately. Let's take a look at some real world use cases that utilized these technologies to create new offerings and modernize legacy systems all in the name of digital transformation.
I worked for five years as a Sales Consultant (SC) at Oracle and four years as a Solutions Architect (SA) at MongoDB, and for the past two years I have worked as a Solutions Engineer (SE) at Confluent. The following use cases come from two customers I personally worked with at both MongoDB and Confluent, and both have publicly referenceable implementations.
Most architectural discussions end with a working example; it's hard to argue with real-world success. Until then, you can solve just about any problem with any technology, and one can easily get caught in endless discussions and whiteboarding sessions. Whiteboards don't come with a compiler. In other words, all architecture diagrams work on paper, but not all of them work in the real world.
With that in mind, there are several companies I personally worked with that used the new Confluent Cloud and MongoDB Atlas tech stack to achieve digital transformation, either creating a new service or bringing new life to a legacy service. A short list of these companies includes 7-Eleven, AT&T, Indeed, J.B. Hunt, Mr. Cooper, and Toyota Financial Services. I am providing a detailed recap of public information available on the web about J.B. Hunt and Toyota Financial Services in the following sections. I am happy to give you more background and color if we happen to meet as part of my role as a Solutions Engineer at Confluent.
Real World Use Cases with the Kafcongo tech stack
J.B. Hunt — Trucks with Flux Capacitors
J.B. Hunt Transport Services, Inc. is a surface transportation, delivery, and logistics company in North America. The company, through its subsidiaries, provides transportation and delivery services to a range of customers and consumers throughout the continental United States, Canada, and Mexico. Today it generates over $3.7 billion in annual revenue as an American transportation and logistics company based in Lowell, Arkansas. It was founded by Johnnie Bryan Hunt and Johnelle Hunt in Arkansas on August 10, 1961.
Below is a 30-minute video with Donovan Bergin, an expert software engineer at J.B. Hunt Transport Services, Inc. The video is highly engaging and fun to watch. He presents a real-world IoT use case enabling digital transformation at J.B. Hunt: tracking and monitoring all the data gathered from truck trailers and containers in real time, as well as reporting against the entire trip, with the Kafcongo tech stack.
I used my own artistic license to simplify and enhance his diagram (based on his use case) in my diagram below.
Telemetry data from over 100,000 containers (trucking, shipping and pods) all across North America is tracked every second. Everything from the GPS location to various readings regarding the shipment, like temperature for cold chain management, is collected every second from every container. Additionally, the shipping container behind the truck passes through multiple drivers: from the original pickup location to a warehouse, then to another driver for a cross-country trip, then to another warehouse, and then to another driver for the last mile. J.B. Hunt also works with BNSF Railway for intermodal transportation, meaning the truck may deliver the container to a railhead so that a large portion of the trip travels by train on BNSF railways.
The transactional data ingestion alone was too high for relational systems to handle. With over 100,000 containers, each with multiple devices in the containers and on the trucks sending data in different formats every second, it required a system that could ingest several hundred thousand writes per second. The typical relational system running on a single server would quickly become a bottleneck on ingestion alone, never mind querying the data. J.B. Hunt began a search for a system or set of systems that could be used for real-time alerting and for long-term time series reporting against the IoT telemetry data collected from its drivers' mobile apps, its trucks and its containers.
After considerable research and testing J.B. Hunt selected Apache Kafka running in the Confluent Cloud to handle the ingestion and real-time notifications, and MongoDB Atlas for the time series reporting capabilities. Both of these technologies are fully managed cloud based offerings capable of handling massive scale.
Donovan used a clever example of shipping IPA beer across the country, where the beer had to be kept below a certain temperature in order to preserve its flavor. Two key requirements are in play. One: keep the shipment below a certain temperature and notify the driver in real time if there is a problem. Two: provide a way to report against every step of the journey to make sure that at no point the temperature in the container exceeded a threshold.
At any point in time J.B. Hunt can be alerted if the temperature is rising quickly and notify the driver to check the container. A simple example might be someone leaving the door open at a stop, or perhaps a blown fuse on the container's cooling system. Confluent Cloud and tools like ksqlDB provide an easy way to set thresholds that route telemetry events into special alert topics read by an application that can notify the driver. This is an example of how real-time alert notifications allow the driver to take corrective action. Additionally, all the metrics can be used for real-time notifications, for example if the driver is headed to the wrong location or traveling too fast or too slow.
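As a rough illustration of that idea (the stream names, field names and the 45°F threshold are mine, not J.B. Hunt's), a ksqlDB rule that routes warm containers into an alert topic can be submitted over ksqlDB's REST API like this:

```python
# A hedged sketch: submit ksqlDB statements that filter hot-running containers
# into a dedicated alert topic. Endpoint, credentials, names and threshold are illustrative.
import requests

KSQLDB_URL = "https://pksqlc-xxxxx.us-east-1.aws.confluent.cloud:443"  # placeholder

statements = """
CREATE STREAM container_telemetry (
    container_id VARCHAR KEY,
    temperature_f DOUBLE,
    event_time BIGINT
) WITH (KAFKA_TOPIC='container-telemetry', VALUE_FORMAT='JSON', PARTITIONS=6);

CREATE STREAM temperature_alerts WITH (KAFKA_TOPIC='temperature-alerts') AS
    SELECT container_id, temperature_f, event_time
    FROM container_telemetry
    WHERE temperature_f > 45.0
    EMIT CHANGES;
"""

resp = requests.post(
    f"{KSQLDB_URL}/ksql",
    auth=("<KSQL_API_KEY>", "<KSQL_API_SECRET>"),  # placeholder credentials
    json={"ksql": statements, "streamsProperties": {}},
)
resp.raise_for_status()
# A notification service can now simply consume the 'temperature-alerts' topic
# and page the driver as soon as a container starts warming up.
```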
But the second requirement, keeping track of every step of the journey in a time series database, is the real problem. Donovan called up three volunteers during the presentation. Travis was the truck driver who picked up the shipment at the origin location and delivered it to the train. Philip was the train driver. Monan was the truck driver who delivered the beer to its final destination. Donovan said it's pretty easy to get IoT data from each of these conductors and drivers and their equipment, because they are doing work for him and he can easily see the real-time data.
Donovan then shifted gears: "Now you are the customer and you want to know what temperature your beer was at during each leg of the journey. You have access to a REST API so you can query the data. Do you know the names of the drivers? Do you know the container ID? Do you know any of the IDs for any of the devices? Do you know the time frame to query for each leg of the journey?" As a customer you don't have access to this information, and you don't want to look up device IDs and then query a time series database for each ID, multiple times, for each leg of the journey.
MongoDB gives a lot of flexibility in the design because of its secondary indexes on time series collections. You don't need to know all this information to piece everything together, because you can search on what you do have using secondary indexes and MongoDB will provide all the other information. He also said it is quite useful for the equipment management side of the house, because it's easy to check whether there are issues or whether shipments are being delivered on time.
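Here is a minimal sketch of that pattern with pymongo and a MongoDB 5.0+ time series collection. The connection string, collection and field names (containerId, orderNumber, leg) are illustrative, not J.B. Hunt's actual schema; the point is that a secondary index on metadata the customer does have (an order number) pieces the whole journey together in one query.

```python
# A minimal sketch, assuming MongoDB 5.0+ time series collections.
# Names and fields are illustrative placeholders.
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")["logistics"]  # placeholder

# Create the time series collection once, keyed on a timestamp and a metadata field.
if "telemetry" not in db.list_collection_names():
    db.create_collection(
        "telemetry",
        timeseries={"timeField": "ts", "metaField": "meta", "granularity": "seconds"},
    )

# Every device reading carries the metadata needed to stitch the journey together.
db.telemetry.insert_one({
    "ts": datetime.now(timezone.utc),
    "meta": {"containerId": "JBHU123456", "orderNumber": "PO-98765", "leg": "rail"},
    "temperatureF": 42.1,
    "location": {"lat": 35.37, "lon": -94.37},
})

# Secondary index on the metadata the customer actually has.
db.telemetry.create_index("meta.orderNumber")

# One query returns the whole journey, across every driver and device.
for reading in db.telemetry.find({"meta.orderNumber": "PO-98765"}).sort("ts", 1):
    print(reading["ts"], reading["meta"]["leg"], reading["temperatureF"])
```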
A final note on the importance of cross-cloud compatibility. All of the business capability J.B. Hunt developed is delivered through the fully managed Kafka service running in the Confluent Cloud and the fully managed MongoDB Atlas instance in GCP. Originally J.B. Hunt had started this project in Azure. Because both MongoDB Atlas and the Confluent Cloud run on all the major cloud providers (AWS, Azure and GCP), J.B. Hunt was able to avoid vendor lock-in with Microsoft. They were able to migrate all their systems and go live on GCP in a matter of six weeks.
Had they developed on something like Microsoft Azure Event Hubs and Cosmos DB, they would not have been able to move to another cloud provider without completely rewriting their applications. This is another important consideration when developing application software: try to avoid the "easy" path of a specific cloud vendor's offerings. GCP doesn't support Event Hubs, Azure doesn't support Google Dataflow, and GCP doesn't support Cosmos DB either.
Toyota Financial Services
Toyota Financial Services (TFS) operates in over 40 countries as a wholly-owned subsidiary of Toyota Motor Corporation. It provides auto loans, leases, and insurance to prospective Toyota owners, and helps dealers finance expansion. Its annual income is over $5 billion USD with over $250 billion USD in managed assets globally.
In February of 2019, Toyota Financial Services was brought onstage alongside IBM in the posh Moscone South ballroom in San Francisco for a presentation on an enormous accomplishment in TFS legacy modernization. TFS was able to offload a significant percentage of reads from the mainframe and Oracle by leveraging Confluent and MongoDB. The solution also provided information to SaaS customer solutions so that payments could be applied in real time.
Prior to this solution, TFS had spent considerable time, money and effort to reduce mainframe MIPS with traditional technology stacks and had poor results. One of the many problems they were facing was that the batch cycle, which extracted payment information from the IBM Z Series mainframe to be processed in Oracle, was taking longer than 24 hours to complete and post payments to Salesforce. Some customers who paid their vehicle financing on time, but within a day or so of the due date, were receiving late payment notifications due to the slow batch process.
Ken Schuelke, the national manager of the Enterprise API Factory, selected this architecture because it was different from the many other submissions based on legacy technologies. The new tech stack allowed for rapid prototyping with modern technologies like the Confluent Platform (Kafka) and NoSQL with MongoDB. They named this new approach, using microservices with Confluent and MongoDB, the "Enterprise Integration Platform" (EIP). We call it Kafcongo.
The presentation (linked above) was published publicly by IBM's Executive Architect Slobodan Sipcic, Ph.D. on slideshare.net in 2019. The quotes below echo everything we have previously discussed in this article.
Below are some quotes and diagrams from the presentation. It starts with a summary of the problems before the EIP project. You may recognize some of these issues in your current environment or in the environments of your business partners and customers.
“Toyota Financial Services has a multitude of applications which were implemented over the years by various functional domain groups to meet “siloed” needs, ranging from lightly configured to heavily customized vendor packaged solutions and from hybrid to completely purpose-built in-house solutions.”
“Summary Challenges: Business processes are overburdened with manual, redundant data entry. Batch processing windows are long and often delay the business. Revitalization and uplift projects suffer from delays and very high cost. Data quality and consistency issues persist between applications. Maintenance of heterogeneous and aging systems is costing a lot of money. Staff intellectual development and innovation is inhibited by focus on aging technology.”
“The TFS & IBM innovative solution for EIP is based on ... several open-source technologies deployed on top of it. The solution is cloud agnostic and utilizes microservice architectural style.”
“The delivery tier is responsible for optimizing delivery of events and data between the platform components. NiFi and Kafka constitute a scalable event-driven delivery backbone of the EIP architecture..”
“EIP utilizes NoSQL document database - MongoDB - for persisting Harmonized and Materialized contract related information for several reasons including: Storage architecture is aligned with the structure of contract information, Simplicity of retrieving all attributes for a specific account - no need to join multiple tables in relational database. Ability to handle high volumes of data, Efficient, scale-out architecture instead of monolithic architecture.”
The new EIP platform successfully offloaded large amounts of reads from the mainframe and helped an outdated batch process running on an old legacy Oracle system reach end of life and be retired. The project started out with the open source community edition of MongoDB but has since migrated to MongoDB Atlas.
According to the presentation, the challenges listed above were met and overcome. New business processes removed manual, redundant data entry. Batch processing windows are no longer used. Revitalization and uplift projects no longer suffer from delays and very high costs. Data quality and consistency issues between applications were removed. Maintenance of heterogeneous and aging systems is no longer necessary. Finally, staff intellectual development and innovation is now liberated by a focus on modern technology.
Enabling digital transformation
The transportation sector is changing. Concerns around congestion, consumption, and carbon emissions are forcing consumers to rethink their transport choices. We’re looking to a future that is likely to involve accessible public transport, ridesharing, more shared ownership, and greater charges for larger emissions. Consequently, finance processes within the industry will need to be faster, more mobile, and more personal.
Toyota Financial Services (TFS) wants to be at the forefront of this emerging reality. It recognizes that digital transformation is central to its continued success.
“We’re a big, successful company with the means to invest in digital transformation in a way the smaller manufacturers can’t,” explains Ken Schuelke, Division Information Officer, for Enterprise API Services at Toyota Financial Services.
As part of this transformation, TFS is transforming itself from a company that buys software off-the-shelf to one that develops its own. This has required the hiring of in-house developer talent, and the provisioning of the tools for those developers to work effectively. One of the immediate outcomes is the creation of the TFS mFin multi-tenant platform. This is the cornerstone of TFS’s toolkit to handle the company’s digital complexity, managing sensitive data from multiple partners and ensuring data is kept separate.
Additionally, Schuelke notes the company’s commitment to community: “Toyota really wants to make the world a better place; it’s very customer focused. Where we want to go now is to expand mobility for all, so we’re developing mobility services and expanding our company from just a finance lending platform to a full mobility stack.”
As the business grows and new private-label projects require stronger segmentation, Schuelke says the challenge is to maintain developer agility with a goal to create a dynamic developer culture.
“Previously, we had a database team, an integration team, a front-end development team and an infrastructure team focused on these monolithic deployments,” he explains. “Now, with the speed at which the market is moving, we want smaller teams with all the platform tools they need to do the whole job. We want them to be innovative, to solve problems on their own and to write great code. So, the developer experience that MongoDB is shooting for is the same thing we’re shooting for.”
In short, the solution went from an expensive, cumbersome, slow batch process to a real-time process where customer payments are posted to Salesforce as soon as they are processed on the mainframe. TFS went from the decades-old tech stack of mainframe and Oracle to the new, modern, cloud-based Kafcongo tech stack of Kafka, Confluent and MongoDB, and they are reaping the rewards of their digital transformation. TFS is yet another successful example of the use of the Kafcongo tech stack in the real world.
Migrating from Oracle to MongoDB with Confluent
This section should be called "legacy migration: using the new Kafcongo tech stack to get off of the old tech stack," but that is simply too long for a good title. While at the MongoDB .local event in Dallas, Texas on October 27th, 2022, I ran into a number of joint customers of MongoDB and Confluent. Finding companies that use MongoDB and Confluent together is far more common than you might think. It is no mistake that Confluent won MongoDB's technology partner of the year award for 2022. The reason is the enormous synergy between the products, and the fact that the two young companies have the same customer focus and work well together beyond just the technologies. The product management and sales teams have formed a great partnership.
Below is a YouTube channel dedicated to the MongoDB .local event in Dallas in 2022. It is interesting to note how many of the presentations refer to event-driven architectures. MongoDB is a natural fit with the event-driven architecture provided by Confluent Kafka.
Many of these customers I met with at the MongoDB .local event were very eager to modernize off of legacy systems. Many of them had done the exact same thing as a proof of concept. They had used open source Apache Kafka and MongoDB's community edition to test if it was possible to migrate data off of Oracle through Kafka and into MongoDB. I spoke to 5 different companies outside of JB Hunt that have completed this same POC successfully.
It was very easy. The proofs of concept worked very well and used some of the components of the basic architecture I presented before. I'll paste it here again for convenience.
The basic strategy is to create a materialized view in Oracle that joins several key tables together. The materialized view is set to auto-refresh, so when any changes are made to the base tables the materialized view is also updated. Next we deploy the Kafka JDBC connector to read from the materialized view in Oracle and publish that data in JSON format to a topic.
In our example below we use the "Customer Orders" topic. The JDBC connector detects changes in the Oracle view and automatically converts each event into a JSON document. It does this relational-to-JSON translation automatically and develops a schema with all the data types preserved. This schema is accessible through Confluent's fully managed Schema Registry service and allows any consumer of the topic to obtain the schema and read the real-time event data in JSON format.
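Here is a hedged sketch of what that connector registration can look like with the self-managed Confluent JDBC source connector and the Kafka Connect REST API. The materialized view name, columns, connection details and topic prefix are illustrative; the fully managed Confluent Cloud connector takes an equivalent configuration in the UI or CLI.

```python
# A hedged sketch: register a JDBC source connector that polls an auto-refreshing
# Oracle materialized view and publishes changed rows to a Kafka topic.
# View name, columns, credentials and worker URL are placeholders.
import requests

jdbc_source_config = {
    "name": "oracle-customer-orders-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1",  # placeholder
        "connection.user": "kafka_reader",                                  # placeholder
        "connection.password": "<PASSWORD>",                                # placeholder
        "table.whitelist": "CUSTOMER_ORDERS_MV",   # the auto-refreshing materialized view
        "mode": "timestamp+incrementing",          # pick up new and updated rows
        "timestamp.column.name": "LAST_UPDATED",
        "incrementing.column.name": "ORDER_ID",
        "topic.prefix": "oracle-",                 # rows land on the 'oracle-CUSTOMER_ORDERS_MV' topic
        "poll.interval.ms": "5000",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=jdbc_source_config)  # placeholder worker URL
resp.raise_for_status()
```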
Just getting real-time Oracle data into Kafka is powerful enough on its own. Developers can start creating applications immediately against real-time data in the customer orders topic. It is even more powerful if the POC uses the Confluent Cloud rather than open source Kafka: the Confluent Cloud allows for a fully managed JDBC connector and lets other cloud native applications take advantage of the data in the topic. But wouldn't it also be great to put this data into MongoDB for other applications, reports and analytics?
The final step is to deploy the MongoDB sink connector to read data directly from the customer orders topic where Oracle produces its CDC data. Once the connector consumes data from the topic, it writes that data to MongoDB. The result is very straightforward: real-time change data capture now flows from Oracle straight into MongoDB.
The great thing about this solution is that it requires no code. For everything outside of the Oracle view, it's just setting up connectors and providing connectivity information. No agents, listeners or anything else are deployed on Oracle. If one uses a fully managed connector in the cloud, setting up the connector is accomplished in a matter of minutes. All of this data flows in real time: no slow batch ETL process, and no separate set of ETL code to maintain.
Everything you need to do to run your own free Kafcongo POC is here in this github: https://github.com/brittonlaroche/Oracle-Confluent-MongoDB/
As I spoke to the developers and architects that made the POC happen, I asked about the number of people on the development team that participated in the POC. In most cases it was just one or two developers that made it all happen. It generated quite a buzz at the company and already new projects were underway to take advantage of this legacy migration.
My only caution to them was, "You do realize that this will probably be successful and wind up in production, right?" The developers I spoke to agreed.
I said, "And when it breaks, because it's running on open source products that you installed yourself on premises, who do you think they will call at 3:00 AM when it is down?" They rolled their eyes in acknowledgement.
"Wouldn't it be better to go live on fully managed services like MongoDB Atlas and the Confluent Cloud? If it goes down, we get the call, not you." They agreed again. Fully managed services is an easy sell to the overworked devops team.
I see this as a trend that is taking off across the customer base: starting with the open source versions, then migrating to fully managed cloud services. This same Oracle -> Confluent -> MongoDB POC ran independently at 5 different customers, and they all came up with the idea on their own. This makes me think it will one day be mainstream. I hope this article spreads the idea even further.
At the end of the day, Kafcongo is a new movement that is picking up steam. Similar to a conga line, perhaps we have a new "Kafcongo line" forming, with more companies naturally joining in. It reminds me of the song "Meglio Stasera," known in English as "It Had Better Be Tonight," where love is only so patient.
Sooner or later you have to make your move. The point of the song is don't wait too long, love isn't something to hesitate on. Perhaps now is the time to take a look at kafcongo. In other words "one can stand still in a flowing stream, but not in a world of technology."
And finally I have to admit I had a lot of fun generating the art for the article with artificial intelligence. I was able to provide reference art and prompts that generated images for the dream team as robots. All of this is available for free at dream.ai
After I generated the artwork I used GIMP to apply logos and rotate the MongoDB connector to be more of a landscape than a portrait for the Kafcongo mascot.
Honestly I think Artificial Intelligence does a better job than I ever could in just a few seconds.