Re-imagining Apache Kafka for Reinforcement Learning with Consumer Groups using VIPER
Sebastian Maurice, Ph.D.
Global AI and Machine Learning Leader | Teacher | Inventor | Author | Blogger | Coder
In my last blog I wrote about how Apache Kafka and MAADS work together to manage, publish and consume outcomes from distributed algorithms. VIPER is a product in the MAADS product family that performs these functions seamlessly with Kafka on a very large scale. So far, I have been very impressed by Kafka, especially by its ability to manage large streams of data without the large outlay of technology and infrastructure that normally comes with lots of data and lots of algorithms.
In this blog I want to show and discuss how we can re-imagine Kafka with VIPER for reinforcement learning using consumer groups. If you Google reinforcement learning you will likely get the following definition:
"Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning." (Wikipedia)
Lots of important terms in the above statement but the key ones are "software agents" and "maximize the notion of cumulative reward." What is a software agent? Using the definition given by [Wooldridge and Jennings 1995]:
Agent: A computer system that is situated in some environment, and that is capable of autonomous action in this environment in order to meet its design objectives.
Ok, that all sounds theoretical but what does this all mean in a business context? Well, the above has a lot of value for the business world in 3 key areas:
1) Business planning: how are you going to meet the changing needs of your customers?
2) Business decision optimization: how are you going to use data, algorithms, and insights for better decision making across all business lines?
3) Business risk mitigation: how are you going to reduce business risk if something terrible happens (like Covid-19)?
The above still sounds theoretical. Let's bring all the pieces together in a simple example. Let's assume you are a large retailer that has lots of suppliers, lots of customers, and lots of competitors. Now, in a computer context, we have 3 software agents.
Agent 1: Supplier Group: Their goal is to sell their products to the retailer for the highest price.
Agent 2: Customer Group: Their goal is to buy the products from the retailer for the lowest price.
Agent 3: Competitor (or Market) Group: Their goal is to undercut the retailer and take market share.
How is the retailer going to balance all these goals and still be profitable and grow their business? How do they do this in a constantly changing environment? One way we can think about this in practical terms is with data and reinforcement learning.
Within Kafka there is the concept of a consumer group, which I will use to show how our retailer can choose the BEST price for ONE product, let's say a MARS Chocolate bar (I love MARS bars - well, used to), while:
1) Giving the supplier the highest price for the MARS bar
2) Giving the customer the lowest price for the MARS bar
3) Ensuring the product price is competitive enough to maintain market share, i.e. the retailer does not want to price the MARS bar at $5 when it is selling for 50 cents across the street.
IMPORTANT: This example is for ONE product (MARS bar), which means the retailer has a choice of three prices for the MARS bar, each trying to meet the GOALS of the supplier, the customer and the market. BUT, obviously, he can only choose ONE price for the MARS bar. How he can do this is illustrated in the image below.
The above image shows:
1) The MAADS training, prediction and optimization platform, which contains trained algorithms for the MARS bar. They are built with the OPTIMIZE function of the MAADS python library, using special constraints in the objective function for the supplier, customer, and market.
2) MAADS VIPER retrieves the optimal prices from each algorithm for our supplier, customer and market.
3) VIPER then publishes these optimal values to the Kafka topic "Optimize Product Price" in the Kafka Cluster.
4) Kafka automatically writes the optimal values (retrieved by VIPER) to the 3 Kafka partitions of that topic.
5) We have set up a Kafka consumer group made up of our 3 agents. Note that within a Kafka consumer group, each partition is read by only one consumer in the group, which is why we have 3 partitions for 3 consumers: each agent gets its own partition.
6) VIPER, on the other end, consumes the prices for the customer, supplier and market and sends them to a human to decide the ONE price for the MARS bar.
WHEW! Pretty cool, isn't it? In case I lost some readers, here is what happened:
a) In the first step, we must create the algorithms using data for MARS bars with the MAADS optimization function: THREE algorithms, for the supplier, customer and market
b) Once the algorithms for MARS bar pricing are created, they are available to VIPER
c) VIPER reads the optimal values and publishes them to the Kafka topic "Optimize Product Price"
d) Kafka writes the pricing to the 3 partitions that our consumer group reads from
e) VIPER reads these prices and gives them to a human to choose the BEST price for the MARS bar
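To make the "3 partitions for 3 consumers" idea concrete, here is a minimal sketch in plain Python of how partitions can be shared out among the members of a consumer group. It is only a simulation: it does not call Kafka or VIPER, the function name assign_partitions is made up for this illustration, and the simple round-robin rule is an assumption standing in for Kafka's real assignment strategies.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin).

    Hypothetical helper for illustration only; Kafka's own assignors are
    more sophisticated, but they keep the same rule: within one group,
    a partition is never read by two consumers at once.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    # Members that received nothing are listed with an empty assignment.
    return {member: assignment[member] for member in members}


if __name__ == "__main__":
    agents = ["supplier", "customer", "market"]
    # 3 partitions, 3 agents: every agent reads its own partition.
    print(assign_partitions([0, 1, 2], agents))
    # 3 partitions, 2 agents: one agent has to read two partitions.
    print(assign_partitions([0, 1, 2], ["customer", "market"]))
```

With more consumers than partitions, the extra consumers in this sketch simply get an empty list, which is the reason for matching the number of partitions to the number of agents.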
So, where is the reinforcement learning happening, you ask? This is a great question. Recall the GOALS of the three agents above:
1) Supplier wants highest price for MARS bar
2) Customer wants lowest price for MARS bar
3) Market wants a break-even price for MARS bar
The reinforcement learning is happening in the MAADS optimization function, using the MAADS python library: it minimizes prices as the objective for the customer, maximizes prices as the objective for the supplier, and maximizes profits for the market so as not to lose market share. The retailer now has three options for prices - but he needs to choose one price for the MARS bar.
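As a rough illustration of how three different objectives over the same product lead to three different prices, here is a toy sketch in plain Python. It is NOT the MAADS OPTIMIZE function and does not use the MAADS library: the function name optimal_prices, the candidate price grid, the unit cost, the competitor price and the linear demand curve are all assumptions invented for this example, with prices in cents.

```python
def optimal_prices(candidates, unit_cost, competitor_price, demand):
    """Pick one price per agent from the same feasible candidates.

    Toy stand-in for the three objectives (all inputs are hypothetical):
      customer: minimize price
      supplier: maximize price
      market:   maximize profit = (price - unit_cost) * demand(price)
    Feasible prices cover the unit cost and do not exceed the competitor.
    """
    feasible = [p for p in candidates if unit_cost <= p <= competitor_price]
    if not feasible:
        raise ValueError("no candidate price satisfies the constraints")
    return {
        "customer": min(feasible),
        "supplier": max(feasible),
        "market": max(feasible, key=lambda p: (p - unit_cost) * demand(p)),
    }


def linear_demand(price_cents):
    """Assumed demand curve: units sold fall as the price rises."""
    return max(0, 200 - price_cents)


if __name__ == "__main__":
    candidate_prices = range(50, 210, 10)  # 50 cents to $2.00 in 10 cent steps
    result = optimal_prices(candidate_prices, unit_cost=60,
                            competitor_price=150, demand=linear_demand)
    print(result)  # {'customer': 60, 'supplier': 150, 'market': 130}
```

The point of the sketch is only that one product and one set of constraints can still give three different "optimal" answers, one per agent.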
So, what is the optimal price for the MARS bar? Well, this is where the retailer needs a pricing strategy to keep everyone happy. Maybe he chooses an average of the 3 prices, or he chooses the one price that best fits his corporate objectives, or he chooses a price to keep his suppliers happy, or... you get the point.
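One simple way to write such a pricing strategy down is a weighted average of the three prices. The sketch below is only an example of that idea: the function name choose_price, the weights and the prices (in cents) are hypothetical and not taken from MAADS or VIPER. Equal weights give the plain average, and a larger weight pulls the final price toward that agent.

```python
def choose_price(prices, weights=None):
    """Blend the per-agent prices into ONE price with a weighted average.

    prices:  dict mapping agent name -> that agent's optimal price
    weights: dict mapping agent name -> importance (defaults to equal weights)
    """
    if weights is None:
        weights = {agent: 1 for agent in prices}
    total = sum(weights[agent] for agent in prices)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(prices[agent] * weights[agent] for agent in prices) / total


if __name__ == "__main__":
    # Hypothetical MARS bar prices in cents, one per agent.
    mars_bar = {"supplier": 150, "customer": 60, "market": 120}
    print(choose_price(mars_bar))  # plain average: 110.0
    # Keep the suppliers happy: count their price twice.
    print(choose_price(mars_bar, {"supplier": 2, "customer": 1, "market": 1}))  # 120.0
```

Choosing the price of a single agent outright is the special case where that agent has weight 1 and the others have weight 0.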
To complicate matters even more: how do you do the above for not just a MARS bar but the 10 million other products that he sells, with 10 million individual algorithms? And oh, how do you generate the above optimal prices for the 10 million products every day, or every hour?
Now things get interesting, and it's exactly the space where VIPER and Kafka thrive!
Till next time...