Gaming Platform Unlocked
Building a platform from the ground up is always a challenge. Whether you are an entrepreneur, a professional, or an amateur engineer, you need to get your hands dirty to test your hypothesis. If you are planning to create a gaming platform today, you will need to consider a few fundamental requirements. But before jumping into the technical details, you should first consider what your users want. Ultimately, the platform's users are the real testers of the system, so start by understanding their requirements. Even though the world of games has both extrinsically and intrinsically motivated players, the main purpose of a gaming platform is to earn the trust and loyalty of all gamers so that they keep coming back.
In the current digital era, a gaming platform should not introduce delays, and it should not miss a single message or transaction. These are table stakes for users, and failing to deliver on them will be the end of the platform. Traditional systems built on point-to-point integrations, queues, and batch processing will eventually fail to meet these expectations. The data exchange between the users and the platform therefore has to be real-time and reliable. With that in place, you can look at the challenges waiting ahead of you.
The first challenge is collecting consistent events from the games, and the collection layer should be pluggable for any type of game: casino, online, board games, and so on. Whenever users play, they generate events, and these events are valuable feedback that helps game developers improve their games. If you have decided to build your own platform, make sure you can collect all of these events and preserve the consistency of the data until it reaches its destination. Today that destination may be a transactional, analytics, or NoSQL database; tomorrow it may include other services such as wallets, payments, fraud detection, indexing, or logging.
Secondly, you need to consider the monetization of the games. When a user makes an in-game purchase, you must make sure the transaction is neither duplicated nor lost during data transfer. This is crucial for the reliability of the platform.
Lastly, if you are hosting the game servers, you need to make sure every server is operational. There should be no bottleneck to scalability and elasticity under fluctuating demand. To achieve operational excellence, collect all the logs and metrics and build observability into the gaming platform. Then, if there is a spike in user activity, the monitoring system can make informed decisions with real-time data on your behalf.
In light of the demanding infrastructure requirements above, Apache Kafka, the de facto standard for real-time data streaming, is a handy tool for building a reliable gaming platform. Moreover, Confluent Cloud, which is built on top of Apache Kafka, offers many out-of-the-box features that boost developer productivity and reduce infrastructure cost. At the center of the platform architecture, Confluent Cloud will orchestrate the demanding data pipeline requirements and act as the bridge between systems. In addition, the Confluent Cloud offering provides tooling and advanced features for production workloads, covering stream processing (ksqlDB) and data governance requirements. Let's look at the three necessary pillars in the next sections.
(1) - Orchestrating events as the starting point
As a first requirement, you need to collect events from users, and every action a user takes in the game produces a different event type. To unlock all the player patterns, storing multiple event types in one topic is crucial. Also, to reduce transfer cost, integrating the producers and consumers with a schema registry should be the first item on your list. The platform starts with the architecture shown in Figure 1.
With this approach, make sure your clients use settings along the lines of the sketch below:
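Since the original settings list is not reproduced here, the following is only a minimal sketch of a Java producer configuration for this pattern, assuming Avro serialization with Confluent Schema Registry. The placeholder endpoints and the choice of TopicRecordNameStrategy are assumptions, not values prescribed by the article.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class GameEventProducerConfig {
    public static KafkaProducer<String, Object> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "<bootstrap-servers>"); // placeholder

        // Reliability: avoid lost or duplicated events on the produce path.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        // Serialize values with Avro backed by Schema Registry so every
        // consumer knows how to deserialize each event type.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "<schema-registry-url>"); // placeholder

        // Allow several event types in the same topic: the subject name is
        // derived from the topic and the record name instead of the topic only.
        props.put("value.subject.name.strategy",
                "io.confluent.kafka.serializers.subject.TopicRecordNameStrategy");

        // Register schemas through CI/CD rather than from application code.
        props.put("auto.register.schemas", "false");
        props.put("use.latest.version", "true");

        return new KafkaProducer<>(props);
    }
}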
Depending on the destination applications, some consumers may process data more slowly than the producers generate it. In this situation, parallelizing the consumers gives them a boost with fewer resources, which also reduces the cost of the platform. To design parallel consumers, you do not need to be an expert in distributed systems and parallel processing; there are useful frameworks that will help you deliver the requirements in a short time. You can start exploring this idea with the blog post "Introducing the Confluent Parallel Consumer", then check the parallel processing libraries available in your favorite programming language. A hand-rolled sketch of the idea follows.
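The sketch below is not the Confluent Parallel Consumer's API; it is only a simplified illustration of the same idea, fanning records out from a single standard consumer to a small worker pool while keeping records with the same key in order. The topic name, group id, and pool size are hypothetical.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ParallelGameEventConsumer {
    private static final int WORKERS = 8; // hypothetical pool size

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "<bootstrap-servers>");      // placeholder
        props.put("group.id", "game-events-processor");             // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        // One single-threaded executor per "lane" keeps same-key records in
        // order while records with different keys are processed concurrently.
        ExecutorService[] lanes = new ExecutorService[WORKERS];
        for (int i = 0; i < WORKERS; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("game-events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    int lane = Math.floorMod(record.key() == null ? 0 : record.key().hashCode(), WORKERS);
                    lanes[lane].submit(() -> process(record));
                }
                // Simplification: commits here are not tied to worker completion;
                // the Confluent Parallel Consumer library manages offsets per record safely.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // Business logic, e.g. enrich the event and write it to a downstream store.
        System.out.printf("processed key=%s offset=%d%n", record.key(), record.offset());
    }
}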
While creating a prototype is easy, you need to think of at least 99 different ways the production system could fail and prevent them before they happen. A dead letter queue (DLQ) is a common pattern for handling errors, but designing the system with retries will reduce your headaches and the amount of DLQ monitoring required. If you have no experience with this, or want to refresh your knowledge, check the blog post "Error Handling Patterns for Apache Kafka Applications". A rough sketch of the retry-then-DLQ idea follows.
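The sketch below illustrates one common shape of this pattern, retry a few times and only then forward the record to a DLQ with error context in headers. The topic name and retry budget are illustrative, not prescribed by the article or the referenced blog.

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RetryThenDeadLetter {
    private static final int MAX_ATTEMPTS = 3;                  // illustrative retry budget
    private static final String DLQ_TOPIC = "game-events-dlq";  // hypothetical DLQ topic

    public static void handle(ConsumerRecord<String, String> record,
                              Producer<String, String> dlqProducer) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                process(record);   // your business logic
                return;            // success: nothing else to do
            } catch (Exception e) {
                if (attempt == MAX_ATTEMPTS) {
                    // Retries exhausted: forward the original payload to the DLQ
                    // together with error context in headers for later inspection.
                    ProducerRecord<String, String> dead =
                            new ProducerRecord<>(DLQ_TOPIC, record.key(), record.value());
                    dead.headers().add("error.message",
                            String.valueOf(e.getMessage()).getBytes(StandardCharsets.UTF_8));
                    dead.headers().add("source.topic",
                            record.topic().getBytes(StandardCharsets.UTF_8));
                    dlqProducer.send(dead);
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // Deserialize, validate, and persist the event; throws on failure.
    }
}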
After all the reading and hard work, your platform will be ready for a test drive. But do you think it is ready for users?
(2) - Preventing duplicate messages with ksqlDB
Even if you tune your producers according to best practices, duplicate messages can still occur with any API communication. To detect and discard duplicates quickly, you can use ksqlDB with a twist. At the time of writing, ksqlDB does not support Time To Live (TTL) on tables, but you can achieve a global table with TTL by handling the cleanup of keys, based on your business requirements, with a small microservice, as shown in the diagram below.
Figure 2 shows the mechanism for deduplicating messages with a TTL, but it is worth peeling back each layer for a better understanding.
Transaction data can contain many fields to serve multiple systems, but for simplicity the JSON payload below will be used.
key:
{
  "id_key": "txn1"
}
value:
{
  "id": "txn1",
  "user_id": "user1",
  "txn_type": "purchase",
  "txn_timestamp": "2023-04-10 08:00:00",
  "total_sales": 10,
  "currency": "USD",
  "items": ["book1", "book2"],
  ...
}
First of all, the transaction topic is read into a ksqlDB stream, and the message timestamp is overridden with txn_timestamp as shown below.
CREATE STREAM TXN_RAW (
ID STRING,
USER_ID STRING,
TXN_TYPE STRING,
TXN_TIMESTAMP STRING)
WITH (KAFKA_TOPIC='txn_raw',
TIMESTAMP='txn_timestamp',
TIMESTAMP_FORMAT='yyyy-MM-dd HH:mm:ss',
VALUE_FORMAT='JSON',
PARTITIONS=3);
Overriding the timestamp is required to achieve correct aggregation and to avoid issues with late-arriving messages. Another problem is keyless messages, which are distributed across the topic in a round-robin fashion. To process messages in parallel and achieve higher throughput, they need keys so they are distributed to partitions accordingly. Luckily, ksqlDB has built-in repartitioning functionality that solves this problem. The script below repartitions the stream and distributes the data to partitions by transaction id.
CREATE STREAM TXN_PARTITIONEDBY_TXNID
WITH (KAFKA_TOPIC='txn_raw_key_txn_id',
VALUE_FORMAT='AVRO',
KEY_FORMAT='KAFKA') AS
SELECT
ID AS ID_KEY,
AS_VALUE(ID) AS ID,
USER_ID,
TXN_TYPE
FROM TXN_RAW
PARTITION BY ID;
ksqlDB has multiple optimization options to increase throughput, and depending on the requirements of your use case you can disable buffering to get results faster. Applying the query-level setting 'cache.max.bytes.buffering' = '0' reduces the time you wait for the unique values produced by the script below.
Add the query-specific property before creating the table:
SET 'cache.max.bytes.buffering' = '0';
CREATE TABLE TXN_WINDOW_10MIN_UNIQUE_TABLE
WITH (KAFKA_TOPIC='txn_window_10min_unique',
VALUE_FORMAT='AVRO',
KEY_FORMAT='KAFKA')
AS
SELECT
ID_KEY,
EARLIEST_BY_OFFSET(ID) AS ID,
EARLIEST_BY_OFFSET(USER_ID) AS USER_ID,
EARLIEST_BY_OFFSET(TXN_TYPE) AS TXN_TYPE,
COUNT(*) AS MESSAGE_NO
FROM TXN_PARTITIONEDBY_TXNID
WINDOW TUMBLING (SIZE 10 MINUTES, GRACE PERIOD 2 MINUTES)
GROUP BY ID_KEY
HAVING COUNT(*) = 1;
The ksqlDB script above captures the messages that are unique within each 10-minute window, with a 2-minute grace period in case the global table in the following steps is delayed in receiving the unique message.
To process all the unique messages in that topic, a new stream is created over it; this stream will later be integrated with the unique transactions stream and a global lookup table of transaction ids.
CREATE STREAM TXN_WINDOW_10MIN_UNIQUE_STREAM (
ID_KEY STRING KEY,
ID STRING,
USER_ID STRING,
TXN_TYPE STRING,
MESSAGE_NO BIGINT)
WITH (KAFKA_TOPIC='txn_window_10min_unique',
VALUE_FORMAT='AVRO',
KEY_FORMAT='KAFKA');
The unique transactions are used to build the global lookup table, so a stream for unique transactions is created below.
CREATE STREAM TXN_UNIQUE (
ID_KEY STRING KEY,
ID STRING,
USER_ID STRING,
TXN_TYPE STRING,
MESSAGE_NO BIGINT)
WITH (KAFKA_TOPIC='txn_unique',
VALUE_FORMAT='AVRO',
KEY_FORMAT='KAFKA',
PARTITIONS=3);
The transaction lookup table is created below with one row per transaction id. The timestamp of the first occurrence of each message is also added to the table so the cleaning process can be executed according to specific time requirements or the business use case.
CREATE TABLE TXN_LOOKUP_TABLE
WITH (KAFKA_TOPIC='txn_lookup_table',
VALUE_FORMAT='AVRO',
KEY_FORMAT='KAFKA')
AS
SELECT ID_KEY,
EARLIEST_BY_OFFSET(ID) AS ID,
TIMESTAMPTOSTRING(EARLIEST_BY_OFFSET(ROWTIME), 'yyyy-MM-dd HH:mm:ss.SSS') AS MSG_ROWTIME,
EARLIEST_BY_OFFSET(ROWTIME) AS MSG_EPOCH
FROM TXN_UNIQUE
GROUP BY ID_KEY;
Finally, any duplicate message that arrives after the 10-minute window should be checked against the global lookup table to see whether it was processed before, so that only unique messages flow into the unique transactions topic. This logic is implemented simply by joining new transactions with the global table, validating that the transaction id is not already present in the lookup table, and inserting the result into the unique transactions stream, as shown below.
INSERT INTO TXN_UNIQUE
SELECT
T10.ID_KEY AS ID_KEY,
T10.ID AS ID,
T10.USER_ID AS USER_ID,
T10.TXN_TYPE AS TXN_TYPE,
T10.MESSAGE_NO AS MESSAGE_NO
FROM TXN_WINDOW_10MIN_UNIQUE_STREAM T10
LEFT JOIN TXN_LOOKUP_TABLE TLT ON T10.ID_KEY = TLT.ID_KEY
WHERE T10.ID_KEY IS NOT NULL
AND TLT.ID_KEY IS NULL
EMIT CHANGES;
Like any other system, ksqlDB has limited memory, so cleaning the global lookup table is crucial for the continuity of the process. To achieve this, keys must be deleted from the global table, and sending a tombstone message for each key does the trick. If you plan to test this in ksqlDB, you can create a stream for lookup table cleaning and insert a null message value with the transaction id as the message key, using the stream below.
CREATE STREAM TXN_LOOKUP_TABLE_CLEANING_STREAM (
ID_KEY STRING KEY,
DUMMY STRING)
WITH (KAFKA_TOPIC='txn_lookup_table',
VALUE_FORMAT='AVRO',
KEY_FORMAT='KAFKA');
Even though this looks like a quick way to get TTL behavior in ksqlDB, you need to be careful when sending tombstones, because some Apache Kafka clients have different default partitioning (hashing) strategies. A different hashing method generates a different hash for the same key, which causes the key to land in a different partition and defeats the purpose of this cleaning step. Make sure your clients use the same hashing strategy as ksqlDB.
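To make the partitioning concern concrete, here is a minimal sketch of producing the tombstone from a Java client, whose default partitioner hashes string keys with murmur2, the same scheme ksqlDB uses when it writes the table. The topic name and key format come from the scripts above; the bootstrap placeholder and the 'txn1' transaction id are taken from the example payload and are illustrative.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LookupTableCleaner {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "<bootstrap-servers>"); // placeholder
        // The lookup table key uses ksqlDB's KAFKA key format, so a plain
        // string serializer produces byte-compatible keys.
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A null value is a tombstone: it removes 'txn1' from the lookup table topic.
            // The Java client's default partitioner hashes the key with murmur2,
            // matching the partitioning ksqlDB used when it built the table.
            producer.send(new ProducerRecord<>("txn_lookup_table", "txn1", null));
            producer.flush();
        }
    }
}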
If you complete all the steps, you will end up with the full lineage of your production pipeline, as shown in Figure 3.
Confluent Cloud has built-in Stream Governance, which helps you explore the current state of production. In addition, the Stream Catalog increases collaboration and productivity through self-service discovery of streams. While all users are empowered with data, Schema Registry helps maintain the quality and integrity of the streams as the services evolve.
(3) - Observability of the gaming platform
Creating observability for mission-critical applications can be challenging, but with a decent strategy you will be able to collect the important metrics and errors from your servers. As an example, you can read the blog post "Scaling Apache Druid for Real-Time Cloud Analytics at Confluent" to learn how Confluent created the Metrics API for its customers.
In real-world deployments, services run on bare metal, VMs, or Kubernetes. Collecting logs and metrics from different sources can be overwhelming, and using an agent or aggregator usually solves this. Fluent Bit, Fluentd, Logstash, and Filebeat are some of the solutions that offer an Apache Kafka output plugin, which sends all the logs and metrics to Confluent seamlessly. In this gaming platform example, Fluent Bit, a lightweight, fast, and scalable logging and metrics processor and forwarder, will be used. Additionally, Fluent Bit ships with the librdkafka client, which has multiple performance configurations and is maintained by Confluent engineers.
After harvesting valuable logs and metrics, ksqlDB is handy for filtering, processing, aggregating, and producing them to the respective topics. To achieve useful observability, the game developers and operators need to be notified of, or informed by, valuable insights. In modern architectures there are many destinations to integrate with, and this is where Confluent shines: with multiple native integrations and connectors to the destination systems, Confluent can reduce the time to production of your observability system (Figure 4).
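The pipeline described here uses ksqlDB for this step; purely to illustrate the same filter-and-aggregate idea in Java, the sketch below uses Kafka Streams (an alternative, not this article's implementation) to keep error-level log lines and count them per host in one-minute windows. The topic names, the assumption that log records are keyed by host, and the simple 'ERROR' matching rule are all hypothetical.

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class ErrorRateTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "observability-error-rate"); // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "<bootstrap-servers>");    // placeholder

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("fluentbit-logs", Consumed.with(Serdes.String(), Serdes.String())) // hypothetical topic
                // Keep only error-level log lines.
                .filter((host, line) -> line != null && line.contains("ERROR"))
                // Count errors per host in one-minute tumbling windows.
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
                .count()
                // Emit "host@windowStartMs -> errorCount" records for alerting.
                .toStream((window, count) -> window.key() + "@" + window.window().start())
                .to("host-error-rate", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}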
Observability is the backbone of all services; without a real-time data infrastructure, preventive maintenance is not possible. Any unexpected failure can cause downtime, loss of revenue, and loss of trust. To hold your gaming platform to high standards, designing a reliable observability system is crucial for business continuity. Architectures evolve quickly with new technologies, and today's architecture decisions will either unlock the potential of your business or disrupt it. New ideas are taking shape right now, so prepare for the future with today's decisions.
What is next?
If you have made it this far, you may be delighted or frightened by gaming platforms. Maybe you are ready to take the challenge one step further and publish your results, or you will apply what you have learned to your own data streaming challenges. It is always good to cross-pollinate ideas from different industries to achieve better results.
Getting started with Confluent Cloud is simple: you can test your hypothesis within minutes and have a production-ready system without any hassle.
Now, it is your turn to create an amazing gaming platform.