Serverless streaming job on IBM Cloud
Introduction
If you are designing a solution on IBM Cloud that includes a streaming data ingestion flow, you should consider using the IBM Event Streams, IBM Data Engine, IBM Cloud Object Storage, and IBM Key Protect services to set up a serverless streaming job.
With this approach, you can write up to 1 MB/s of data to IBM Cloud Object Storage without writing a single line of code. This article gives you all the concepts needed to configure your serverless streaming job.
Serverless Streaming Setup
When designing an architecture that includes a streaming job on IBM Cloud, keep in mind the following observation, valid at the time of writing: if you submit a streaming job via a SQL command from the IBM Data Engine console, as we do in this article, you cannot stop the job from the UI; instead, you must use the Data Engine REST API described at this link.
To submit a streaming job on IBM Cloud, you need to provision the following services from the IBM Cloud catalog:
- IBM Event Streams
- IBM Data Engine
- IBM Cloud Object Storage
- IBM Key Protect
After you have provisioned these services, you need to complete the configuration steps that link them together: at a minimum, create a topic in Event Streams, create a target bucket in Cloud Object Storage, and create a root key in Key Protect, as these are all referenced by the job query below.
Once you have all the infrastructure and configuration described above, you can submit your first serverless streaming job on IBM Cloud. Open the Data Engine console and write a query as follows:
SELECT * FROM {EVENT_STREAMS_INSTANCE_CRN}/topic_1
STORED AS JSON
EMIT cos://{YOUR_BUCKET_REGION}/bucket_1/{PREFIX} STORED AS PARQUET
EXECUTE AS {YOUR_KEY_PROTECT_CRN}:key:{YOUR_KEY_ID}
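To make the placeholders concrete, here is the same query with purely illustrative values filled in. The CRNs, region, bucket, prefix, and key ID below are hypothetical; replace them with the values from your own service instances:

-- Illustrative values only; substitute your own CRNs, region, bucket, prefix, and key ID
SELECT * FROM crn:v1:bluemix:public:messagehub:us-south:a/0123456789abcdef0123456789abcdef:11111111-2222-3333-4444-555555555555::/topic_1
STORED AS JSON
EMIT cos://us-south/bucket_1/landing STORED AS PARQUET
EXECUTE AS crn:v1:bluemix:public:kms:us-south:a/0123456789abcdef0123456789abcdef:66666666-7777-8888-9999-000000000000::key:aaaabbbb-cccc-dddd-eeee-ffff00001111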
Your job is now up and running. You can see the job details from your Data Engine instance by switching to the "Streaming jobs" section.
Partitioning
A streaming landing job on IBM Cloud can support high-throughput data (up to 1 MB/s at the time of writing). In these scenarios, it is useful to partition the data to improve query latencies for data analysis. Right now, IBM Cloud Data Engine does not support this feature, but it is planned for an upcoming release. You will be able to partition your data by submitting another streaming job that reads data from the input bucket, reorganizes it, and writes it into another bucket in a partitioned layout.
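Until that feature ships, a practical workaround for data that has already landed is a one-off batch Data Engine query using the PARTITIONED BY clause of the INTO target. The sketch below is hypothetical: it assumes your streaming job writes Parquet objects under cos://us-south/bucket_1/landing, that a second bucket named bucket_2 exists as the target, and that the records contain an event_date column to partition by:

-- Hypothetical batch repartitioning query; bucket names and the event_date column are assumptions
SELECT * FROM cos://us-south/bucket_1/landing STORED AS PARQUET
INTO cos://us-south/bucket_2/partitioned STORED AS PARQUET
PARTITIONED BY (event_date)

Note that this is a regular batch job, not a streaming job, so you would need to re-run or schedule it to pick up newly landed data.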
Conclusion
The approach described in this article allows you to set up a managed streaming job without writing a single line of code. You can spend less effort on ETL tasks and focus on the data analysis tasks that give answers to your business questions.