Serverless streaming job on IBM Cloud: architectural overview

Introduction

If you are designing a solution on IBM Cloud that includes a streaming data ingestion flow, consider combining the IBM Event Streams, IBM Data Engine, IBM Cloud Object Storage and IBM Key Protect services to set up a serverless streaming job.

Using this approach, you can write up to 1 MB/s of data to IBM Cloud Object Storage without writing a single line of code. This article gives you all the concepts needed to configure your serverless streaming job.

Serverless Streaming Setup

When designing an architecture that includes a streaming job on IBM Cloud, keep in mind the following observations, valid at the time of writing:

  1. the only formats supported by a streaming job are JSON as input and Parquet as output
  2. the maximum supported throughput is 1 MB/second
  3. a streaming job writes data every 5 minutes if there is at least one row. If a batch contains more than 90k rows, the job accelerates (2.5 minutes) and so on, until it runs every 10 seconds
  4. there are two ways to submit a streaming job: the first is to use the Event Streams UI, following the instructions at this link; the other is to submit the job via an SQL command from the IBM Data Engine console
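As a rough capacity check under the limits above (the arithmetic is mine, not from the documentation), the throughput cap and the default flush interval bound how much data a single batch can contain:

```python
MAX_THROUGHPUT_B_PER_S = 1_000_000  # 1 MB/s ingest limit
FLUSH_INTERVAL_S = 5 * 60           # default 5-minute flush cadence

# Upper bound on data accumulated between two flushes at peak ingest
max_batch_bytes = MAX_THROUGHPUT_B_PER_S * FLUSH_INTERVAL_S
print(max_batch_bytes / 1_000_000)  # → 300.0 (MB per batch, at most)
```

In practice the accelerated cadences (2.5 minutes down to 10 seconds) keep individual Parquet objects well below this bound on busy topics.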

In this article we explain how to submit a streaming job via an SQL command from the IBM Data Engine console. Remember that, in this case, you cannot stop the job via the UI, but you can use the Data Engine REST API described at this link.
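Stopping the job programmatically could look like the sketch below. The API host and path follow the Data Engine (SQL Query) v3 API reference linked at the end of this article; treat them as assumptions and verify against the current docs before use.

```python
import urllib.request

# Assumed endpoint base; check the Data Engine API reference for your region
API_BASE = "https://api.sql-query.cloud.ibm.com/v3"

def build_stop_request(job_id: str, iam_token: str) -> urllib.request.Request:
    """Build the DELETE request that stops a running streaming job."""
    return urllib.request.Request(
        url=f"{API_BASE}/sql_jobs/{job_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {iam_token}"},
    )

# req = build_stop_request("YOUR_JOB_ID", "YOUR_IAM_TOKEN")
# urllib.request.urlopen(req)  # issues the stop call
```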

To submit a streaming job on IBM Cloud, you need to provision the following services from the IBM Cloud catalog:

  1. Event Streams – Standard plan
  2. IBM Cloud Object Storage
  3. Data Engine – Standard plan
  4. Key Protect

After you have provisioned the services listed above, complete the following steps:

  1. Create a topic in your Event Streams instance: you can use the IBM Cloud UI to create a new topic; let’s call it topic_1
  2. Create a bucket in your Object Storage instance: you can use the IBM Cloud UI to create a bucket; let’s call it bucket_1
  3. Create a service ID from the IAM page of your account with permissions both to read from Event Streams and to write to Object Storage; add an API key for this service ID, then create a key in Key Protect containing the API key encoded in base64: this step is fundamental to submitting the job via Data Engine
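The base64 encoding in step 3 can be done in one line; a minimal sketch (the API key value is a placeholder, not a real credential):

```python
import base64

def encode_api_key(api_key: str) -> str:
    """Base64-encode a service ID API key for storage as Key Protect key material."""
    return base64.b64encode(api_key.encode("utf-8")).decode("ascii")

# Placeholder value for illustration; use your service ID's real API key
encoded = encode_api_key("my-service-id-api-key")
print(encoded)  # paste this value as the key material when importing into Key Protect
```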

Configuration requirements - high level overview

Once you have all the infrastructure and configuration described above in place, you can submit your first serverless streaming job on IBM Cloud. Access the Data Engine console and write a query as follows:

SELECT * FROM {EVENT_STREAMS_INSTANCE_CRN}/topic_1
STORED AS JSON
EMIT cos://{YOUR_BUCKET_REGION}/bucket_1/{PREFIX} STORED AS PARQUET
EXECUTE AS {YOUR_KEY_PROTECT_CRN}:key:{YOUR_KEY_ID}

Your job is now up and running. You can see the job details from your Data Engine instance by switching to the “Streaming job” section; see the image below.

Data Engine streaming job details
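Once the job is running, any Kafka producer that writes JSON to topic_1 feeds it. A minimal serialization sketch follows; the record fields and the kafka-python call in the comment are illustrative, not from the article, and Event Streams additionally requires SASL/SSL credentials from your service instance:

```python
import json

def to_json_bytes(record: dict) -> bytes:
    """Serialize one event as UTF-8 JSON, the only input format the streaming job accepts."""
    return json.dumps(record).encode("utf-8")

# With a Kafka client (e.g. kafka-python), the bytes would be sent like:
# producer.send("topic_1", to_json_bytes({"sensor_id": 7, "value": 21.5}))
```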

Partitioning

A streaming landing job on IBM Cloud can support high-throughput data (up to 1 MB/second at the time of writing). In these scenarios, it would be useful to partition the data to improve query latency for data analysis. Right now, IBM Cloud Data Engine does not support this feature, but it is planned for a next release. The way you will be able to partition your data will be to submit another streaming job that reads data from the input bucket, reorganizes it, and writes it into another bucket in a partitioned manner.

Conclusion

The approach described in this article allows you to set up a managed streaming job without writing a single line of code. You can therefore spend less effort on ETL tasks and focus on the data analysis that gives answers to your business.

Related links

  1. Stopping jobs - https://cloud.ibm.com/apidocs/sql-query-v3#stopsqljob
  2. Streaming COS tutorial - https://cloud.ibm.com/docs/sql-query?topic=sql-query-event-streams-landing#data-on-cos
  3. Streaming COS tutorial - 2 - https://www.ibm.com/cloud/blog/stream-landing-from-event-streams-kafka-service-to-ibm-cloud-data-lake-on-object-storage
