Serverless streaming job on IBM Cloud: architectural overview

Introduction

If you are designing a solution on IBM Cloud that includes a streaming data ingestion flow, consider combining the IBM Event Streams, IBM Data Engine, IBM Cloud Object Storage and IBM Key Protect services to set up a serverless streaming job.

Using this approach, you can write up to 1 MB/s of data to IBM Cloud Object Storage without writing a single line of code. This article gives you all the concepts needed to configure your serverless streaming job.

Serverless Streaming Setup

When designing an architecture that includes a streaming job on IBM Cloud, keep in mind the following observations, valid at the time of writing:

  1. the only formats supported by a streaming job are JSON as input and Parquet as output
  2. the maximum supported throughput is 1 MB/second
  3. a streaming job writes data every 5 minutes if there is at least one row. If a batch contains more than 90k rows, the job accelerates (2.5 minutes) and so on, until it runs every 10 seconds
  4. there are two ways to submit a streaming job: the first is to use the Event Streams UI, following the instructions at this link; the other is to submit the job via an SQL command from the IBM Data Engine console
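As a rough capacity check under the limits above (the arithmetic is mine, not from the documentation), the throughput cap and the default flush interval bound how much data a single batch can contain:

```python
MAX_THROUGHPUT_B_PER_S = 1_000_000  # 1 MB/s ingest limit
FLUSH_INTERVAL_S = 5 * 60           # default 5-minute flush cadence

# Upper bound on data accumulated between two flushes at peak ingest
max_batch_bytes = MAX_THROUGHPUT_B_PER_S * FLUSH_INTERVAL_S
print(max_batch_bytes / 1_000_000)  # → 300.0 (MB per batch, at most)
```

In practice the accelerated cadences (2.5 minutes down to 10 seconds) keep individual Parquet objects well below this bound on busy topics.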

In this article we explain how to submit a streaming job via an SQL command from the IBM Data Engine console. Remember that, in this case, you cannot stop the job via the UI, but you can use the Data Engine REST API described at this link.
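Stopping the job programmatically could look like the sketch below. The API host and path follow the Data Engine (SQL Query) v3 API reference linked at the end of this article; treat them as assumptions and verify against the current docs before use.

```python
import urllib.request

# Assumed endpoint base; check the Data Engine API reference for your region
API_BASE = "https://api.sql-query.cloud.ibm.com/v3"

def build_stop_request(job_id: str, iam_token: str) -> urllib.request.Request:
    """Build the DELETE request that stops a running streaming job."""
    return urllib.request.Request(
        url=f"{API_BASE}/sql_jobs/{job_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {iam_token}"},
    )

# req = build_stop_request("YOUR_JOB_ID", "YOUR_IAM_TOKEN")
# urllib.request.urlopen(req)  # issues the stop call
```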

To submit a streaming job on IBM Cloud, you need to provision the following services from the IBM Cloud catalog:

  1. Event Streams – Standard plan
  2. IBM Cloud Object Storage
  3. Data Engine – Standard plan
  4. Key Protect

After you have provisioned the services listed above, complete the following steps:

  1. Create a topic in your Event Streams instance: you can use the IBM Cloud UI to create a new topic; let’s call it topic_1
  2. Create a bucket in your Object Storage instance: you can use the IBM Cloud UI to create a bucket; let’s call it bucket_1
  3. Create a service ID from the IAM page of your account with permissions both to read from Event Streams and to write to Object Storage; add an API key for this service ID, then create a key in Key Protect containing the API key encoded in base64: this step is fundamental to submitting the job via Data Engine
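The base64 encoding in step 3 can be done in one line; a minimal sketch (the API key value is a placeholder, not a real credential):

```python
import base64

def encode_api_key(api_key: str) -> str:
    """Base64-encode a service ID API key for storage as Key Protect key material."""
    return base64.b64encode(api_key.encode("utf-8")).decode("ascii")

# Placeholder value for illustration; use your service ID's real API key
encoded = encode_api_key("my-service-id-api-key")
print(encoded)  # paste this value as the key material when importing into Key Protect
```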

Configuration requirements - high level overview

Once you have all the infrastructure and configuration described above in place, you can submit your first serverless streaming job on IBM Cloud. Access the Data Engine console and write a query as follows:

SELECT * FROM {EVENT_STREAMS_INSTANCE_CRN}/topic_1
STORED AS JSON
EMIT cos://{YOUR_BUCKET_REGION}/bucket_1/{PREFIX} STORED AS PARQUET
EXECUTE AS {YOUR_KEY_PROTECT_CRN}:key:{YOUR_KEY_ID}

Your job is now up and running. You can see the job details from your Data Engine instance by switching to the “Streaming job” section; see the image below.

Data Engine streaming job details
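Once the job is running, any Kafka producer that writes JSON to topic_1 feeds it. A minimal serialization sketch follows; the record fields and the kafka-python call in the comment are illustrative, not from the article, and Event Streams additionally requires SASL/SSL credentials from your service instance:

```python
import json

def to_json_bytes(record: dict) -> bytes:
    """Serialize one event as UTF-8 JSON, the only input format the streaming job accepts."""
    return json.dumps(record).encode("utf-8")

# With a Kafka client (e.g. kafka-python), the bytes would be sent like:
# producer.send("topic_1", to_json_bytes({"sensor_id": 7, "value": 21.5}))
```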

Partitioning

A streaming landing job on IBM Cloud can support high-throughput data (up to 1 MB/second at the time of writing). In these scenarios, it would be useful to partition the data to improve query latency for data analysis. Right now, IBM Cloud Data Engine does not support this feature, but it is planned for a next release. The way you will be able to partition your data will be to submit another streaming job that reads data from the input bucket, reorganizes it, and writes it into another bucket in a partitioned manner.

Conclusion

The approach described in this article allows you to set up a managed streaming job without writing a single line of code. You can therefore spend less effort on ETL tasks and focus on the data analysis that gives answers to your business.

Related links

  1. Stopping jobs - https://cloud.ibm.com/apidocs/sql-query-v3#stopsqljob
  2. Streaming COS tutorial - https://cloud.ibm.com/docs/sql-query?topic=sql-query-event-streams-landing#data-on-cos
  3. Streaming COS tutorial - 2 - https://www.ibm.com/cloud/blog/stream-landing-from-event-streams-kafka-service-to-ibm-cloud-data-lake-on-object-storage
