Applying Some 'Unconventional Wisdom' to Improve Model Scoring wrt. High-Velocity Streaming Data
Conventional and alternative architectures for high-velocity / high-volume streaming data processing

Some data streams in the telecom industry can exceed 50,000 messages / second. And at about 300KB per message (6TB / day), the size is not tiny either. How does one handle such high-velocity / high-volume data for real-time scoring of ML models? Well, the conventional (or dominant) architecture for such problems is Kafka in conjunction with Spark Streaming. At that rate and size, however, the Spark infrastructure requirements become substantial - approaching 100 cores / 100 GB of compute. In fact, for a similar problem the cloud infrastructure costs exceeded $50K / month - and that was for batch-mode processing! [Sample code: fwaris/OrleansKafkaSample (github.com)]

The Unconventional Approach

Can one do better - i.e., use a more efficient architecture? To answer this question, I experimented with an alternative approach. Instead of Spark, I used Orleans Streaming -- deployed to a modest on-prem Kubernetes cluster -- together with the F# programming language and 'AsyncSeq', a library for asynchronous stream processing. Let's just say this wasn't exactly a walk in the park for me: I am well-versed in F#, but Kafka, Kubernetes, and Orleans Streaming were all relative unknowns.
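For illustration, below is a minimal sketch of the ingestion side of this idea: the Confluent.Kafka (librdkafka-based) consumer surfaced as an F# AsyncSeq. The broker address, topic, and consumer group are placeholders, and the actual project (linked above) uses Orleans stream providers and is more involved - this only shows the shape.

```fsharp
open System
open Confluent.Kafka
open FSharp.Control   // FSharp.Control.AsyncSeq package

// Illustrative only: broker, topic and consumer-group names are placeholders.
let kafkaMessages (topic: string) : AsyncSeq<byte[]> =
    asyncSeq {
        let config = ConsumerConfig(BootstrapServers = "localhost:9092", GroupId = "scoring-demo")
        use consumer = ConsumerBuilder<Ignore, byte[]>(config).Build()
        consumer.Subscribe topic
        while true do
            // Consume blocks for at most the given timeout; a short timeout keeps the loop responsive.
            let result = consumer.Consume(TimeSpan.FromMilliseconds 100.)
            if not (isNull result) then
                yield result.Message.Value   // raw payload bytes for downstream parsing/scoring
    }
```

Exposing the consumer as an AsyncSeq lets the downstream parsing and scoring stages be composed as ordinary asynchronous stream transformations.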

[Screenshot from the 'Octant' tool showing Kubernetes resource consumption: memory (10 GB) and CPU (11.8 cores) of the Orleans-F# app]

It took me a few weeks to get all the pieces together, but the final result was shockingly good. To process 30,000 messages / second (each ~300KB in size), the compute needed was only 12 cores / 10 GB!

This is an amazing result. If this architecture is deployed widely, it should yield substantial savings for my company compared to the cost of cloud - and even on-prem - Spark infrastructure.

But Why?

So why is this? The reasons are many -- ranging from software (in)efficiency to fundamentals of computer science. While it is hard to prove anything without rigorous experimentation, here are my educated guesses:

  • Spark was developed primarily for batch-mode processing. Streaming was added later, so Spark is not the most efficient framework for stream processing.
  • The Orleans/Dotnet Kafka and JSON processing libraries are much more efficient than what is packaged with Spark. The underlying Kafka library for Dotnet is based on librdkafka - a highly tuned Kafka client written in C/C++. Similarly, Dotnet's System.Text.Json is very efficient for JSON parsing because it relies on the Span<'T>/Memory<'T> constructs, which reduce unnecessary memory allocation and allow data to be parsed efficiently as it's being read 'off the wire' (a short parsing sketch follows this list).
  • Both Orleans and F# AsyncSeq make heavy use of Dotnet's asynchronous processing features. Asynchronous processing is crucial for extracting the most performance from a constrained compute environment when running an I/O-heavy workload. Each Kubernetes 'pod' is given only 2 virtual cores; if a core is blocked waiting for data to be read or written, it is not available for any other useful work - disastrous for performance. Fortunately, the AsyncSeq F# library provides the right abstractions for easily expressing asynchronous (non-blocking) stream processing logic. The library leverages F# computation expressions (which are a realization of monads from computer science). A short AsyncSeq sketch also follows below.
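
To illustrate the parsing point, here is a minimal sketch using System.Text.Json's Utf8JsonReader, which works directly over the UTF-8 payload bytes with no intermediate string or DOM for the parts of the message we ignore. The message shape and the 'timestamp' field are made up for the example - they are not the actual schema.

```fsharp
open System
open System.Text
open System.Text.Json

// Pull a single field straight out of the UTF-8 payload bytes.
let tryReadTimestamp (payload: ReadOnlySpan<byte>) : DateTimeOffset option =
    let mutable reader = Utf8JsonReader(payload)
    let mutable result = None
    while reader.Read() do
        if reader.TokenType = JsonTokenType.PropertyName && reader.ValueTextEquals "timestamp" then
            reader.Read() |> ignore
            result <- Some (reader.GetDateTimeOffset())
    result

// Example usage with a made-up message:
let demo () =
    let sample = Encoding.UTF8.GetBytes """{"id":"abc","timestamp":"2021-06-01T12:00:00Z","value":42}"""
    tryReadTimestamp (ReadOnlySpan<byte>(sample))   // Some (2021-06-01 12:00 +00:00)
```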

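And to make the last point concrete, here is a minimal sketch of the kind of non-blocking pipeline AsyncSeq enables. 'scoreAsync' is a hypothetical stand-in for model inference, and the combinator choices (parallel map, count/time batching) are illustrative rather than the project's actual code.

```fsharp
open FSharp.Control   // FSharp.Control.AsyncSeq package

// Hypothetical stand-in for model inference (e.g. an async call to a model server).
let scoreAsync (payload: byte[]) : Async<float> = async {
    do! Async.Sleep 5                 // simulated I/O wait - no thread is blocked here
    return float payload.Length }

// Compose the pipeline with AsyncSeq combinators. While a message is waiting on I/O,
// the underlying threads are free to do other useful work - important on a 2-core pod.
let runPipeline (messages: AsyncSeq<byte[]>) : Async<unit> =
    messages
    |> AsyncSeq.mapAsyncParallel scoreAsync       // score messages concurrently
    |> AsyncSeq.bufferByCountAndTime 1000 500     // emit batches of up to 1000 items or every 500 ms
    |> AsyncSeq.iterAsync (fun scores -> async {
        printfn "scored a batch of %d messages" scores.Length })
```

Feeding the 'kafkaMessages' sketch from earlier in this post into 'runPipeline' and running the resulting Async ties the two halves together.
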
Conclusion

Unsatisfied with the resource requirements of conventional stream processing architectures, I applied some unconventional wisdom to create an alternative architecture that yielded a phenomenal increase in efficiency.

Frankly, the ability to efficiently process high-velocity / high-volume streaming data was a huge relief to me. It means that the business case for supporting a particular use case is now much easier to justify, and as such, we (our group) can support more of the needs of our internal customers.

Can you share the source code of a small example?

Evan Howlett

Sr. Fullstack Software Engineer | SQL, F#, C#

2y

This is a great write-up! I've been wanting to try out the Orleans framework for something but never found a good application for it. Was it a headache to deal with all the C#-isms? Are your price estimates coming from Azure or another hosting provider?

Ramón Soto Mathiesen

"Datalogist" (*) @ SPISE MISU ApS

2y

Coming from F# Online. Maybe I'm misunderstanding something, but reading "Kafka and JSON" in the same sentence sounds a bit odd. I have experience with Kafka, also in combination with `dotnet`. What you would "normally" do is read `raw bytes` and then transform them to either some `specific` or `generic` types:

- https://avro.apache.org/docs/1.8.2/api/csharp/html/namespaceAvro_1_1Specific.html
- https://avro.apache.org/docs/1.8.2/api/csharp/html/namespaceAvro_1_1Generic.html

The important part is to avoid using "slow" (text) data containers, such as JSON/XML. But again, I might be misunderstanding your setup.

Btw, speaking of JSON and F#: I've realized that since LTS 6.0, if you don't tag your (record) types with:

```fsharp
open System.Runtime.Serialization

[<DataContract>]
type t =
    { [<field: DataMember(Name="foo")>] foo: string
      [<field: DataMember(Name="bar")>] bar: string }
```

then when serializing you end up with:

```json
{
  "foo@": "value0",
  "bar@": "value1",
  "foo": "value0",
  "bar": "value1"
}
```

which results in sending 2x the size of a given data payload. I wonder who were the "geniuses" behind this decision?
