Arroyo 0.14.0 is now available, with some great new features, improvements, and fixes, including:
* Lookup joins
* Nested updating aggregates
* Struct types
* Streaming SQL syntax
* Sink shuffles
Thanks to all of our contributors for this release, and especially Ratul D. and Nathan Lapierre, who made their first contributions in this release! See the full release notes on the Arroyo blog: https://lnkd.in/dyFtxtf4
About us
Arroyo is bringing real-time data to every company with the Arroyo Streaming Engine
- Website
- https://www.arroyo.dev
- Industry
- Software Development
- Company size
- 2-10 employees
- Headquarters
- Berkeley, CA
- Type
- Privately held
- Founded
- 2022
Locations
- Primary: Berkeley, CA, US
Updates
-
Arroyo reposted this
Arroyo just crossed 4,000 GitHub stars! Amazing to see the community come together over the past two years to build a better stream processor. Thanks to all of our users and contributors for helping us reach this milestone!
-
Arroyo reposted this
The sad, dark truth of the data world is that—for all of our fancy algorithms and systems and careful performance engineering, a majority of CPU time might just go to...decoding JSON. In some other slice of the multiverse data teams have all moved to efficient formats like Avro and Protobuf, but our fallen world still runs on a sort-of-specified data serialization format extracted from a frontend programming language, itself famously created in 10 days. So if we're going to have to read JSON, we might as well do it quickly. And if you've ever been curious how we do this at Arroyo, have I got the incredibly long and in-depth explanation for you!
-
Arroyo reposted this
As Arroyo has grown, our internal analytics needs have outpaced our initial, ad-hoc data infra. When it came time to rebuild it, we turned to the best technologies of the modern data stack: an object-storage based data lake queried by DuckDB. And of course, Arroyo itself to provide near-real-time ingestion.
We're so happy with how this turned out, I thought it would be worth documenting for other folks looking to build an easy, cheap, near-real-time analytics system. We're calling this approach the LOAD stack, for log storage/object storage/Arroyo/DuckDB.
In our deployment, we combine several managed and open-source tools to provide sub-minute access to data at a small fraction of the cost of fully-managed solutions like Databricks or Snowflake:
* AWS Lambda to get events in
* Redpanda Data Serverless to store them for processing
* Arroyo for efficient and fault-tolerant ingestion
* S3 for long-term storage
* DuckDB via AWS SageMaker Notebook for analysis
At Lyft, it took an entire team two+ years to build out a comparable stack; with modern tooling, I think an individual can build something comparable in a few hours today. Find the full walkthrough in our writeup:
-
The Arroyo team is excited to close out the year with the release of Arroyo 0.13! This is our fifth release of the year, and caps an incredible 12 months for the project and community. New features include:
* Source metadata support
* RabbitMQ streams connector
* Atomic updating outputs
* IAM auth for Kafka
* Operator chaining
We are especially thrilled that this release includes work from four new contributors to the project. Huge thanks to everybody who contributed:
* Harshit P.
* Xin Hao (@haoxins)
* Tiago Campos
* Matt Forbes
* Vipul Vaibhaw
* Erle Carrara
* Micah Wylde
See all of the details on our blog, and try it out with
$ brew install arroyosystems/tap/arroyo
-
Arroyo reposted this
If you missed #p99conf last week, talks are now available to stream on YouTube. I spoke about the design decisions that went into Arroyo's incredible performance: https://lnkd.in/g8-rrGWR. Come for the Rust hot takes, stay for my terrible hand-drawn architecture diagrams.
P99 CONF 2024 | Latency, Throughput & Fault Tolerance: Arroyo Streaming Engine by Micah Wylde
-
Arroyo reposted this
We've been able to build a great open source community around Arroyo, with outside contributors adding major features and improvements, even though it's a streaming SQL engine, a piece of deep infrastructure with a high barrier to entry. Building a real community is something lots of projects struggle with. How did we do it?
* Starting with a friendly community meeting place where new contributors can meet the team, ask questions, and find mentorship (for us this is Discord).
* Doing the work of creating (and tagging) issues specifically for new contributors. This takes a lot of effort! They need to be well-documented, with enough context for someone to pick up cold.
* Cleaving off a part of the codebase that's mostly disconnected, with clean integration points to the rest of the system. For us this is our connectors subproject, which contains code to connect Arroyo with other systems. We've had multiple big contributions here, including NATS and MQTT connectors.
* Providing efficient PR reviews and actively helping users get their changes merged. Nothing kills motivation like waiting 2 months for a review.
This all takes work and time, but we've found it incredibly worthwhile. (And if you've ever been interested in contributing to an open source data infra project, get in touch!)
-
The Arroyo team is thrilled to announce that Arroyo 0.12.0 is now available! This release introduces Python UDFs, which allow developers to extend the engine with custom functions. Also new in this release:
* Support for Protobuf as an ingestion format
* Much faster JSON functions and new PG-inspired JSON syntax
* Custom TTLs for updating state
* AWS IRSA support
along with many other improvements and fixes. This release wouldn't have been possible without all of our amazing contributors, including several new to the project:
* Xin Hao (@haoxins)
* Jayshan Raghunandan (@jr200) (new!)
* Marco Lugo (new!)
* Micah Wylde
* Tiago Campos (new!)
* ZhuLiquan (@zhuliquan) (new!)
With Python support, we're excited to bring powerful stream processing to a whole new set of developers. We can't wait to see what you build! https://lnkd.in/g-dEyBqh
-
Arroyo reposted this
Excited for the SF DataFusion meetup next Wednesday! I'll be giving a talk about how Arroyo implements dynamically-loaded UDFs. Because Rust lacks a stable ABI, this is harder than it sounds: different compiler versions or even changes to flags can break code loading. But we don't want to recompile our entire engine just to use a UDF. This gets even harder if we're trying to use async across a UDF boundary (which Arroyo has to support to enable things like HTTP calls, database lookups, and model inference in UDFs). How do we do it? You'll have to come to the meetup to find out. But I'll give you a hint: it involves C. See you there!
-
Arroyo reposted this
Arroyo is coming to Current 2024! Excited to see everyone in Austin next week.