Thinking about modernizing your infrastructure and migrating from #Oracle to #MongoDB? Check out our latest guide for three recommended ways: https://hubs.ly/Q02ZHbcR0
About us
Estuary helps organizations activate their data without having to manage infrastructure. Capture data from SaaS or database sources, transform it, and load it into any data system, all with millisecond latency.
- Website
-
https://estuary.dev
External link for Estuary
- Industry
- Software Development
- Company size
- 11-50 employees
- Headquarters
- New York, NY
- Company type
- Privately Held
- Founded
- 2019
- Specialties
- Change Data Capture, ETL, ELT, Data Engineering, Data Integration, Data Movement, Data Analytics, Data Streaming, Real-time Data, Data Processing, Data Warehousing, Data Replication, Data Backup, PostgreSQL to Snowflake, MongoDB to Databricks, Data Activation, and Stream Processing
Products
Estuary Flow
ETL Tools
Estuary Flow is the only platform purpose-built for truly real-time ETL and ELT data pipelines. It supports batch for analytics and streaming for ops and AI, and can be set up in minutes with millisecond latency.
Locations
Employees at Estuary
Updates
-
When building data pipelines, achieving exactly-once processing is often a holy grail for data engineers. But what does "exactly-once" truly mean, and why is it so challenging in data movement? Let's break it down.

What is Exactly-Once? In simple terms, it ensures that each event in your pipeline is processed one and only one time: no duplicates, no missing data. This precision is critical for downstream systems, especially when handling financial transactions, inventory updates, or analytics dashboards where correctness matters.

Why is it Difficult? Data pipelines span multiple systems (message brokers, storage layers, databases), each with its own guarantees. For example:
- Message brokers like Kafka provide at-least-once or at-most-once delivery by default; exactly-once semantics requires additional configuration.
- Stateful systems (like your transformations) must handle partial failures gracefully, ensuring a retry doesn't lead to duplication.
- Idempotency at the destination is vital. Without it, duplicate events can corrupt your data.

How Does Estuary Flow Help? With Estuary Flow's real-time connectors, we ensure exactly-once semantics from source to destination, whether you're ingesting events from Kafka or writing to Iceberg tables. This is achieved through:
1. Transactional guarantees: Flow checkpoints data during movement, ensuring retries are safe.
2. Idempotent writes: Our platform generates consistent, deduplicated outputs even in complex scenarios like change data capture (CDC).
3. Unified batch and streaming support: Flow lets you move data without worrying about semantics breaking between real-time and batch processes.

For data engineers, this means you can trust your pipelines, reducing the complexity of building error-prone retry mechanisms or cleaning up duplicates. A minimal sketch of how checkpointing and idempotent writes combine to make retries safe appears below the link. Curious about how exactly-once semantics works in Flow? Check out our page for more information! https://hubs.ly/Q02Z9q5z0
Estuary Flow | Real-time Data Pipeline & Integration Platform
estuary.dev
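To give a concrete feel for why checkpoints plus idempotent writes make retries safe, here is a minimal, self-contained sketch in plain Python and SQLite. It is not Estuary Flow's actual implementation; the table names, consumer name, and event shapes are illustrative only.

# Sketch: idempotent writes keyed by event id, with the consumer checkpoint
# committed in the same transaction as the data, so a retried batch is harmless.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT);
    CREATE TABLE checkpoints (consumer TEXT PRIMARY KEY, last_offset INTEGER);
""")

def apply_batch(consumer: str, batch: list[tuple[int, str, str]]) -> None:
    """Apply a batch of (offset, event_id, payload) rows exactly once.

    Replays are safe: the upsert is keyed by event_id, and the offset check
    skips anything at or below the already-committed checkpoint.
    """
    with conn:  # one transaction: data and checkpoint commit together
        row = conn.execute(
            "SELECT last_offset FROM checkpoints WHERE consumer = ?", (consumer,)
        ).fetchone()
        committed = row[0] if row else -1
        for offset, event_id, payload in batch:
            if offset <= committed:
                continue  # already applied by a previous attempt
            conn.execute(
                "INSERT INTO events (event_id, payload) VALUES (?, ?) "
                "ON CONFLICT(event_id) DO UPDATE SET payload = excluded.payload",
                (event_id, payload),
            )
        conn.execute(
            "INSERT INTO checkpoints (consumer, last_offset) VALUES (?, ?) "
            "ON CONFLICT(consumer) DO UPDATE SET last_offset = excluded.last_offset",
            (consumer, batch[-1][0]),
        )

# Re-running the same batch (e.g. after a crashed attempt) leaves a single copy.
batch = [(0, "evt-a", '{"amount": 10}'), (1, "evt-b", '{"amount": 20}')]
apply_batch("orders", batch)
apply_batch("orders", batch)  # retry: no duplicates
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # -> 2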
-
#Airflow is not your only choice! We've curated a list of the top 9 Python ETL tools for Data Engineers. If you're interested in learning more about the likes of #polars, Bytewax, and dltHub, read on: https://hubs.ly/Q02Z1f1q0
Top 9 Python ETL Tools for Data Engineers in 2024
estuary.dev
-
Looking to integrate your #PostgreSQL database with your #ApacheIceberg data lakehouse? Look no further! In this guide, we show you how to connect the two systems and walk through setting up the integration in detail. Learn more: https://hubs.ly/Q02YXKDf0
Postgres to Apache Iceberg: 2 Methods for Efficient Data Integration
estuary.dev
-
Estuary reposted this
When I first started in the data world, the first tool I used to build a data pipeline was SSIS. Since then it feels like I have come across every tool and custom data pipeline setup possible (of course that's far from the truth). There seem to be hundreds of tools and methods that data teams use to get data from point A to point B. So I wanted to share some of those experiences as well as hear about Daniel Palma's experiences building data pipelines. What has changed? What has stayed the same? What challenges do data engineers still face today? Feel free to share some of your questions below!
10 Years Of Building Data Pipelines - What Has Changed
www.dhirubhai.net
-
Estuary reposted this
For many organizations, Retrieval-Augmented Generation (RAG) has become the go-to approach for making AI applications work seamlessly with proprietary company data. And let's face it, no one wants their AI applications to rely on outdated information.

Building real-time RAG pipelines just got a bit simpler. With Estuary you can connect to almost any data source and ingest updates in real time, streaming them seamlessly into Bytewax. Bytewax is purpose-built for creating robust, real-time embedding pipelines while harnessing the magic of Python. It integrates with leading Python libraries like unstructured.io, Haystack, LangChain, and many others for document cleansing and chunking. For embedding generation, Hugging Face Transformers is a popular choice that offers countless pre-trained models, empowering your AI applications with powerful and flexible embeddings.

With real-time RAG applications, you can be confident your AI outputs are not just free of hallucinations, but grounded in the most current, accurate data available. How important is up-to-date data for your AI application?
Hallucinations are one of many forms of wrong answers coming out of an AI application; outdated information is just as common. RAG applications (those blending LLMs with fresh, specific data) depend heavily on how up-to-date and relevant that data is. To properly leverage the power of RAG, however, you have to be an expert at complex tasks like chunking, embedding generation, and adjusting context windows with every data update. Doing this in real time is essential but notoriously tricky. Here's why it's such a challenge:
1. Context lengths and chunking: Long context windows can quickly become unmanageable and too costly to process. Splitting data into contextually coherent chunks requires managing freshness, relevance, and redundancy to avoid bloating response time or the LLM's "memory."
2. Embedding generation: New data means new embeddings, and generating these embeddings in sync with fast-moving data pipelines is a complex, resource-intensive process. Constant updates mean endless re-indexing and re-evaluation to ensure your RAG application has the latest context.
This is where Python frameworks like Pathway and Bytewax come in. These frameworks enable event-driven data pipelines, allowing RAG applications to handle data transformations and updates with minimal lag. By processing and streaming events in real time, they help manage the continuous flow of new data so that RAG models can access the latest context without manual intervention. But there's still the matter of data integration. A platform like Estuary can complete the picture by connecting to any data source and ingesting data in real time, providing RAG applications with an actual end-to-end data pipeline. How do you ensure your RAG apps are up to date?
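To make the chunking and embedding steps more concrete, here is a minimal sketch of an event-driven dataflow, assuming the operator-style Bytewax API (0.18+). The TestingSource stands in for documents streamed from a source such as Estuary Flow, and the toy character-frequency "embedding" is a placeholder you would swap for a real encoder (for example a Hugging Face sentence-transformers model) in practice.

# Sketch: stream documents, split into overlapping chunks, emit embeddings.
import bytewax.operators as op
from bytewax.connectors.stdio import StdOutSink
from bytewax.dataflow import Dataflow
from bytewax.testing import TestingSource

def chunk(doc: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split a document into overlapping character chunks to preserve context."""
    step = size - overlap
    return [doc[i : i + size] for i in range(0, max(len(doc) - overlap, 1), step)]

def embed(chunk_text: str) -> tuple[str, list[float]]:
    """Toy embedding: character-frequency vector. Replace with a real encoder."""
    vec = [0.0] * 26
    for ch in chunk_text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return (chunk_text[:30], vec)

flow = Dataflow("rag_embeddings")
docs = op.input("docs", flow, TestingSource(["A fresh product update arrived today.", "Another new document to index."]))
chunks = op.flat_map("chunk", docs, chunk)
vectors = op.map("embed", chunks, embed)
# In a real pipeline, the sink would upsert into a vector store keyed by chunk id.
op.output("out", vectors, StdOutSink())
# Run with: python -m bytewax.run this_module:flow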
-
Struggling to balance data integration with strict security and compliance? Estuary Flow's private deployments let you process data within your own cloud environment: securely, scalably, and in real time.
Key Benefits:
- Data Sovereignty: Keep sensitive data in your VPC.
- Compliance Made Easy: Meet GDPR, HIPAA, or SOC 2 standards.
- Unified Pipelines: Stream and batch data in one seamless platform.
- Blazing Performance: High throughput, low latency, total control.
Perfect for industries like #finance, #healthcare, and #supplychain, private deployments ensure you stay compliant while unlocking the power of real-time decision-making. Want to see how it works? Learn more: https://lnkd.in/dYjSYaZx
-
Combining real-time data ingestion with the flexibility of a data lakehouse architecture is more important than ever. That's why we've put together a step-by-step guide on how to set up a streaming lakehouse using Estuary Flow, #ApacheIceberg, and #PyIceberg!
In this article, you'll learn how to:
1. Ingest data in real time using Estuary Flow's Change Data Capture (CDC) from your source system.
2. Store and manage your data in Apache Iceberg, enabling scalable and reliable storage.
3. Perform powerful queries with PyIceberg and pandas for near-instant insights (a minimal query sketch follows below the link).
Whether you're building real-time analytics pipelines or looking to leverage the full potential of a streaming lakehouse, this guide will help you get started and scale your architecture. Ready to dive into the world of streaming lakehouses? Check out the full guide and start building your own robust data architecture today! https://lnkd.in/d-yitUv7
Building a Streaming Lakehouse with Estuary Flow and Apache Iceberg
estuary.dev
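As a rough illustration of step 3, here is a minimal PyIceberg-and-pandas query sketch. The REST catalog settings, warehouse path, table name (demo.events), and filter column are placeholders, not values from the guide; adjust them to whatever your pipeline actually produces.

# Sketch: read an Iceberg table written by the CDC pipeline into pandas.
import pandas as pd
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "default",
    **{
        "type": "rest",
        "uri": "http://localhost:8181",     # placeholder catalog endpoint
        "warehouse": "s3://my-warehouse/",  # placeholder warehouse location
    },
)

table = catalog.load_table("demo.events")

# Push the filter down to Iceberg so only matching files are scanned,
# then materialize the result as a pandas DataFrame for analysis.
df: pd.DataFrame = table.scan(row_filter="event_type == 'order_created'").to_pandas()
print(df.head())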
-
Estuary reposted this
Ready to elevate your Retrieval-Augmented Generation (RAG) workflows? In our latest blog, Real-Time RAG with Estuary and Pinecone, Shruti Mantri walks you through the essentials of integrating real-time data into your AI applications. From setup to seamless data flow, this guide shows you how to build responsive, up-to-date RAG models with Estuary and Pinecone. Curious about the setup and impact? Dive into Shruti's full guide here: https://lnkd.in/eFv4fxru
Real-time RAG with Estuary Flow and Pinecone
estuary.dev