Presto Foundation

Software Development

San Francisco, California 2,306 followers

Hosted by the Linux Foundation, Presto Foundation is focused on supporting and sustaining the Presto community!

About us

Supporting and sustaining the Presto community

Website
https://prestodb.io
Industry
Software Development
Company size
51-200 employees
Headquarters
San Francisco, California
Type
Nonprofit
Founded
2019

Updates

  • Presto Foundation reposted this

    Ali LeClerc

    Head of Open Source Strategy at IBM

    I've been working with the Presto community for over five years now, and I continue to be inspired by the energy and momentum behind the project. Every day I get to work with many different companies that are pushing Presto further, whether it's optimizing it for massive workloads, fine-tuning performance, or contributing back to open source.

    There's a lot of noise in the market about the future of Presto, but when you look at what's happening in production today, the reality is clear: companies around the world are relying on Presto for their most critical analytics workloads.

    Organizations choose Presto because it allows them to:
    - query data at massive scale without vendor lock-in
    - separate storage from compute for cost-efficient analytics
    - use a single SQL engine across BI, AI/ML, ad hoc analytics, etc.

    To highlight some examples:

    Adobe Advertising
    - uses Presto for large-scale ad analytics, processing billions of events daily for audience targeting, real-time bidding, and performance reporting
    - runs scheduled pipelines and ad hoc queries, supporting ML model training, dashboards, and custom advertiser reports over 400 billion records
    - optimized query execution for cost efficiency, enabling interactive analytics and scheduled reporting with 3–10x better performance than other engines

    apna
    - runs Presto on Kubernetes with auto-scaling to handle 60,000+ queries per day across 500+ tables, optimizing for cost and performance

    Bolt
    - migrated from Redshift to an open lakehouse with Presto to overcome scalability limits and cost inefficiencies
    - uses dedicated Presto clusters for BI, ad hoc queries, and automated workloads, serving petabyte-scale queries efficiently

    e& Egypt
    - replaced Impala with Presto for querying the data lake, significantly improving performance and scalability for analytics
    - uses Presto for federated queries across multiple data sources, reducing data silos and simplifying data access

    Uber
    - runs Presto across 50+ clusters and four regions, supporting ad hoc analytics, ETL, A/B testing, and operational insights
    - developed a dynamic queuing system for fair resource management, optimizing query routing across multiple cluster groups

    Presto continues to evolve and grow, with major investments in performance (like Velox), open-source contributions, and real-world production deployments at scale. I'm excited about the work we're doing, the usage we're seeing, and what's ahead. And I appreciate our community!

  • Apache Arrow Flight Comes to #Presto! Join us at our next virtual tech talk (streamed live on LinkedIn) to learn more about the Presto / Apache Arrow Flight connector, which enables high-speed, efficient data exchange with remote sources. Arrow Flight's zero-copy transfers make it easier than ever to integrate custom data sources with Presto. Bryan Cutler will share more of his work and dive into how it works, the performance benefits, and best practices for building your own connectors.

    Presto Meetup: Seamless Data Integration with Apache Arrow Flight in Presto

    www.dhirubhai.net

  • Hey Presto community! Join us at #VeloxCon this year at Meta HQ in the Bay Area. We'll have several sessions covering the latest work in Presto C++, the native engine built with #Velox. Ying Su, Aditi Pandit, Soumya D., Christian Zentgraf, Deepak Majeti, and Minhan Cao from IBM will present sessions on Iceberg support, new Presto C++ features, and operational efficiency. Amit Dutta from Meta and Sergey Makagonov from Uber will share how they're using Presto C++ in production. Registration is free for the community, details below!

    Velox

    422 followers

    VeloxCon 2025 Agenda is Live! https://lnkd.in/gPv9yH-X

    We're excited to share that the first round of the agenda is live and includes cutting-edge talks on Velox, Apache Gluten, #Presto C++, AI/ML hardware acceleration & more. Highlights include:
    - Velox in the Age of Machine Learning: Pedro Pedreira on how Velox is evolving to power next-gen compute for AI/ML
    - Google Dataproc's Velox + Apache Gluten Integration: Abhishek Modi on how Velox is revolutionizing big data analytics
    - Use case talks from Uber and Pinterest on their use of Velox
    - Presto C++ Deep Dives: Iceberg support, performance optimizations, and new features in the C++ engine from Ying Su, Deepak Majeti, Aditi Pandit, Christian Zentgraf, Soumya D., Minhan Cao
    - GPU Acceleration with RAPIDS cuDF: NVIDIA's work integrating GPU-accelerated query execution from Gregory Kimball & Karthikeyan Natarajan

    We're excited to hear from Meta, Google, IBM, Uber, NVIDIA, Pinterest, and more.

    April 15-16 at Meta HQ https://lnkd.in/gx8cEfHZ

    IBM Data, AI & Automation; Uber Engineering; Meta for Developers; Pinterest Engineering; Google for Developers; IBM Developer

  • We're looking forward to kicking off our workshop series in IST! Details and registration below.

    Saurabh Mahawar

    Developer Relations Engineer | Growth Strategist | Community | (Presto by Meta)

    Learn Presto, hands-on! The Presto team is kicking off a 3-week virtual workshop series for data engineers, data analysts, open-source enthusiasts, or anyone who is eager to get practical with #Presto!

    Workshop schedule:
    - Getting Started with Presto: 12th March, 4 PM (IST), by Yazhini K
    - Building an Open Data Lake-house with Presto and Apache Iceberg: 19th March, 4 PM (IST), by Ajay Kharat
    - Diving into and benchmarking Presto C++, next-gen Presto: 26th March, 4 PM (IST), by Shakyan Kushwaha

    Registration link: https://lnkd.in/dq2J7iwn

    Follow Presto Foundation #Presto #DataEngineering #SQL #OpenSource

  • Community User Spotlight: How Jio Platforms Powers Large-Scale Analytics with Presto

    We sat down with Sonal Holankar of Jio Platforms Limited (JPL) to learn how they use #Presto across seven Hadoop clusters to power analytics for internal applications, integrating seamlessly with Hive, Iceberg, Kudu, and Tableau.

    Key Presto benefits:
    - Execute queries in seconds across massive datasets
    - Enable seamless analytics through Tableau dashboards
    - Manage performance at scale with custom monitoring

    Read the full story: https://lnkd.in/gxXJgDTR

  • Presto Foundation reposted this

    Velox

    422 followers

    Why Velox? Modern data infrastructure is fragmented, with specialized computation engines built in silos. Meta developed #Velox, an open-source C++ database acceleration library, to solve this by providing high-performance, reusable, and dialect-agnostic execution components. Velox powers Presto, Spark, PyTorch data loading, and more, improving performance across analytical, streaming, and machine-learning workloads. By democratizing optimizations, it makes query engines and data systems faster, more consistent, and easier to maintain. Read the original research paper by Pedro Pedreira, Orri Erling, Maria Basmanova, Kevin Wilfong, Laith Sakka, Krishna Pai, Wei He, Biswapesh Chattopadhyay to learn more about Velox and its impact on data processing: https://lnkd.in/d-XUBxtZ

  • Presto Foundation

    2,306 followers

    Community User Spotlight: How Zain Sudan Transformed Their Data Lakehouse with Presto

    Yahya Elemam, Dir. of Big Data and Analytics at Zain Sudan, shares how they modernized their analytics platform from Cloudera to an open-source, real-time data lakehouse powered by #Presto.

    "The majority of queries now execute within one second, demonstrating significant performance improvements. Without Presto, direct connectivity between our visualization tools and the data lakehouse would have been impossible."

    Their open data lakehouse is built on open source! Really awesome to see a bunch of open-source software including Apache NiFi, Hadoop, Iceberg, Spark, Apache Airflow, Apache Superset, Apache Hue, Apache Zeppelin, and JupyterLab.

    Read the full transformation story here: https://lnkd.in/gN8XhEkS

    Big thanks to Yahya and the Zain Sudan team!

  • Presto Foundation reposted this

    Anand Kumar Rai

    Building Kubepod.io | K8SUG India Organizer

    PrestoDB: High-Performance Analytics at Scale!

    Saurabh Mahawar (Developer Relations Engineer, IBM) is live at K8SUG - the Most Active K8s + AI Meetup India, exploring how PrestoDB enables fast, distributed analytics across massive datasets. From query optimization to multi-source data federation, this session is a must for data enthusiasts! Are you using PrestoDB in your stack? Let's discuss!

    #PrestoDB #BigData #Analytics #DistributedSystems #CloudNative

  • Presto Foundation reposted this

    Dipankar Mazumdar, M.Sc

    Staff Data Engineer Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Author of "Engineering Lakehouses"

    Query Optimization - Lakehouse.

    Some time back, in a closed-group workshop, I presented how the 'clustering' table service in Apache Hudi makes a huge impact on overall query performance. To highlight the difference, I ran the same query using #Presto on a 1 TB TPC-DS dataset, once before clustering and once after.

    Query: [SQL text garbled in the source and not recoverable]

    As you can see from the Presto UI, the unclustered table scanned 2.62 billion rows, whereas the clustered one scanned just 847 million. That's a significant difference. Now imagine a production environment, where multiple analysts and engineers run similar queries on the same predicates over and over. In such cases clustering is extremely beneficial and is therefore important to run.

    One of the cool things about Hudi is that it offers different deployment modes for running clustering, most importantly asynchronous ones:
    - Inline: Happens synchronously with the regular ingestion writer. This means the next round of ingestion cannot proceed until the clustering is complete.
    - Async (same job): Hudi will run the clustering operation after each commit is completed as part of the ingestion pipeline. Hudi spins up another thread within the same job for this.
    - Async (standalone job): A separate clustering job executes the clustering operation. By running a different job for the clustering operation, it rebalances how Hudi uses compute resources.
    - Async (manual trigger): The application schedules the clustering in one job; in another, the application executes the clustering plan. The supported writers won't be blocked from ingesting data.

    These asynchronous modes allow you to deploy a clustering service as per your business needs and to balance ingestion against these table management services. Detailed reading in comments.
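
As a rough illustration of the inline and async (same job) modes described above, the sketch below builds the writer options that would switch a Hudi ingestion job between them. The helper name `clustering_options`, the sort column `example_col`, and the default of 4 commits are assumptions made for this example, not details from the post. The `hoodie.clustering.*` keys are written as they appear in the Hudi configuration reference to the best of my knowledge, so check them against the Hudi version you run.

```python
from typing import Dict, List


def clustering_options(mode: str, sort_columns: List[str], max_commits: int = 4) -> Dict[str, str]:
    """Build Hudi writer options for a clustering mode (hypothetical helper).

    mode: "inline" blocks the next ingestion round until clustering finishes;
          "async" lets clustering run alongside ingestion in the same job.
    sort_columns: columns to sort by while rewriting files (illustrative).
    max_commits: how many commits to wait between clustering runs (assumed default).
    """
    if mode not in ("inline", "async"):
        raise ValueError(f"unsupported clustering mode: {mode!r}")
    if not sort_columns:
        raise ValueError("at least one sort column is required")

    # Sorting on frequently filtered columns is what lets the query engine skip files.
    options = {
        "hoodie.clustering.plan.strategy.sort.columns": ",".join(sort_columns),
    }
    if mode == "inline":
        options["hoodie.clustering.inline"] = "true"
        options["hoodie.clustering.inline.max.commits"] = str(max_commits)
    else:
        options["hoodie.clustering.async.enabled"] = "true"
        options["hoodie.clustering.async.max.commits"] = str(max_commits)
    return options


if __name__ == "__main__":
    # "example_col" stands in for whichever predicate column your queries filter on.
    for key, value in sorted(clustering_options("async", ["example_col"]).items()):
        print(f"{key}={value}")
```

The returned dictionary would typically be merged into the options passed to the Hudi writer. The standalone-job and manual-trigger modes are launched as separate jobs and are not covered by this sketch.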
