Coiled

Software Development

New York, New York (2,529 followers)

View all 18 employees

About us

Python, but big. Churn through a ton of data, no cloud expertise needed.

Website: https://coiled.io
Industry: Software Development
Company size: 11-50 employees
Headquarters: New York, New York
Type: Privately held
Founded: 2020

Locations

Primary

New York, New York 10018, US

Updates

Coiled

2,529 followers
6 hours ago
HPC job scripts are simple and accessible to almost anyone. Oddly, they're pretty hard to replicate on the cloud, so we replicated the API with Coiled. Learn more about the new API in our latest newsletter, plus performance improvements in Dask-backed Xarray workflows and how to receive up to $200 in Amazon gift cards in our end-of-year referral program.

November Updates

Coiled, published on LinkedIn

Coiled reposted this

Patrick Hoefler

Senior Software Engineer at Coiled, pandas Core Developer
1 day ago
We've started working more on Dask's array integration, especially for #Geospatial workloads. So far this has yielded promising results that make Dask faster and, above all, more scalable. I wrote a short blog post explaining how an internal change in Dask that improves data selection operations has a widespread impact on many of Xarray's methods: https://lnkd.in/dXXsn3HN We are still interested in feedback about what isn't working well for you with Xarray and Dask. Please reach out if anything bugs you!

Improving GroupBy.map with Dask and Xarray

xarray.dev

5 comments

Coiled reposted this

Matthew Rocklin

Open Source Maintainer (Dask). Startup Founder (Coiled)
5 days ago (edited)
New Post: SLURM-Style Job Arrays on the Cloud

HPC job scripts were the first form of parallelism I ever used as a graduate student. They're dead simple and accessible to almost anyone. Oddly, they're pretty hard to replicate on the cloud (AWS Batch/GCP Cloud Run/Azure Batch try, but aren't easy to use).

We replicated the API with Coiled. It feels pretty slick to me.

SLURM-Style Job Arrays on the Cloud with Coiled

docs.coiled.io
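
As a rough illustration of the job-array idea in the post above: every task in an array runs the same script and uses its task index to pick its share of the work. The sketch below reads SLURM's `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` environment variables and assumes the array was submitted with indices 0 to N-1. The `shard` helper and the file names are hypothetical, and the post does not show Coiled's own variable names, so consult the linked documentation for those.

```python
import os


def shard(items, task_id, task_count):
    """Return the items assigned to one array task (round-robin split)."""
    return items[task_id::task_count]


def main():
    # SLURM sets these for each task in a job array. The defaults let the
    # script run as a single task outside a scheduler.
    # Assumption: array indices run from 0 to task_count - 1.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    task_count = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))

    # Hypothetical input files, used only for illustration.
    files = [f"data_{i:03d}.csv" for i in range(10)]
    for name in shard(files, task_id, task_count):
        print(f"task {task_id} would process {name}")


if __name__ == "__main__":
    main()
```

A round-robin split keeps the assignment deterministic without any coordination between tasks, which is what makes job arrays simple to reason about.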

10 comments
Coiled

2,529 followers
1 week ago
Siemens Case Study: Data Processing with Airflow + Dask

The data engineering + analytics team at Siemens often relies on SQL for manipulating large datasets, but recently tackled a project that stretched beyond SQL: identifying trends in employee training records using a fuzzy algorithm. With Dask, the team reduced ETL runtime by 80%, cutting execution from over an hour to just 10 minutes.

Traditional SQL works well for many tasks, but more complex use cases, like fuzzy matching and advanced aggregations, require the flexibility of Python. Scaling these Python workloads on large datasets, however, can be challenging.

The team chose to use Coiled + Dask for a few reasons:
- Minimal code rewrites: Dask DataFrame's similarity to pandas made it easy to parallelize existing code.
- No need to manage cloud infrastructure: Coiled's managed Dask clusters made it easy to deploy on the cloud.
- Integration with their current stack: Coiled + Dask integrated with their existing Airflow workflows, making it quick to get up and running.

Learn more in the case study from Stephen Schneider and Franco Bosetti: https://lnkd.in/gksgayaE

Airflow, Dask, & Coiled: Adding Big Data Processing to Your Cloud Toolkit

docs.coiled.io

1 comment
Coiled

2,529 followers
1 week ago
Lots of great talks at the Cloud-Native Geospatial Forum (CNG) virtual conference today! There's still time to register and learn more about open source geospatial tools like GeoParquet, Dask, and Icechunk.

James Bourbeau

Lead Open Source Engineer at Coiled
1 week ago

I'm excited to be speaking at #CloudNativeGeo2024 later today. Join me for "Building Large Scale Geospatial Benchmarks" at 1:25 PM CT. Looking forward to seeing folks there. Register here: https://lnkd.in/gkWUkvcc

Virtual Conference 2024

cloudnativegeo.org

Coiled reposted this

Sarah Johnson

Product Marketing @ Coiled
2 weeks ago
Heading to my first PyData NYC tomorrow! Join us and hear more about how you can run pandas on hundreds of GBs of data (or just gripe to Patrick Hoefler about why your latest pandas PR hasn't been merged yet). We'll be at the Coiled booth all day Thursday and Friday, 5th floor right by the registration area and Quansight.

4 comments
Coiled

2,529 followers
3 weeks ago
We're often asked how Dask + Coiled fit into existing machine learning pipelines. Recently, we worked with Hugging Face to put together an example processing the FineWeb dataset using the HF FineWeb-Edu classifier. Scaling to the full dataset (>200 million rows) was possible with Dask deployed on a multi-GPU Coiled cluster. Learn more about this workflow in our latest newsletter, plus other updates on geospatial benchmarks and upcoming events (looking forward to PyData NYC next week!).

October Updates

Coiled, published on LinkedIn

Coiled reposted this

Sameer Soi

Data Scientist in the Life Sciences
1 month ago (edited)
As we started building out the LLM capabilities for finding activating antibodies for GPCRs at Abalone Bio, we needed to go from local notebooks to code running on GPU clusters quickly, cost-efficiently, and ideally without locking ourselves into a cloud vendor. After evaluating vendors across the spectrum, we chose Coiled, as it seemed to check all the boxes. I am glad we did, as they have proven to be amazing not just in the technical product they deliver but also in the high-touch, human support they provide. If you want to spend time shipping code and models and not thinking about k8s (eek!), all while working with great people, I can easily recommend Coiled!

Abalone Bio

1,983 followers
1 month ago

We have successfully integrated AI into our FAST platform, using powerful protein large language models to discover and design activating #antibodies for #GPCR targets. In our journey, Coiled has served as an indispensable partner enabling us to seamlessly scale from ideation to production. Check out this post by the Coiled team on how they have supported us: https://lnkd.in/gKxd7KNj

Abalone Bio: Accelerating Antibody Discovery

coiled.io

3 comments

Coiled reposted this

Sarah Johnson

Product Marketing @ Coiled
1 month ago
Using Hugging Face models and datasets is powerful for machine learning, but scaling tasks like model inference on large datasets can be challenging. Dask handles out-of-core computing, breaking up datasets into manageable chunks so that even large-scale tasks can run smoothly.

In this example, we processed the FineWeb dataset (~715 GB in memory) using the HF FineWeb-Edu classifier. Locally, processing 100 rows with pandas took ~10 seconds, but scaling up to 211M rows was possible with Dask on multi-GPU clusters deployed with Coiled.

Results:
- Handled large-scale text classification, filtering, and saved results to Hugging Face storage
- Optimized GPU utilization to efficiently use expensive hardware

This example could be adapted for other workflows like:
- Genomic data filtering
- Large-scale content extraction
- Multimodal AI (audio, image, text)

Had a lot of fun learning about Hugging Face and putting together this example with Quentin Lhoest, James Bourbeau, and Daniel van Strien.

Blog post: https://lnkd.in/gcV348fA
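
The chunking idea described in the post above can be sketched in plain Python: work through a large input in fixed-size batches so that only one batch needs to be held at a time. This only illustrates the concept. It is not Dask's API or the code from the blog post, and `classify` is a hypothetical stand-in for a real model.

```python
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def classify(text):
    # Hypothetical stand-in for a real classifier: score by word count.
    return len(text.split())


def filter_rows(rows, size=2, min_score=3):
    """Keep rows whose score reaches `min_score`, one batch at a time."""
    kept = []
    for batch in batches(rows, size):
        scores = [classify(text) for text in batch]
        kept.extend(t for t, s in zip(batch, scores) if s >= min_score)
    return kept
```

Because `batches` accepts any iterable, the input could be a generator reading from disk, so the full dataset never has to be loaded at once.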
2 comments

Coiled reposted this

PyCon DE & PyData

2,546 followers
1 month ago (edited)
New video release: Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars

Watch how #Dask DataFrame 2.0's improved performance and new features compare to #Spark, #DuckDB, and #Polars, offering a faster and more robust system for big data processing.

Watch the video on YouTube: https://lnkd.in/em9c2Qba

Florian Jetter and Patrick Hoefler discussed the significant enhancements to Dask, a Python library for distributed computing that integrates well with pandas. Historically, Dask was user-friendly but lacked robust performance. The re-implementation of the DataFrame API has addressed these concerns, making Dask faster and more efficient.

Patrick Hoefler, a pandas core team member and Dask maintainer at Coiled, highlighted the improvements in Dask, including a new shuffle algorithm, a logical query planning layer, and a reduced memory footprint. These changes have led to a better user experience and a more robust system overall, especially when compared to tools like Spark, DuckDB, and Polars.

The speakers emphasized the seamless integration of Dask with pandas and other PyData stack libraries, making it a compelling option for big data applications. They compared Dask's performance against other tools using TPC-H benchmarks. They also discussed future developments, including extending the logical query planning layer to frameworks like Dask Array and Xarray.

Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars [PyCon DE & PyData Berlin 2024]

https://www.youtube.com/

1 comment


Funding

Coiled: 2 total rounds

Last round

Series A, June 18, 2021

US$21,000,000.00

Investors

Bessemer Venture Partners + 2 other investors

See more info on Crunchbase


Employees at Coiled

Igor Taber

Founder and General Partner at Cortical Ventures - funding and incubating next AI leaders

Emily Anderson

VP Finance & Operations at Coiled

Nat Tabris

Staff Software Engineer at Coiled

David Chudzicki

Director of Product Engineering, Coiled


Similar pages

Outerbounds

Voltron Data

Fennel

Anyscale

Quansight

Prognos Health

:probabl.

Guac

Databricks

Anaconda, Inc.
