Coiled

Software Development

New York, New York · 2,529 followers

About us

Python, but big. Churn through a ton of data, no cloud expertise needed.

Website
https://coiled.io
Industry
Software Development
Company size
11-50 employees
Headquarters
New York, New York
Type
Privately held
Founded
2020

Updates

  • Coiled reposted

    Patrick Höfler

    Senior Software Engineer at Coiled, pandas Core Developer

    We've started working more on the array integration of Dask, especially for #Geospatial workloads. This has yielded some promising results so far, making Dask faster and especially more scalable. I wrote a short blog post explaining how an internal change in Dask that improves data selection operations has a widespread impact on many of Xarray's methods: https://lnkd.in/dXXsn3HN

    We are still interested in feedback about what isn't working well for you with Xarray and Dask. Please reach out if you have anything that bugs you!

    Improving GroupBy.map with Dask and Xarray

    xarray.dev

  • Coiled reposted

    Matthew Rocklin

    Open Source Maintainer (Dask). Startup Founder (Coiled)

    New Post: SLURM-Style Job Arrays on the Cloud

    HPC job scripts were the first form of parallelism I ever used as a graduate student. They're dead simple and accessible to almost anyone. Oddly, they're pretty hard to replicate on the cloud (AWS Batch/GCP Cloud Run/Azure Batch try, but aren't easy to use). We replicated the API with Coiled. It feels pretty slick to me.

    SLURM-Style Job Arrays on the Cloud with Coiled

    docs.coiled.io
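
For readers who have not used job arrays, the core idea is that every task runs the same script and uses its array index to pick its share of the work. The sketch below is illustrative only: it relies on the `SLURM_ARRAY_TASK_ID` environment variable that SLURM sets for array jobs, while the file names and the `select_task_input` helper are hypothetical. The post does not say which variable Coiled exposes, so the variable name is left as a parameter.

```python
import os


def select_task_input(inputs, environ=None, var_name="SLURM_ARRAY_TASK_ID"):
    """Return the input assigned to the current array task.

    `var_name` is the environment variable holding the task index.
    SLURM sets SLURM_ARRAY_TASK_ID; other schedulers use their own
    name (an assumption to check against their documentation).
    """
    environ = os.environ if environ is None else environ
    if var_name not in environ:
        raise RuntimeError(f"{var_name} is not set; not inside an array task")
    index = int(environ[var_name])
    return inputs[index]


# Hypothetical inputs: one file per array task (indices 0..3).
files = [f"data/part-{i:03d}.parquet" for i in range(4)]

# Simulate the environment of array task 2 instead of reading os.environ.
chosen = select_task_input(files, environ={"SLURM_ARRAY_TASK_ID": "2"})
print(chosen)  # data/part-002.parquet
```

Passing the environment in as a dictionary keeps the helper easy to try locally, outside of any scheduler.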

  • Coiled

    2,529 followers

    Siemens Case Study: Data Processing with Airflow + Dask

    The data engineering + analytics team at Siemens often relies on SQL for manipulating large datasets, but recently tackled a project that stretched beyond SQL: identifying trends in employee training records using a fuzzy algorithm. With Dask, the team reduced ETL runtime by 80%, cutting execution from over an hour to just 10 minutes.

    Traditional SQL works well for many tasks, but more complex use cases, like fuzzy matching and advanced aggregations, require the flexibility of Python. Scaling these Python workloads on large datasets, however, can be challenging.

    The team chose to use Coiled + Dask for a few reasons:

    - Minimal code rewrites: Dask DataFrame's similarity to pandas made it easy to parallelize existing code.
    - No need to manage cloud infrastructure: Coiled's managed Dask clusters made it easy to deploy on the cloud.
    - Integration with their current stack: Coiled + Dask integrated with their existing Airflow workflows, making it quick to get up and running.

    Learn more in the case study from Stephen Schneider and Franco Bosetti: https://lnkd.in/gksgayaE

    Airflow, Dask, & Coiled: Adding Big Data Processing to Your Cloud Toolkit

    docs.coiled.io

  • Coiled

    2,529 followers

    Lots of great talks at the Cloud-Native Geospatial Forum (CNG) virtual conference today! There's still time to register and learn more about open source geospatial tools like GeoParquet, Dask, and Icechunk.

  • Coiled

    2,529 followers

    We're often asked how Dask + Coiled fit into existing machine learning pipelines. Recently, we worked with Hugging Face to put together an example processing the FineWeb dataset using the HF FineWeb-Edu classifier. Scaling to the full dataset (>200 million rows) was possible with Dask deployed on a multi-GPU Coiled cluster.

    Learn more about this workflow in our latest newsletter, plus other updates on geospatial benchmarks and upcoming events (looking forward to PyData NYC next week!)

    October Updates

    Coiled, published on LinkedIn

  • Coiled reposted

    Sameer Soi

    Data Scientist in the Life Sciences

    As we started building out the LLM capabilities for finding activating antibodies for GPCRs at Abalone Bio, we needed to go from local notebooks to code running on GPU clusters quickly, cost efficiently, and ideally without locking us into a cloud vendor. After evaluating vendors across the spectrum, we chose Coiled as it seemed to check all the boxes. I am glad we did, as they have proven to be amazing not just in the technical product they deliver but also in the high-touch, human support they provide. If you want to spend time shipping code and models and not thinking about k8s (eek!), all while working with great people, I can easily recommend Coiled!

  • Coiled reposted

    Sarah Johnson

    Product Marketing @ Coiled

    Using Hugging Face models and datasets is powerful for machine learning, but scaling tasks like model inference on large datasets can be challenging. Dask handles out-of-core computing, breaking up datasets into manageable chunks so that even large-scale tasks can run smoothly.

    In this example, we processed the FineWeb dataset (~715 GB in memory) using the HF FineWeb-Edu classifier. Locally, processing 100 rows with pandas took ~10 seconds, but scaling up to 211M rows was possible with Dask on multi-GPU clusters deployed with Coiled.

    Results:

    - Handled large-scale text classification, filtering, and saved results to Hugging Face storage
    - Optimized GPU utilization to efficiently use expensive hardware

    This example could be adapted for other workflows like:

    - Genomic data filtering
    - Large-scale content extraction
    - Multimodal AI (audio, image, text)

    Had a lot of fun learning about Hugging Face and putting together this example with Quentin Lhoest, James Bourbeau, and Daniel van Strien.

    Blog post: https://lnkd.in/gcV348fA
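
The chunking idea in the post can be illustrated without any cluster. The following is a minimal plain-Python sketch, not Dask's API and not the code from the blog post: it streams rows through a scoring function one batch at a time and keeps the rows that pass a threshold. The `score_batch` function is a hypothetical stand-in for a real classifier, and the sample rows are made up.

```python
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def score_batch(texts):
    """Hypothetical stand-in for a model call: score each text by word count."""
    return [len(text.split()) for text in texts]


def filter_rows(rows, threshold, batch_size=2):
    """Score rows batch by batch and yield those at or above the threshold."""
    for chunk in batched(rows, batch_size):
        for text, score in zip(chunk, score_batch(chunk)):
            if score >= threshold:
                yield text


# Made-up sample rows; only one batch is held in memory at a time.
rows = [
    "short",
    "a somewhat longer sentence",
    "two words",
    "one more long example here",
]
kept = list(filter_rows(rows, threshold=4))
print(kept)
```

A distributed framework applies the same pattern per partition across many workers; batching also matters for GPU utilization because a model call on a batch amortizes per-call overhead.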

  • Coiled reposted

    PyCon DE & PyData

    2,546 followers

    New video release: Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars

    Watch how #Dask DataFrame 2.0's improved performance and new features compare to #Spark, #DuckDB, and #Polars, offering a faster and more robust system for big data processing.

    Watch the video on YouTube: https://lnkd.in/em9c2Qba

    Florian Jetter and Patrick Höfler discussed the significant enhancements to Dask, a Python library for distributed computing that integrates well with pandas. Historically, Dask was user-friendly but lacked robust performance. The re-implementation of the DataFrame API has addressed these concerns, making Dask faster and more efficient.

    Patrick Hoefler, a pandas core team member and Dask maintainer at Coiled, highlighted the improvements in Dask, including a new shuffle algorithm, a logical query planning layer, and a reduced memory footprint. These changes have led to a better user experience and a more robust system overall, especially when compared to tools like Spark, DuckDB, and Polars.

    The speakers emphasized the seamless integration of Dask with pandas and other PyData stack libraries, making it a compelling option for big data applications. They compared Dask's performance against other tools using TPC-H benchmarks. They also discussed future developments, including extending the logical query planning layer to frameworks like Dask Array and Xarray.
