October Updates
Scaling model inference with Hugging Face + Dask. We processed the 715 GB FineWeb dataset using the Hugging Face FineWeb-Edu classifier. Locally, processing 100 rows with pandas took ~10 seconds; running Dask on multi-GPU clusters let us scale up to 211M rows. Learn more in our blog post.
A first pass at large geospatial benchmarks. Last month we put out a call for 1-100 TiB scale geospatial workloads to build a benchmark suite for large-scale geoscience analytics. Learn more about what we’ve implemented so far (spoiler alert: most things break) in our blog post.
Accelerating antibody discovery. Abalone Bio generates millions of data points on the functional activity of antibodies, but was only able to analyze a fraction of this data with traditional methods. They now use Coiled to train their protein LLMs at scale on GPU-accelerated hardware, vastly improving their hit rate. Learn more in our case study.
Events
Next Dask Demo Day on November 7th. Join us for live community demos and updates from Dask maintainers. Have something you’d like to share? Let us know.
PyData NYC. Join us next week in New York, Nov. 7-8. Stop by the booth and say hi. We’ll be on the 5th floor by the registration area.
Just signed up for Coiled?
Watch our short demo to see how to get started. It’s easy to connect Coiled to your AWS, GCP, or Azure account.
New to Dask?
Check out our tutorials on using Dask DataFrames, parallelizing your Python code, and more advanced use cases.