The Best Open Source AI Developer Tools of 2024


Hello Everyone,

This article is a quick introduction to open-source AI developer tools.

This newsletter has quite a few readers who are software developers building with open-source LLMs and tools.


From our sponsor:


Bring your AI to every Mac app, using the app's context

Omnipilot brings AI to every Mac app, using the app's context to provide intelligent assistance. You can invoke it with a shortcut to supercharge writing, email, and getting answers.



For just $2 a week, you can get access to my best deep dives.


With so many impressive open-source models arriving lately, including open-weight models such as Cohere’s Command-R that are finally closing the gap with GPT-4 on LMSYS’ Chatbot Arena, it’s time to look specifically at AI developer tools, through the research and analysis of Alex Irina Sandu.

Her Newsletter: The Strategy Deck

In Pursuit of Competitive Advantage

By Alex Irina Sandu

The Strategy Deck is known for its visuals, infographics and stand-out market research. I highly recommend her services as a consultant and researcher as well; contact her here.

Product and Corporate Strategy Expert

I’m a big fan of her ability to visualize complex business scenarios, including the competitive landscapes of startups and products. I often rely on her market research and infographics, including when she writes as a guest contributor, as in this post.

Contact Alex

My Approach to Native Sponsors

As my newsletter evolves, I’m trying to choose consciously the sort of builders I want to associate with. This includes giving early-stage startups and solo builders steep discounts. My latest Native Sponsor is a small new builder that I’m especially bullish about: an AI copilot that autocompletes text EVERYWHERE on macOS. In every app!

I haven’t seen many copilots for macOS.


Thanks to Alex for this article. You can read her bio at the end of the article. Or contact her on LinkedIn.

Articles to Revisit

  1. Market Map: Gen AI Companies with Foundational Models
  2. How BigTech Invested in AI Companies in 2023
  3. An Overview of Google’s AI Product Strategy


By Alex Irina Sandu of The Strategy Deck

This is a collection of some of the most popular open-source AI and ML developer tools, ranked by the number of stars they have on GitHub, for projects active in 2023 and 2024. It focuses on developer applications used to train and deploy ML models and AI agents, and its purpose is to highlight the breadth and diversity of the tools and frameworks being built by the open-source AI community, as well as the vast potential in the space.


The collection contains:

  • Model training and inference frameworks:
  • Vector databases - Milvus, Faiss
  • Orchestration tools - LangChain, LlamaIndex, Flowise, MindsDB
  • Compute optimization libraries - ColossalAI, DeepSpeed, Ray, vLLM


Editor’s Note: For context, read Chip Huyen’s long blog post.




20 Popular Open Source AI Developer Tools

  1. TensorFlow is a widely used library for training and inference of ML models that offers significant versatility and scalability across platforms, from consumer hardware to clusters of servers. Open-sourced by Google in November 2015, it has since collected over 182,000 stars on GitHub.
  2. Hugging Face Transformers is a library that provides access to, and tools to fine-tune and deploy, a vast collection of the open-source models hosted on Hugging Face. It is compatible with TensorFlow, PyTorch and JAX and includes support for text, image and audio models. Launched in 2018, Transformers has gathered 124,000 stars on GitHub.
  3. LangChain, developed by the company of the same name, is a framework for developing and deploying applications powered by large language models. The tool provides modular building blocks and components for building custom chains, allows inspection, monitoring and evaluation of applications, and can turn any chain into a REST API. Launched in October 2022, LangChain has achieved over 81,000 stars on GitHub.
  4. PyTorch is a very popular general-purpose framework and library developed by Facebook’s AI Research lab. Released in October 2016, it has over 77,000 stars on GitHub.
  5. GPT4All, provided by Nomic AI, is a client that can install and run AI models on consumer-grade and edge hardware. Optimized for CPU-only, no-internet environments, GPT4All runs on Windows, macOS and Ubuntu and can run a series of models, including Alpaca, Llama, Pythia, Mosaic, Falcon, StableLM and custom GPT4All ones. Launched in August 2023, it has gained over 63,000 stars on GitHub.
  6. Ollama is a tool that lets users run open-source LLMs locally on Windows, macOS or Linux machines. With support for a variety of models, including Gemma, Llama, Mistral, Mixtral, Command-R and LLaVA, the tool has gathered over 52,000 stars on GitHub since its launch in February 2023. Besides the GitHub repository, resources for Ollama are available on its website and Discord channel.
  7. ColossalAI provides a collection of parallelism components for distributed training and inference of models with a few lines of code. Its tools support data, pipeline, tensor and sequence parallelism, as well as a zero redundancy optimizer and a method for automatic management of parallelization. Since its creation in 2021, the ColossalAI repository has gained over 37,000 stars on GitHub.
  8. DeepSpeed, provided by Microsoft, is a deep learning optimization library for distributed training and inference. It is designed to support large to very large models that must be trained at scale on hundreds or thousands of GPUs in resource-constrained systems, while still delivering low latency and high throughput for inference. Since its launch in May 2020, it has gathered over 32,000 stars on GitHub.
  9. LlamaIndex is a data framework for LLM-based applications that use Retrieval Augmented Generation. The tool provides the abstractions needed to more easily ingest, structure and access private or domain-specific data in order to inject it into LLMs. Its components include data connectors to pull existing data from its native source and format, data indexes to structure it in intermediate representations, and engines to provide natural-language access to the data. Since it launched in November 2022, LlamaIndex has collected over 30,000 stars on GitHub.
  10. Ray is a unified compute framework for scaling AI and Python workloads, from reinforcement learning to deep learning, tuning and model serving. It consists of a core distributed runtime and a set of AI libraries for simplifying ML compute, including ones that deal with datasets, distributed training, hyperparameter tuning and inference. Developed by Anyscale, Ray has over 30,000 stars on GitHub.
  11. Milvus is a vector database built to power embedding similarity search and AI applications. It is used to make search over unstructured data more accessible regardless of the deployment environment. Hosted by the Linux Foundation’s LF AI & Data organization, Milvus has gathered over 36,000 stars on GitHub.
  12. Faiss (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors developed by Meta AI. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Since its launch in March 2017, it has collected over 27,000 stars on GitHub.
  13. AutoGen is a multi-agent conversation framework that works as a high-level abstraction for building workflows using multiple LLMs. Developed by Microsoft, it is meant to support the development of modular and complex agents that perform sophisticated tasks using AI models, as well as a variety of other tools and components. Since its launch in September 2023, AutoGen has collected over 24,000 stars on GitHub.
  14. Hugging Face Diffusers is a library for fine-tuning and deployment of pretrained diffusion models for generating images, audio, and 3D objects. It includes diffusion pipelines for inference, interchangeable noise schedulers for different diffusion speeds and output quality and access to pretrained models from the Hugging Face platform. Diffusers has over 22,000 stars on GitHub.
  15. Flowise is a low-code, drag-and-drop tool for developing customized LLM orchestration flows and AI agents. Built around customization and modularity, the framework supports integrations with other frameworks, the creation of autonomous agents for various tasks, and the integration of open-source, locally run LLMs. Since its launch in February 2023, Flowise has collected over 21,000 stars on GitHub.
  16. MindsDB is a platform that automates pipelines connecting real-time enterprise data to AI systems. It is used to train and customize models, automate tasks, define and execute trigger events, and provide observability. Since it launched in 2017, it has gathered over 21,000 stars on GitHub.
  17. Semantic Kernel from Microsoft is an SDK that lets developers build AI agents that can call existing code. It lets them combine conventional programming languages, like C# and Python, with LLMs using prompt templating, chaining and planning capabilities in order to build AI experiences into existing applications. Released in March 2023, Semantic Kernel has gathered over 17,000 stars on GitHub.
  18. vLLM is a high-throughput and memory-efficient inference engine for LLMs that uses PagedAttention, an algorithm for managing attention keys and values. Developed at UC Berkeley, it was launched in September 2023 with the paper “Efficient Memory Management for Large Language Model Serving with PagedAttention” and has since collected over 17,000 stars on GitHub.
  19. Machine Learning Compilation for Large Language Models (MLC LLM) is a universal deployment solution that allows native deployment of any large language model, with native APIs and compiler acceleration. The project’s mission is to enable everyone to develop, optimize and deploy AI models natively on their own devices with ML compilation techniques. Developed by researchers at Carnegie Mellon University, MLC LLM has over 16,000 stars on GitHub.
  20. The Unity Machine Learning Agents Toolkit enables games and simulations to serve as environments for training intelligent agents. It provides PyTorch-based algorithms that let game developers train intelligent agents for 2D, 3D and VR/AR games. The agents can be used for multiple purposes, including controlling NPC behavior (in a variety of settings such as multi-agent and adversarial), automated testing of game builds and evaluating different game design decisions before release. The toolkit has gathered over 16,000 stars on GitHub.
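Item 6 mentions running open-source LLMs locally with Ollama. Once it is running, Ollama serves a REST API on localhost (port 11434 by default), and its `/api/generate` endpoint accepts a JSON body with `model`, `prompt` and `stream` fields. The sketch below calls it using only Python’s standard library. It assumes a local server is running and that the requested model has already been pulled; the helper names `build_generate_request` and `generate` are our own, not part of any Ollama SDK.

```python
import json
import urllib.request

# Ollama's default local address; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text.

    Requires a running Ollama server with `model` already pulled.
    """
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False the server replies with a single JSON object
        # whose "response" field holds the completion.
        return json.loads(response.read())["response"]
```

For example, after `ollama pull mistral`, calling `generate("mistral", "Why is the sky blue?")` should return the completion as a string. Setting `"stream": False` asks for one JSON reply instead of a stream of partial responses, which keeps the parsing trivial.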
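Items 7 and 8 both point to the zero redundancy optimizer (ZeRO), which DeepSpeed introduced. Its core idea is simple arithmetic: instead of every data-parallel GPU holding a full copy of the weights, gradients and optimizer state, each ZeRO stage partitions more of that state across the GPUs. The function below is a back-of-the-envelope sketch using the accounting from the ZeRO paper (mixed-precision Adam, 16 bytes per parameter in total). It ignores activations, buffers and fragmentation, and the function name is hypothetical.

```python
from typing import Dict


def zero_memory_per_gpu(num_params: float, num_gpus: int) -> Dict[str, float]:
    """Estimate model-state memory per GPU, in bytes, for each ZeRO stage.

    Accounting follows the ZeRO paper for mixed-precision Adam:
    2 bytes/param of fp16 weights, 2 bytes/param of fp16 gradients and
    K = 12 bytes/param of fp32 optimizer state (weights, momentum, variance).
    Activations, temporary buffers and fragmentation are ignored.
    """
    psi, n, k = num_params, num_gpus, 12
    return {
        "baseline": (2 + 2 + k) * psi,          # everything replicated on every GPU
        "stage1": (2 + 2) * psi + k * psi / n,  # optimizer state partitioned
        "stage2": 2 * psi + (2 + k) * psi / n,  # ...plus gradients partitioned
        "stage3": (2 + 2 + k) * psi / n,        # ...plus weights partitioned
    }


if __name__ == "__main__":
    for stage, nbytes in zero_memory_per_gpu(7.5e9, 64).items():
        print(f"{stage}: {nbytes / 1e9:.1f} GB per GPU")
```

For a 7.5-billion-parameter model on 64 GPUs this gives roughly 120, 31.4, 16.6 and 1.9 GB per GPU, the example the ZeRO paper uses to motivate its three stages. The trade-off is extra communication: the more state is partitioned, the more often GPUs must exchange it during training.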
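The LlamaIndex entry (item 9) describes a three-part flow: connectors ingest data, indexes structure it, and engines retrieve the relevant pieces and inject them into the LLM prompt. The toy sketch below mirrors that flow in plain Python. It is not LlamaIndex’s API: the function names are hypothetical, and it ranks documents by simple word overlap, where a real Retrieval Augmented Generation pipeline would typically compare embeddings.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: Dict[str, str]) -> Dict[str, Counter]:
    """'Index' step: store each ingested document as a bag of words."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}


def retrieve(index: Dict[str, Counter], question: str, top_k: int = 1) -> List[str]:
    """'Engine' step: rank documents by how many question words they share."""
    query = Counter(tokenize(question))
    scores = {doc_id: sum((bag & query).values()) for doc_id, bag in index.items()}
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)[:top_k]


def build_prompt(documents: Dict[str, str], doc_ids: List[str], question: str) -> str:
    """Inject the retrieved passages into the prompt that would go to the LLM."""
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = {  # stands in for what a data connector would ingest
        "policy": "Refunds are issued within 30 days of purchase.",
        "shipping": "Orders ship within 2 business days.",
    }
    question = "How many days do I have to get refunds?"
    index = build_index(docs)
    print(build_prompt(docs, retrieve(index, question), question))
```

The returned prompt is what would be sent to the model; swapping `tokenize` and the overlap score for an embedding model and vector search is where tools like LlamaIndex and a vector database come in.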
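Milvus and Faiss (items 11 and 12) both answer the same basic question: which stored vectors are closest to a query vector? The simplest exact answer is an exhaustive scan, which is what Faiss’s `IndexFlatL2` index performs. The pure-Python sketch below shows that baseline with L2 distance. It is an illustration rather than either library’s API, and real deployments add approximate indexes such as IVF or HNSW so they do not have to scan every vector.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(
    database: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return the k (index, distance) pairs nearest to `query`, closest first.

    Exhaustive search: every stored vector is compared with the query,
    so each query costs O(n * d) for n vectors of dimension d.
    """
    scored = ((i, l2_distance(vec, query)) for i, vec in enumerate(database))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
    print(knn_search(vectors, [1.0, 1.0], k=2))
```

Faiss itself reports squared L2 distances, which skip the square root but rank results identically.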
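The Diffusers entry (item 14) mentions interchangeable noise schedulers. A scheduler is built around a noise schedule: a sequence of variances β_t that controls how quickly the forward process turns data into noise. The sketch below computes the linear schedule from the original DDPM paper (1,000 steps, β rising from 1e-4 to 0.02), the cumulative signal fraction ᾱ_t, and one closed-form forward noising step x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε. It uses plain Python lists instead of tensors, and the function names are hypothetical rather than Diffusers’ API.

```python
import math
import random
from typing import List


def linear_beta_schedule(num_steps: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Per-step noise variances beta_t, spaced linearly as in the DDPM paper."""
    if num_steps < 2:
        raise ValueError("need at least two steps")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = product of (1 - beta_s) for s <= t: the signal left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0: List[float], t: int, alpha_bars: List[float],
              rng: random.Random) -> List[float]:
    """Forward process at 0-based step t: sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps, eps ~ N(0, 1)."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_beta_schedule())
    print(f"signal left at t=0: {alpha_bars[0]:.4f}, at t=999: {alpha_bars[-1]:.2e}")
    print(add_noise([1.0, -1.0, 0.5], 500, alpha_bars, random.Random(0)))
```

Because ᾱ_t has a closed form, training can jump straight to any step t without simulating the steps before it; the different schedulers mostly differ in how they choose the schedule and step backwards through it at inference time.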
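Item 18 says vLLM’s PagedAttention manages attention keys and values. As described in the vLLM paper, the idea borrows from operating-system paging: the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping its logical blocks to physical ones, allocated only as tokens arrive. The sketch below models just that bookkeeping. The `BlockTable` class is hypothetical, it performs no attention computation, and it leaves out features such as sharing blocks between sequences.

```python
from typing import Dict, List, Tuple


class BlockTable:
    """Toy PagedAttention-style bookkeeping for a KV cache (hypothetical class).

    The cache is divided into fixed-size physical blocks. Each sequence keeps a
    list mapping its logical blocks to physical block ids, and a new physical
    block is taken from the free pool only when the previous one is full.
    """

    def __init__(self, num_physical_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_physical_blocks))
        self.tables: Dict[str, List[int]] = {}
        self.lengths: Dict[str, int] = {}

    def append_token(self, seq_id: str) -> None:
        """Reserve KV-cache space for one more token of `seq_id`."""
        table = self.tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # no block yet, or the last one is full
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1

    def physical_slot(self, seq_id: str, position: int) -> Tuple[int, int]:
        """Translate a logical token position into (physical block id, offset)."""
        if not 0 <= position < self.lengths.get(seq_id, 0):
            raise IndexError("position not cached for this sequence")
        block = self.tables[seq_id][position // self.block_size]
        return block, position % self.block_size

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

With `block_size=4`, a six-token sequence occupies two blocks and its token at position 5 lives at offset 1 of its second block. Since only a sequence’s last block can be partially filled, the memory it wastes is bounded by `block_size - 1` slots, instead of a full pre-reserved maximum-length buffer.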


Author’s Bio

Alex Sandu, the author of this guest post, is a popular contributor to AI Supremacy and the writer of The Strategy Deck, a newsletter focused on AI market analysis. She is seeking a new role.

  • With expertise in building global consumer, developer and open-source tech products, Alex brings 15 years of experience in cross-functional management roles, including Technical Product Management and Corporate Strategy.
  • Alex is a seasoned expert in building features from concept to launch for hundreds of millions of users worldwide, driving product strategy through data and insights, and managing strategic operations and annual planning for large organizations.
  • An experienced Product and Strategy Manager, Alex excels in aligning product vision with customer and market requirements, building competitive differentiation for your company and driving product development and impact in market.

To talk to Alex about how she can be a valuable addition to your team, reach out at alex [at] TheStrategyDeck [dot] com, or on LinkedIn.


In March 2024, Chip Huyen wrote a blog post titled “What I learned from looking at 900 most popular open source AI tools”.

The AI Stack Has Exploded in the Early 2020s

In 2023, the layers that saw the highest increases were the applications and application development layers.

  1. Coding
  2. Bots
  3. Info Aggregation
  4. Image Production
  5. Workflow Automation



2023 was the Year AI Grew Up


Some Helpful Newsletters


AIEdge

A newsletter for continuous learning about Machine Learning applications, Machine Learning System Design, MLOps, the latest techniques and news. Subscribe and receive a free Machine Learning book PDF!

By Damien Benveniste

Ahead of AI

Ahead of AI specializes in Machine Learning & AI research and is read by tens of thousands of researchers and practitioners who want to stay ahead in the ever-evolving field.

By Sebastian Raschka, PhD

AI Tidbits

Stay ahead on the latest in AI through weekly summaries and editorial deep dives providing unique perspectives on recent developments.

By Sahar Mor

Supervised

Covering innovation and emerging technology in big data and AI.

By Matthew Lynley

AI on a Budget

Weekly news, tips, and tutorials on fine-tuning, running, and serving large language models on your computer. Each tutorial is published along with a notebook ready to run.

By Benjamin Marie


Latent Space

The AI Engineer newsletter + Top 10 US Tech podcast. Exploring AI UX, Agents, Devtools, Infra, Open Source Models. See https://latent.space/about for highlights from Chris Lattner, Andrej Karpathy, George Hotz, Simon Willison, Emad Mostaque, et al!

By swyx & Alessio

You can discover more AI Newsletters on this curated list.

