What is Qwen-Agent framework? Inside the Qwen family
TuringPost
Newsletter about AI and ML.
We discuss the timeline of Qwen models, focusing on their agentic capabilities and how they compete with other models, and we also explore what the Qwen-Agent framework is and how you can use it.
While everyone was talking about DeepSeek-R1’s milestone in model reasoning, Qwen models from Alibaba stayed overshadowed, even though the team was cooking something interesting and also open-source. From the very beginning, they focused on making their best models capable of agentic features like tool use, which even their earlier models leveraged. Today we are going to discuss the entire journey of Qwen models toward strong reasoning, matching or even surpassing state-of-the-art models from OpenAI and DeepSeek. But that’s not all. As the AI and machine learning community is now more into ecosystems and complex frameworks, we’ll dive into the Qwen-Agent framework – a full-fledged agentic ecosystem that lets Qwen models autonomously plan, call functions, and execute complex, multi-step tasks right out of the box. The Qwen family definitely deserves your attention, so let’s go!
In today’s episode, we will cover:
How it all began: Qwen 1.0 and Qwen 2
But first, a few words about Alibaba, to understand the scale. Alibaba Group, founded in 1999 by Jack Ma in Hangzhou, China, has grown into a global leader in e-commerce and technology. The company reported an 8% revenue increase to 280.2 billion yuan ($38.38 billion) for the quarter ending December 31, 2024, marking its fastest growth in over a year. The company's market capitalization stands at approximately $328.63 billion as of March 2025, positioning it among the world's most valuable companies. Its AI strategy is notable for its substantial investment and integration across its diverse business operations. The company has committed to investing over RMB 380 billion (approximately $53 billion) in AI and cloud computing infrastructure over the next three years, surpassing its total expenditure in these areas over the past decade.
And this is where Qwen models start to play an important role.
From the very beginning of Qwen models’ development, we could see how the pursuit of strong agentic capabilities, including tool use and deep reasoning, shaped their strategy and advancement. Here is a brief timeline of Alibaba Cloud’s major models, leading up to what Qwen offers today for agent development.
In mid-2023, Alibaba Cloud’s Qwen Team first open-sourced its family of LLMs, called Qwen 1.0. It included base LLMs with 1.8B, 7B, 14B, and 72B parameters, pretrained on up to 3 trillion tokens of multilingual data with a main focus on Chinese and English. Qwen 1.0 models featured context windows of up to 32K tokens (8K for some early variants).
Alongside the base models, Alibaba released Qwen-Chat variants aligned via supervised fine-tuning and RLHF. Even at this early stage, they demonstrated a broad skillset – they could hold conversations, generate content, translate, code, solve math, and even use tools or act as an agent when appropriately prompted. So from their very first models, the Qwen team designed them with agentic behavior and effective tool use in mind.
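To make that concrete, here is a minimal sketch of how such tool use could be prompted with an early Qwen chat checkpoint via Hugging Face Transformers. The ReAct-style prompt format and the "search" tool are illustrative assumptions, not the official template; the `model.chat()` helper follows the usage shown in the original Qwen-7B-Chat model card.

```python
# A minimal sketch (not the official recipe): ReAct-style tool-use prompting with an
# early Qwen chat model. The prompt format and the "search" tool are illustrative
# assumptions; the .chat() helper follows the original Qwen-7B-Chat model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
).eval()

# Describe the available tool in the prompt and let the model decide when to call it.
prompt = """Answer the question, using the tool below when needed.

Tools:
- search: looks up facts on the web. Input: a query string.

Use this format:
Question: the input question
Thought: reason about what to do next
Action: search
Action Input: the query to send to the tool
Observation: the tool result
... (Thought/Action/Observation can repeat)
Final Answer: the final answer

Question: In what year was Alibaba Cloud founded?
"""

response, _history = model.chat(tokenizer, prompt, history=None)
print(response)  # expect a Thought/Action block that your own code parses and executes
```

In practice, an agent loop would parse the "Action" and "Action Input" lines, run the tool, append the result as an "Observation", and call the model again until it produces a "Final Answer".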
In February 2024, the Qwen team announced an upgraded version called Qwen-1.5. This time they introduced uniform 32K context length support across all model sizes and expanded the lineup to include 0.5B, 4B, 32B, and even a 110B-parameter model. Not only did its general skills, such as multilingual understanding, long-context reasoning, and alignment, improve, but its agentic capability also jumped to GPT-4-matching levels on a tool-use benchmark. At that time, it correctly selected and used tools with over 95% accuracy in many cases.
June 2024 brought Qwen 2, which inherited the Transformer-based architecture of the previous models and extended Grouped Query Attention (GQA) to all model sizes (unlike Qwen-1.5) for faster inference and lower memory usage. It was a strong foundation for specialized tasks, and later, in August 2024, Qwen2-Math, Qwen2-Audio (an audio-and-text model for understanding and summarizing audio inputs), and Qwen2-VL emerged.
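A quick, illustrative way to see GQA in the released checkpoints is to inspect the Hugging Face config: the number of key/value heads is smaller than the number of query heads, which is exactly what shrinks the KV cache at inference time. The model ID below is one of the public Qwen2 instruct checkpoints; the check itself is just a sketch.

```python
# Illustrative check: GQA shows up in the config as fewer key/value heads than
# attention (query) heads, so several query heads share one KV head and the KV
# cache shrinks accordingly.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")
print("query heads:", cfg.num_attention_heads)
print("kv heads:   ", cfg.num_key_value_heads)
print("queries per shared KV head:", cfg.num_attention_heads // cfg.num_key_value_heads)
```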
Qwen2-VL was an important milestone. Much like DeepSeek does with its models, the Qwen team develops its own technologies to make its models better. With Qwen2-VL, Alibaba Cloud introduced innovations such as naive dynamic resolution, which lets the model process images of any resolution by dynamically converting them into a variable number of visual tokens. To better align positional information across all modalities (text, image, and video), it uses Multimodal Rotary Position Embedding (MRoPE). Qwen2-VL can handle 20+ minute videos and can be deployed on devices such as phones and robots.
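As a rough illustration of how dynamic resolution surfaces in practice, here is a hedged sketch of Qwen2-VL inference with the Hugging Face Transformers integration. The min_pixels/max_pixels bounds and the image URL are placeholder assumptions, and the process_vision_info helper comes from the qwen_vl_utils package referenced in the model card.

```python
# A rough sketch of Qwen2-VL inference with Transformers. Thanks to naive dynamic
# resolution, the processor maps each image to a variable number of visual tokens;
# min_pixels/max_pixels only bound that range (the values here are placeholders).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper package referenced in the model card

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    min_pixels=256 * 28 * 28,   # assumed lower bound on image tokens
    max_pixels=1280 * 28 * 28,  # assumed upper bound on image tokens
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/any-resolution.jpg"},  # placeholder image
        {"type": "text", "text": "Describe this image."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0])
```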
Qwen2.5 and QwQ-32B rivaling DeepSeek-R1