Qwen Releases QwQ-32B, a 32B-Parameter Reasoning Model

Qwen has announced the release of QwQ-32B, a new reasoning model with 32 billion parameters. Despite its relatively small size, the model achieves performance comparable to DeepSeek-R1, a mixture-of-experts model with 671 billion parameters (roughly 37 billion activated per token).

Scaling Reinforcement Learning

QwQ-32B is built on Qwen2.5-32B and explores new strategies for scaling reinforcement learning (RL). The team found that RL training significantly improves reasoning, particularly in math and coding tasks. Continuous RL scaling allows the model to compete with larger mixture-of-experts (MoE) models.

The model integrates agent-like capabilities, enabling it to use tools and adapt its reasoning based on feedback. These advancements highlight the potential of RL in developing more efficient and intelligent language models.
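Conceptually, such a tool-use loop looks something like the Python sketch below. The tool-call format, the generate() wrapper, and the calculator tool are illustrative assumptions for this article, not QwQ-32B's actual interface:

```python
# Minimal sketch of an agent-style tool loop: the model generates text, and
# if it requests a tool, the result is fed back so it can adapt its reasoning.
# All names and the TOOL[...] convention here are hypothetical.
import re

def calculator(expression: str) -> str:
    """Toy tool: evaluate a basic arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))  # restricted eval, sketch only

def generate(transcript: str) -> str:
    """Placeholder for a model call; wire this to your inference backend."""
    raise NotImplementedError

def agent_loop(question: str, max_turns: int = 5) -> str:
    transcript = question
    for _ in range(max_turns):
        reply = generate(transcript)
        # Look for a tool request of the form TOOL[calculator]: <expression>
        match = re.search(r"TOOL\[calculator\]:\s*(.+)", reply)
        if match is None:
            return reply  # no tool call, treat as the final answer
        result = calculator(match.group(1).strip())
        # Append the tool output so the next generation can use it as feedback.
        transcript += f"\n{reply}\nTOOL RESULT: {result}\n"
    return reply
```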

Performance and Benchmarks

QwQ-32B has been evaluated across multiple benchmarks, including AIME24 for math reasoning, LiveCodeBench for coding, LiveBench for general problem-solving, IFEval for instruction following, and BFCL for function calling. The model performs competitively against other leading models, including DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Llama-70B.

Reinforcement Learning Approach

The RL training for QwQ-32B begins from a cold-start checkpoint and scales RL using outcome-based rewards. In the first stage, RL is applied to math and coding tasks: an accuracy verifier checks final answers for math problems, and a code execution server tests whether generated code passes predefined test cases.
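As a rough illustration of what such outcome-based rewards might look like, here is a minimal Python sketch. The function names, the exact-match check, and the test-case format are assumptions for illustration, not Qwen's published implementation:

```python
# Sketch of verifiable, outcome-based rewards: math answers are checked
# against a reference, and generated code is scored by actually running it.
import subprocess
import sys
import tempfile

def math_reward(model_answer: str, reference_answer: str) -> float:
    """Accuracy verifier: 1.0 if the final answer matches the reference."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(program: str, test_cases: list[tuple[str, str]]) -> float:
    """Execution verifier: fraction of (stdin, expected stdout) cases passed."""
    passed = 0
    for stdin_data, expected_stdout in test_cases:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(program)
            path = f.name
        try:
            result = subprocess.run(
                [sys.executable, path],
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=5,
            )
            if result.stdout.strip() == expected_stdout.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a hung program counts as a failure
    return passed / len(test_cases) if test_cases else 0.0
```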

In the second stage, RL is expanded to general capabilities, incorporating a general reward model and rule-based verifiers. This phase enhances instruction following, human preference alignment, and agent-like reasoning without compromising performance in math and coding.
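A stage-two reward along these lines might blend the two signals roughly as follows. The weighting scheme, the example rule, and reward_model_score() are illustrative assumptions rather than details from the release:

```python
# Sketch of a stage-two reward combining a learned general reward model
# with cheap rule-based verifiers (e.g. for instruction following).
def rule_based_checks(prompt: str, response: str) -> float:
    """Deterministic verifiers; here, a toy check on a length instruction."""
    score = 1.0
    if "answer in one sentence" in prompt.lower() and response.count(".") > 1:
        score -= 0.5  # violated the formatting instruction
    return max(score, 0.0)

def reward_model_score(prompt: str, response: str) -> float:
    """Placeholder for a learned preference model scoring in [0, 1]."""
    raise NotImplementedError

def stage_two_reward(prompt: str, response: str, alpha: float = 0.5) -> float:
    """Blend the learned preference signal with rule-based verification."""
    learned = reward_model_score(prompt, response)
    rules = rule_based_checks(prompt, response)
    return alpha * learned + (1 - alpha) * rules
```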

Open Access and Availability

QwQ-32B is available as an open-weight model under the Apache 2.0 license. It can be accessed on Hugging Face and ModelScope. A demo is also available on Hugging Face Spaces.
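For readers who want to try the weights directly, a minimal loading sketch with the Hugging Face transformers library follows. It assumes the Qwen/QwQ-32B repository id from the release; note that a 32-billion-parameter model requires substantial GPU memory to run locally:

```python
# Minimal sketch: load the open weights and generate a response.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # Hugging Face repo id from the release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```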

Users can try the model through Qwen Chat and provide feedback on its performance.
