Alibaba's flagship LLM, pre-trained on 18 trillion tokens, matching Llama-3.1-405B performance despite being roughly 5x smaller and excelling on knowledge, reasoning, math, and coding benchmarks.
Alibaba's code-specialized model, trained on 5.5T tokens and supporting 92 programming languages, achieving 85% on HumanEval and matching GPT-4o on code repair tasks.
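A minimal sketch of querying such a code model for a generation task, assuming the Hugging Face transformers library and the Qwen/Qwen2.5-Coder-7B-Instruct checkpoint (the checkpoint name and prompt are illustrative assumptions, not specified above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place layers on the available GPU(s)
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that checks if a string is a palindrome."},
]
# Instruct checkpoints ship a chat template; apply it to build the prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```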
Alibaba's reasoning model, matching DeepSeek-R1 (671B) performance with only 32B parameters, beating OpenAI o1-preview on the AIME and MATH benchmarks, and running in as little as 24GB of VRAM when quantized.
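Fitting a 32B-parameter model into 24GB implies quantized weights; a minimal sketch of 4-bit loading via transformers and bitsandbytes (the Qwen/QwQ-32B checkpoint name, the NF4 setup, and the sizing comment are assumptions, not stated above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/QwQ-32B"  # assumed checkpoint name for the 32B reasoning model

# 4-bit NF4 quantization puts the 32B weights near ~18 GB, leaving headroom for
# the KV cache on a single 24 GB GPU (rough sizing, not an official figure)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

messages = [{"role": "user", "content": "How many positive integers below 1000 are divisible by 3 or 5?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Reasoning models emit a long chain of thought before the answer, so allow many new tokens
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```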