Xiaomi released MiMo-V2.5-Pro under an MIT license a few days ago, and the response has been quietly enthusiastic on r/LocalLLaMA but has barely registered elsewhere, Hacker News included. The phone-manufacturer-makes-LLM angle keeps tripping people up. MiMo-V2.5-Pro is a Mixture-of-Experts model with 1.02 trillion total parameters and 42 billion active per token, and it landed at 54 on the Artificial Analysis Intelligence Index – squarely in frontier territory. On Reddit, u/lendo93 reported that in their benchmark suite the model averages higher than Opus 4.6 on coding, reasoning, agentic work, and decision making.

About the model

The architecture is built around two ideas.

First, hybrid attention: 60 of the 70 layers use sliding-window attention with a window of 128 tokens, while only 10 layers run global attention, a 6:1 SWA-to-GA ratio. This cuts KV-cache storage by roughly 7x compared to a standard transformer, and it's how Xiaomi gets a usable 1M-token context window without the cache exploding (the arithmetic behind that figure is sketched below).

Second, multi-token prediction. Three lightweight MTP modules with dense FFNs predict ahead of the main token stream, and Xiaomi reports this triples inference output speed. The MTP modules are trained natively rather than bolted on as speculative decoding, so the speedup compounds with the long-context handling.
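To see where the roughly-7x figure comes from, here is a back-of-envelope sketch in Python. The layer split (60 sliding-window layers with a 128-token window, 10 global layers, 70 total) comes from the numbers above; head count, head dimension, and precision are left out because they cancel out of the ratio.

```python
# Back-of-envelope KV-cache comparison for the hybrid attention layout
# described above. Layer counts and window size are from the article;
# per-head sizes are omitted since they cancel in the ratio.

def kv_cache_entries_hybrid(context_len: int, layers_swa: int = 60,
                            layers_global: int = 10, window: int = 128) -> int:
    """(token, layer) entries kept in the KV cache for the hybrid layout."""
    swa = layers_swa * min(context_len, window)   # SWA layers keep only the window
    ga = layers_global * context_len              # global layers keep everything
    return swa + ga

def kv_cache_entries_dense(context_len: int, layers: int = 70) -> int:
    """Same count for a standard transformer where every layer is global."""
    return layers * context_len

for ctx in (8_000, 128_000, 1_000_000):
    ratio = kv_cache_entries_dense(ctx) / kv_cache_entries_hybrid(ctx)
    print(f"context={ctx:>9,}  dense/hybrid cache ratio = {ratio:.1f}x")

# At a 1M-token context the 60 sliding-window layers contribute a fixed
# 60 * 128 = 7,680 entries, so the ratio approaches 70/10 = 7x, matching
# the "roughly 7x" savings claimed for the hybrid layout.
```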
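The article doesn't spell out how the MTP modules are wired into decoding, so the following is a hypothetical, toy-scale sketch of the general draft-and-verify pattern that natively trained MTP heads enable. The function names, vocabulary size, and the deterministic stand-in "model" are invented for illustration; only the count of three draft heads comes from the article.

```python
# Hypothetical sketch of multi-token prediction (MTP) used draft-and-verify
# style. Xiaomi trains its three MTP modules jointly with the backbone; the
# details aren't reproduced here. A deterministic toy function stands in for
# both the main head and the MTP drafts, just to show the decode loop shape.

from typing import List, Tuple

VOCAB = 101  # toy vocabulary size

def main_head(context: List[int]) -> int:
    """Stand-in for the backbone's greedy next-token prediction."""
    return (sum(context) * 31 + len(context)) % VOCAB

def mtp_drafts(context: List[int], n_heads: int = 3) -> List[int]:
    """Stand-in for the three MTP modules: draft n_heads tokens ahead.

    Drafts usually match what the main head would produce, with an
    occasional injected disagreement to mimic an imperfect draft.
    """
    drafts, ctx = [], list(context)
    for i in range(n_heads):
        tok = main_head(ctx)
        if (len(ctx) + i) % 5 == 0:        # occasional wrong draft
            tok = (tok + 1) % VOCAB
        drafts.append(tok)
        ctx.append(tok)
    return drafts

def decode(prompt: List[int], n_new: int) -> Tuple[List[int], int]:
    """Greedy decoding with MTP drafting.

    Each step does one "real" forward pass for the next token, then accepts
    drafted tokens for as long as they agree with the main head. In a real
    system that verification is a single parallel forward pass; this toy
    simply re-calls the stand-in function, so only the step count (not the
    wall clock) illustrates the speedup. Output matches plain greedy decoding.
    """
    seq, steps = list(prompt), 0
    while len(seq) - len(prompt) < n_new:
        steps += 1
        seq.append(main_head(seq))          # the sequential forward pass
        for draft in mtp_drafts(seq):       # cheap look-ahead drafts
            if draft == main_head(seq):     # accept only exact agreement
                seq.append(draft)
            else:
                break
    return seq[: len(prompt) + n_new], steps

out, steps = decode(prompt=[1, 2, 3], n_new=32)
print(f"{len(out) - 3} tokens in {steps} decode steps (~{32 / steps:.1f}x fewer steps)")
```

Presumably the point of training the heads natively is that draft acceptance stays high enough for the look-ahead to pay off, which would be consistent with the roughly 3x output-speed figure Xiaomi reports.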