Poe API
MiMo-V2-Omni
This model will be depreciated on May 31, 2026. Please switch to: https://poe.com/MiMo-V2.5
MiMo V2 Omni is Xiaomi’s omni-modal foundation model that natively sees, hears, and understands text, images, audio, and video within a single unified architecture. It combines deep reasoning and web search with agent-like capabilities—such as multi-step planning and tool use—to seamlessly power complex, real-world workflows.
Notes:
- Context Window: 256K
- Max Output: 128K
- Supported input modalities: Text, Image, Audio, Video
Input Limits
- Images: JPEG, PNG, GIF, WebP, BMP (max 10 MB each). Multiple images supported.
- Audio: MP3, WAV, FLAC, M4A, OGG (max 100 MB each). Multiple files supported.
- Video: MP4, MOV, AVI, WMV (max 300 MB each). Multiple videos supported.
This bot supports optional parameters for additional customization.
Powered by a server managed by @empiriolabsai. Learn more
Build with MiMo-V2-Omni using the Poe API
Start by creating an API key, for use with any bot on Poe:
See the full documentation for comprehensive guidance on getting started.
More from EmpirioLabs AI
Qwen3.7-Max
Qwen3.5-4B-EL
Qwen3.5-9B-EL
Gemma-4-26B-A4B-EL
DeepSeek-V4-Pro-E
DeepSeek-V4-Flash-E
DeepSeek-V4-Pro-EL
DeepSeek-V4-Flash-EL
HappyHorse-1.0-EL
MiMo-V2.5