AI Limos/Isima

Access Point

Services

  • Chat - Chat with language models
  • RAG - Retrieval-Augmented Generation for querying your documents
  • API - Programmatic access to language models

Current Models

  • dev-model: Focused on development tasks and agentic workflows
  • general: General-purpose model for classic chat tasks with reasoning
  • general_nothink: Variant of general without reasoning capabilities

Model          | Context (tokens) | Params | Active params | Aliases                  | Capabilities
MiniMax-M2.7   | 196608           | 230B   | 10B           | dev-model                | Chat, Agentic
Mistral-Small4 | 262144           | 119B   | 6.5B          | general, general_nothink | Chat, Agentic, VL (img)
bge-m3         | 8192             | 1B     | —             | embedding                | Embedding
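
The aliases above are what you pass as the `model` field in API calls. A minimal sketch of building such a request body, assuming the LiteLLM gateway (https://litellm.limos.fr/v1) accepts the OpenAI-compatible chat-completions format; the key placeholder is hypothetical (generate a real token at https://keymgr.limos.fr):

```python
import json

API_URL = "https://litellm.limos.fr/v1"  # LiteLLM gateway
API_KEY = "sk-REPLACE-ME"  # placeholder; get yours from keymgr.limos.fr


def chat_request_body(prompt: str, model: str = "general") -> str:
    """Build an OpenAI-style chat-completions JSON body.

    `model` is one of the aliases from the table above
    (dev-model, general, general_nothink).
    """
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })


# POST this body to f"{API_URL}/chat/completions" with the header
# Authorization: Bearer <API_KEY> using your HTTP client of choice.
body = chat_request_body("Hello!", model="general_nothink")
```

Swapping the alias is all it takes to switch models; the gateway routes `dev-model`, `general`, and `general_nothink` to the underlying deployments.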

Hardware

  • 4x H100 (90GB VRAM each) = 360GB total
  • 1x H200 (140GB VRAM)

Changelogs

v1 (20/04/2026)

  • URL changes
  • OpenWebUI reset
  • Reset of all LiteLLM keys
  • Opening to teachers/researchers
  • Deployment of Ragondin (rag.ia.limos.fr)
  • Redeployment of various services

v0.5 (13/04/2026)

  • Update to MiniMax M2.7
  • Setup of Searxng proxy

v0.4 (31/03/2026)

  • API calls now go through LiteLLM
  • OpenWebUI connection with SSO
  • Token management from https://keymgr.limos.fr
  • Mistral Small4 replaces Qwen3.5
  • Addition of general_nothink alias that disables reasoning
  • BAAI/bge-m3 embedding model

Update API_URL in your configs to https://litellm.limos.fr/v1. Tokens are generated at https://keymgr.limos.fr.
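
As a config fragment, the update amounts to pointing any OpenAI-compatible client at the gateway. The environment-variable names below are the common OpenAI-client conventions, not something mandated by this service; the curl call requires network access to the gateway and a valid key:

```shell
# Point OpenAI-compatible clients at the LiteLLM gateway.
export OPENAI_API_BASE="https://litellm.limos.fr/v1"
export OPENAI_API_KEY="sk-REPLACE-ME"   # generate at https://keymgr.limos.fr

# Example chat completion against the `general` alias:
curl -s "$OPENAI_API_BASE/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "general", "messages": [{"role": "user", "content": "Hello"}]}'
```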

v0.3 (25/02/2026)

  • Model changes: MiniMax (dev) and Qwen3.5 (general)
  • Qwen3.5 natively supports VL (image, video, audio, screenshot)
  • Export of VLLM metrics via Prometheus
  • Resource usage graphs in Grafana

v0.2 (12/02/2026)

  • Model changes: DevStral replaced by GLM-4.7
  • Addition of a CLI to control models
  • API key management via LiteLLM

v0.1 (06/01/2026)

  • Provision of OpenWebUI using LiteLLM
  • Web search via SearxNG (self-hosted, separate service)
  • API key generation via OpenWebUI
  • Available models: DevStral-123b, gpt-oss-120b