AI Limos/Isima

Access Point

Services

  • Chat - Chat with language models
  • RAG - Retrieval-Augmented Generation for querying your documents
  • API - Programmatic access to language models

Current Models

  • dev-model: Focused on development tasks and agentic workflows
  • general: General-purpose model for classic chat tasks with reasoning
  • general_nothink: Variant of general without reasoning capabilities

Model          | Context (tokens) | Params | Active params | Aliases                  | Capabilities
MiniMax-M2.7   | 196608           | 230B   | 10B           | dev-model                | Chat, Agentic
Mistral-Small4 | 262144           | 119B   | 6.5B          | general, general_nothink | Chat, Agentic, VL (img)
bge-m3         | 8192             | 1B     | —             | embedding                | Embedding
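
The aliases above are what you pass as the `model` field in API calls. A minimal sketch of building such a request body, assuming the LiteLLM gateway (https://litellm.limos.fr/v1) accepts the OpenAI-compatible chat-completions format; the key placeholder is hypothetical (generate a real token at https://keymgr.limos.fr):

```python
import json

API_URL = "https://litellm.limos.fr/v1"  # LiteLLM gateway
API_KEY = "sk-REPLACE-ME"  # placeholder; get yours from keymgr.limos.fr


def chat_request_body(prompt: str, model: str = "general") -> str:
    """Build an OpenAI-style chat-completions JSON body.

    `model` is one of the aliases from the table above
    (dev-model, general, general_nothink).
    """
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })


# POST this body to f"{API_URL}/chat/completions" with the header
# Authorization: Bearer <API_KEY> using your HTTP client of choice.
body = chat_request_body("Hello!", model="general_nothink")
```

Swapping the alias is all it takes to switch models; the gateway routes `dev-model`, `general`, and `general_nothink` to the underlying deployments.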

Hardware

  • 4x H100 (90GB VRAM each) = 360GB total
  • 1x H200 (140GB VRAM)

Changelogs

v1 (20/04/2026)

  • URL changes
  • OpenWebUI reset
  • Reset of all LiteLLM keys
  • Opening to teachers/researchers
  • Deployment of Ragondin (rag.ia.limos.fr)
  • Redeployment of various services

v0.5 (13/04/2026)

  • Update to MiniMax M2.7
  • Setup of Searxng proxy

v0.4 (31/03/2026)

  • API calls now go through LiteLLM
  • OpenWebUI connection with SSO
  • Token management from https://keymgr.limos.fr
  • Mistral Small4 replaces Qwen3.5
  • Addition of general_nothink alias that disables reasoning
  • BAAI/bge-m3 embedding model

Update API_URL in your configs to https://litellm.limos.fr/v1. Tokens are generated at https://keymgr.limos.fr.
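
As a config fragment, the update amounts to pointing any OpenAI-compatible client at the gateway. The environment-variable names below are the common OpenAI-client conventions, not something mandated by this service; the curl call requires network access to the gateway and a valid key:

```shell
# Point OpenAI-compatible clients at the LiteLLM gateway.
export OPENAI_API_BASE="https://litellm.limos.fr/v1"
export OPENAI_API_KEY="sk-REPLACE-ME"   # generate at https://keymgr.limos.fr

# Example chat completion against the `general` alias:
curl -s "$OPENAI_API_BASE/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "general", "messages": [{"role": "user", "content": "Hello"}]}'
```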

v0.3 (25/02/2026)

  • Model changes: MiniMax (dev) and Qwen3.5 (general)
  • Qwen3.5 natively supports VL (image, video, audio, screenshot)
  • Export of VLLM metrics via Prometheus
  • Resource usage graphs in Grafana

v0.2 (12/02/2026)

  • Model changes: DevStral replaced by GLM-4.7
  • Addition of a CLI to control models
  • API key management via LiteLLM

v0.1 (06/01/2026)

  • Provision of OpenWebUI using LiteLLM
  • Web search via SearxNG (self-hosted, separate service)
  • API key generation via OpenWebUI
  • Available models: DevStral-123b, gpt-oss-120b