AI.news
主页教程研究工具模型AI创业讨论新闻每日简报WIKI🚀 创业库★ 投稿
AI+医疗机器人教育金融能源健康娱乐思考

Mantis

Mantis LLM Gateway

Mantis is an open-source, self-hosted LLM gateway for teams building applications across multiple model targets. It gives client applications one stable chat-completions API while centralizing routing policy, failover behavior, response caching, guardrails, observability, and AWS deployment configuration.

The project is designed for small teams that want the benefits of an LLM gateway without giving up control of their infrastructure or data.

What Mantis Provides

  • One API for LLM calls: send chat-completion requests through a single gateway endpoint instead of integrating directly with each provider.
  • Configurable routing: route by metadata, model aliases, weighted targets, fallback chains, retries, timeouts, and cooldowns.
  • Response caching: reduce repeated LLM calls with exact prompt caching and optional semantic caching.
  • Guardrails: use AWS Bedrock guardrails to mask sensitive data and block policy-violating prompts or responses.
  • Observability: capture request IDs, latency, token usage, cache behavior, errors, and request outcomes through CloudWatch.
  • AWS-native deployment: provision and run Mantis with Terraform, ECS Fargate, ALB, ElastiCache, Parameter Store, S3, IAM, and CloudWatch.

Repositories

  • llm-gateway: the FastAPI gateway service, React configuration dashboard, Terraform infrastructure, and deployment scripts.
  • mantis-sdk: a Python SDK for calling the Mantis /v1/chat/completions endpoint from application code.
  • mantis-llm-gateway.github.io: the public documentation site and case study.

Start Here

  • Read the documentation for the project overview, guides, API reference, and architecture case study.
  • Follow the quick start to run or deploy the gateway.
  • Review the routing configuration guide to understand how model selection, fallback, caching, and cooldown behavior are controlled.

Project Goals

Mantis exists to make multi-LLM application development more reliable, observable, and operationally manageable. Instead of spreading provider-specific logic across application code, teams can put model routing, cache policy, failover behavior, guardrails, and deployment concerns behind one gateway layer.

The result is a system where application code stays simple, model choices remain configurable, and teams keep control over how requests move through their own AWS environment.

Show HN: Mantis, A self-hosted LLM gateway | AI.News