Tribunus · open source AI inference

Tribunus Desktop agent

Your AI coding agent. Runs locally. Plugin system. Multi-agent. Any model, any backend. Electron desktop app for macOS, Linux, Windows.

Agent system with tool execution
Project-aware sessions
Any LLM backend (local or remote)
Plugin system for custom tools

Learn More

Tribunus Compute engine

The inference engine inside Tribunus Desktop. Compile-time architecture — every decision frozen before runtime. Numerical oracle validates every kernel. Multi-backend from day one.

No guessing: compile-time candidate generation + 6-check admission
Numerical oracle: Apple Silicon FP32 as reference
Multi-backend: CUDA, Metal, Vulkan, ROCm, oneDNN, Level Zero, TT-NN
6 runtime pipelines: token intake → prefill → decode → KV → speculation → output

Learn More

For Engineers

Architecture decision records that define the Tribunus Compute engine — each one generated, verified, and source-backed.

▶ $ ADR 0037: Backend Realization Contract ✓ verified

Defines the formal contract each hardware backend must satisfy to be admitted into the Tribunus Compute runtime. Covers kernel registration, memory layout, stream synchronization, and the 6-check admission pipeline that gates every backend at compile time.

Full ADR Source

▶ $ ADR 0038: Numerical Governance + Autotuning ✓ verified

Establishes the numerical oracle — Apple Silicon FP32 as the golden reference — and the autotuning framework that validates every kernel against it. No kernel ships without passing the oracle's accuracy and performance gates.
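
To make the idea of validating a kernel against a golden reference concrete, here is a minimal sketch of an element-wise accuracy gate. The helper name `passes_oracle` and the tolerance values are assumptions chosen for illustration; they are not taken from ADR 0038, which defines the actual gates.

```python
import math

def passes_oracle(candidate, reference, rel_tol=1e-3, abs_tol=1e-5):
    """Return True if every candidate value is close to its reference value.

    Hypothetical helper: the tolerances are illustrative, not Tribunus defaults.
    """
    if len(candidate) != len(reference):
        return False
    return all(
        math.isclose(c, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for c, r in zip(candidate, reference)
    )

reference = [0.5, 1.25, -3.0]             # stand-in for FP32 reference output
good_kernel = [0.5001, 1.2502, -3.0005]   # small rounding drift
bad_kernel = [0.5, 1.30, -3.0]            # second element is off by 4%

print(passes_oracle(good_kernel, reference))
print(passes_oracle(bad_kernel, reference))
```

A real gate would likely also check performance, as the ADR summary mentions; this sketch covers only the accuracy side.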

Full ADR Source

▶ $ ADR 0039: Datacenter Control Plane ✓ verified

Specifies the control plane architecture for multi-node inference deployments: node discovery, health monitoring, load distribution, and the contract between the control plane and the per-node Compute runtime. Enables scale-out without sacrificing compile-time guarantees.
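
One way to picture how health monitoring and load distribution fit together is a selector that only routes to healthy nodes. The sketch below uses a least-loaded policy, which is an assumption made for illustration; the `Node` type, its fields, and `pick_node` are hypothetical and do not come from ADR 0039.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    # Hypothetical view of a node as the control plane might track it.
    name: str
    healthy: bool
    active_requests: int

def pick_node(nodes: List[Node]) -> Optional[Node]:
    """Return the healthy node with the fewest active requests, or None."""
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda n: n.active_requests)

cluster = [
    Node("node-a", healthy=True, active_requests=7),
    Node("node-b", healthy=False, active_requests=0),  # skipped: unhealthy
    Node("node-c", healthy=True, active_requests=2),
]

chosen = pick_node(cluster)
print(chosen.name if chosen else "no healthy node")
```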

Full ADR Source

▶ $ ADR 0040: Runtime Inference Pipelines ✓ verified

Defines the six runtime pipelines — token intake, prefill, decode, KV cache management, speculative decoding, and output streaming — their scheduling guarantees, memory budgets, and the handoff contracts between adjacent stages. Each pipeline is a separate phase in the compile-time planner.
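
The stage order and the notion of handoffs between adjacent stages can be sketched as a simple ordered list. The stage names come from the description above; the helpers `handoffs` and `is_adjacent` are hypothetical, and the real planner may model these relationships differently.

```python
# Stage names in the order listed above.
PIPELINES = [
    "token intake",
    "prefill",
    "decode",
    "KV cache management",
    "speculative decoding",
    "output streaming",
]

def handoffs(pipelines):
    """Return each (producer, consumer) pair of adjacent stages."""
    return list(zip(pipelines, pipelines[1:]))

def is_adjacent(src, dst, pipelines=PIPELINES):
    """True if dst is the stage immediately after src in the listed order."""
    return (src, dst) in handoffs(pipelines)

for producer, consumer in handoffs(PIPELINES):
    print(f"{producer} -> {consumer}")
```

Six stages give five handoffs, so a planner built along these lines would have five contracts to check.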

Full ADR Source

▶ $ ADR 0041: Server UX Strategy ✓ verified

Defines the user-facing inference server interface: REST API surface, streaming protocol (server-sent events, SSE), health and metrics endpoints, configuration model, and a development-server mode that mirrors production behavior without requiring a full cluster.
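
For readers unfamiliar with server-sent events, the sketch below shows how generated tokens could be framed for an SSE stream. The wire format (`event:` and `data:` lines terminated by a blank line) follows the SSE standard; the event names `token` and `done` and the JSON payload shape are assumptions for illustration, not the Tribunus API.

```python
import json

def sse_frame(data, event=None):
    """Format one SSE message: optional event name, one data line per text line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"

def stream_tokens(tokens):
    """Yield one frame per token, then a final hypothetical 'done' event."""
    for tok in tokens:
        yield sse_frame(json.dumps({"token": tok}), event="token")
    yield sse_frame("end", event="done")

for frame in stream_tokens(["Hel", "lo"]):
    print(frame, end="")
```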

Full ADR Source

Your coding agent. Its own inference engine. Open source, verifiable.
