Command the Cohort.

Coordinate hundreds of coding agents in a single repository. Real-time visibility, scoped authority, zero chaos.

Download for macOS

See It in Action.

Orchestrate

Define missions, assign scope, spawn cohorts. One engineer → hundreds of agents.

Observe

Every diff, every decision, every agent status streams live. Nothing happens in the dark.

Govern

Scoped authority, merge gates, evidence chains. Senior engineers set the perimeter.

The Software Factory, Reimagined.

Orchestrate Agents

Spawn and coordinate hundreds of coding agents working concurrently across your entire codebase. Define missions, assign scope, and watch the cohort execute.

Real-Time Visibility

Every file change, every agent decision, every diff streams live to every team member. Nothing happens in the dark.

Scoped Authority

Senior engineers set the perimeter. Juniors and rookies operate within explicit guardrails. Authority flows down — never sideways.

Evidence & Gates

Every change is tracked, every decision is auditable, every merge is gated. Cryptographic evidence chains for regulated environments.

From Prototype to Production.

Prototyping

Spin up working prototypes in minutes. Define the scaffold and let agents wire the plumbing while you focus on architecture.

Debugging

Deploy agent squads to trace through unfamiliar code, surface root causes, and propose fixes — all in parallel.

Fast Iteration

Ship features in hours, not sprints. Agents handle the boilerplate; your team reviews the intent and ships the result.

Rapid Feature Implementation

Translate requirements into working code at pace. Break down features into parallel agent tasks with automatic conflict resolution.

Cybersecurity Auditing

Turn agents loose on your codebase to find vulnerabilities, check compliance, and produce audit-ready reports.

Ethical Hacking Workflows

Run structured penetration testing campaigns with agents that report findings to a central evidence ledger.

Full Production Delivery

From greenfield to production. Agents draft, test, document, and package. Humans review and approve each gate.

New

Local Inference. Zero Round Trips.

Tribunus isn't just a control plane. It ships a native inference runtime that executes models locally on Apple Silicon — no cloud, no API keys, no latency tax.

48 Layers Qualified
0 Application Copies
38 Tests Across 9 Modules
FP16 Boundary Precision

MLX + Core ML Hybrid

Attention runs on MLX. MLP layers offload to the Apple Neural Engine through Core ML — with zero-copy IOSurface tensor handoff between runtimes.

Arena Allocation

IOSurface-backed FP16 arena pools with lease semantics. No malloc in the hot path. Every tensor transition is receipted and auditable.
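To make "lease semantics" concrete, here is a minimal sketch of the idea in plain Python. It is not the Tribunus allocator: ordinary `bytearray` buffers stand in for IOSurface-backed memory, and `ArenaPool`, `lease`, and `receipts` are hypothetical names chosen for illustration. What it shows is the shape of the pattern: allocate everything once, hand out slots, and write a receipt for every transition.

```python
from contextlib import contextmanager


class ArenaPool:
    """Fixed set of preallocated buffers handed out as leases (illustrative only)."""

    def __init__(self, slots: int, slot_bytes: int) -> None:
        # All memory is allocated once, up front; leasing only reuses it.
        self._buffers = [bytearray(slot_bytes) for _ in range(slots)]
        self._free = list(range(slots))
        self.receipts: list[tuple[str, int]] = []  # (event, slot) audit trail

    @contextmanager
    def lease(self):
        if not self._free:
            raise RuntimeError("arena exhausted: no free slot to lease")
        slot = self._free.pop()
        self.receipts.append(("lease", slot))
        try:
            yield memoryview(self._buffers[slot])
        finally:
            # Ending a lease never frees memory; the slot just becomes available.
            self._free.append(slot)
            self.receipts.append(("release", slot))


pool = ArenaPool(slots=2, slot_bytes=4096)
with pool.lease() as buf:
    buf[0:2] = b"\x00\x3c"  # FP16 encoding of 1.0, little-endian
print(pool.receipts)  # [('lease', 1), ('release', 1)]
```

The context manager guarantees the slot is returned even if the work inside the lease raises, which is the property a lease is meant to provide.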

Deterministic Decoding

Greedy decode with exact token match verification. Eight-token qualification suite. No sampling, so no seed to manage: same input, same output. Every time.

KV Cache Engine

Sliding window and global eviction policies. Prefill + cached decode parity. Concurrent access with proper lease isolation.
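A sliding-window policy is easy to picture in code. The sketch below assumes the simplest reading of the term, keeping only the most recent N positions; `SlidingWindowKV` is a hypothetical name, the keys and values are plain lists rather than tensors, and the global policy and lease isolation are not modeled.

```python
from collections import deque


class SlidingWindowKV:
    """Keeps key/value entries for only the most recent `window` positions."""

    def __init__(self, window: int) -> None:
        # deque(maxlen=...) drops the oldest entry automatically on append.
        self._entries = deque(maxlen=window)

    def append(self, position: int, key: list[float], value: list[float]) -> None:
        self._entries.append((position, key, value))

    def positions(self) -> list[int]:
        return [pos for pos, _, _ in self._entries]


cache = SlidingWindowKV(window=4)
for pos in range(6):
    cache.append(pos, key=[float(pos)], value=[float(pos)])
print(cache.positions())  # [2, 3, 4, 5]
```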

What the Compute Engine Actually Does.

Forget the jargon. Here is what happens when you press enter — in plain English, with the actual data from our research runs.

1

The Model Lives on Your Disk

A frozen copy of Gemma 4 12B sits in local storage. Tribunus maps it into memory once — no downloads, no API keys, no cloud. The model is identified by its cryptographic hash so you know exactly what you are running.

Model hash d042df1e…
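Checking a model file against a pinned hash takes only a few lines. This is a sketch under one loud assumption: the page does not say which hash function Tribunus uses, so SHA-256 is a stand-in here. `model_digest` and `matches_expected` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def model_digest(path: Path, chunk_bytes: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()  # assumption: the real hash algorithm is not stated
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_bytes):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_prefix: str) -> bool:
    """Compare a file's digest against a pinned (possibly shortened) hex digest."""
    return model_digest(path).startswith(expected_prefix.lower())
```

Reading in chunks matters for multi-gigabyte weights: the file never has to fit in memory just to be verified.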
2

Two Engines Fire Up at Once

MLX takes the GPU and runs the attention layers — the part of the model that decides which words matter. Core ML takes the Apple Neural Engine and runs the feed-forward layers — the part that processes what was decided. They share memory through something called IOSurface. Think of it like two chefs sharing one cutting board instead of passing ingredients back and forth.

Copies between engines 0
3

42 Stages, Every Microsecond Tracked

From the moment the worker wakes up to the moment the final token streams out, every operation is timestamped and receipted. Want to know exactly how long the attention softmax took on layer 23 of the 48-layer stack? It is in the event journal. This is not logging — it is forensic instrumentation. If something is slow, you know exactly what and exactly how much.

Pipeline stages 42
Layers tracked 48
Events per run 2,000+
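For a feel of what a timestamped stage record looks like, here is a small sketch. It is not the Tribunus journal format: `EventJournal`, its tuple layout, and the stage name `attention_softmax` are assumptions made for illustration.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventJournal:
    """Append-only list of (stage, layer, duration in ns) records."""

    events: list = field(default_factory=list)

    @contextmanager
    def stage(self, name: str, layer: Optional[int] = None):
        start = time.perf_counter_ns()  # monotonic clock, nanosecond units
        try:
            yield
        finally:
            # Recorded even if the stage raises, so failures stay visible.
            self.events.append((name, layer, time.perf_counter_ns() - start))

    def duration_ns(self, name: str, layer: Optional[int] = None) -> int:
        """Total time recorded for a given stage name and layer tag."""
        return sum(d for n, l, d in self.events if n == name and l == layer)


journal = EventJournal()
with journal.stage("attention_softmax", layer=23):
    sum(i * i for i in range(1000))  # stand-in for real work
print(journal.duration_ns("attention_softmax", layer=23))
```

With records shaped like this, "how long did the softmax take on layer 23" becomes a filter and a sum rather than a log search.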
4

Same Input → Same Output. Every Time.

Deterministic greedy decoding means the model produces the exact same tokens for the exact same input. We verify this with an eight-token qualification suite — the runtime must match the oracle reference token-for-token. If a single token is wrong, the run is rejected. This is how you know your optimizations did not break anything.

Qualification requirement 8/8 tokens match
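The gate itself is simple enough to sketch. The version below simplifies one thing on purpose: it takes a precomputed list of per-step scores instead of running a model autoregressively. `greedy_decode` and `qualifies` are hypothetical names, and the numbers are made up for the example.

```python
def greedy_decode(step_logits: list[list[float]]) -> list[int]:
    """Pick the highest-scoring token id at each step (lowest id wins ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in step_logits]


def qualifies(produced: list[int], oracle: list[int]) -> bool:
    """Accept a run only if every token matches the reference exactly."""
    return produced == oracle


logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.0, 4.0]]
tokens = greedy_decode(logits)
print(tokens)                        # [1, 0, 2]
print(qualifies(tokens, [1, 0, 2]))  # True
print(qualifies(tokens, [1, 0, 1]))  # False
```

There is no tolerance and no partial credit: one mismatched token fails the comparison, which is what makes the check useful as a regression gate for optimizations.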
5

Every Claim Has a Receipt

No performance claim is made without a run manifest, a provenance chain, and an immutable observation record. Five experiments have been designed (EXP-0000 through EXP-0004), each with formal hypotheses, repetition policies, and correctness gates. The data lives in DuckDB — queryable, auditable, and permanently archived. Rejected optimizations are never deleted so nobody repeats the same dead end.

Experiments designed 5
Workloads defined 6
Run grades 4 tiers

This is the foundation. The control plane coordinates agents. The compute engine runs models. The evidence plane proves they both work. Three pillars, one platform.

Performance Claims Need Proof.

Every performance claim is backed by the Research Evidence Plane — a provenance-tracked experiment framework with 42 instrumented pipeline stages, cryptographic artifact hashing, and immutable run records.

42 pipeline stages — instrumented, timestamped, receipted. Every microsecond accounted for.

Run Grades

Every run is graded: exploratory, controlled, claim candidate, or archival. Only runs graded controlled or higher support optimization decisions. Legacy results are marked not authoritative.
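One way to encode the grading rule is an ordered enum. This sketch assumes the four grades rank in the order listed, from least to most authoritative; `RunGrade` and `supports_optimization_decision` are hypothetical names.

```python
from enum import IntEnum


class RunGrade(IntEnum):
    """Grades in assumed order, from least to most authoritative."""

    EXPLORATORY = 0
    CONTROLLED = 1
    CLAIM_CANDIDATE = 2
    ARCHIVAL = 3


def supports_optimization_decision(grade: RunGrade) -> bool:
    """Controlled runs and anything graded above them qualify."""
    return grade >= RunGrade.CONTROLLED


print(supports_optimization_decision(RunGrade.EXPLORATORY))  # False
print(supports_optimization_decision(RunGrade.ARCHIVAL))     # True
```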

Immutable Observations

Raw observations are never modified in place. Corrections produce a new copy with a provenance link to the original. Negative results are preserved forever, which prevents repeated dead ends.
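The correct-by-copy rule can be sketched with frozen records and an append-only list. The real store is DuckDB-backed; this in-memory version only illustrates the rule, and `Observation`, `ObservationLog`, and the `corrects` field are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Observation:
    obs_id: int
    stage: str
    duration_ns: int
    corrects: Optional[int] = None  # id of the record this one supersedes


class ObservationLog:
    """Append-only log: records are added, never edited or removed."""

    def __init__(self) -> None:
        self._records: list[Observation] = []

    def record(self, stage: str, duration_ns: int) -> Observation:
        obs = Observation(len(self._records), stage, duration_ns)
        self._records.append(obs)
        return obs

    def correct(self, original: Observation, duration_ns: int) -> Observation:
        # A correction is a new record that points back at the original.
        fixed = replace(original, obs_id=len(self._records),
                        duration_ns=duration_ns, corrects=original.obs_id)
        self._records.append(fixed)
        return fixed


log = ObservationLog()
first = log.record("prefill", duration_ns=1_200)
fixed = log.correct(first, duration_ns=1_150)
print(first.duration_ns, fixed.corrects)  # 1200 0
```

The original record still reads 1200 after the correction, so anyone auditing the run can see both what was first observed and what replaced it.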

Provenance Chain

Every run captures source commit, binary hashes, model identity, machine profile, and environment. DuckDB analytics over the normalized event stream. Nothing is taken on faith.
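A run manifest can be pinned with a single digest. The sketch below is an assumption about how that might look, not the Tribunus format: the field names mirror the list above, the values are placeholders, SHA-256 is a stand-in hash, and `manifest_digest` is a hypothetical helper.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    """Hash a manifest via canonical JSON so key order cannot change the result."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values only; a real manifest would carry real identifiers.
manifest = {
    "source_commit": "<git commit>",
    "binary_hashes": {"runtime": "<hash>"},
    "model_identity": "<model hash>",
    "machine_profile": "<machine profile>",
    "environment": {"os": "<os version>"},
}
print(manifest_digest(manifest))
```

Sorting keys before hashing means two manifests with the same content always produce the same digest, while any changed field produces a different one.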

CLAIM-0000
Tribunus local inference runtime v1 physically and semantically qualified for one frozen Gemma 4 12B ComputeImage on the qualified Apple Silicon 16 GB profile.

Status: Draft · Scope: 64-token prompt ceiling, 8-token output ceiling, deterministic greedy decoding · Model: d042df1e…

The Road Ahead.

Phase 1 — Now

macOS Developer Preview

macOS native app, single developer orchestrating hundreds of agents, real-time change visibility, scoped authority, LocalFabric coordination backend.

Phase 2 — Next

Team Collaboration

Real-time team collaboration, peer-to-peer serverless architecture, senior/junior/rookie sandboxing, Valkey coordination backend for live multi-user sessions.

Phase 3 — Later

Enterprise Governance

Enterprise governance, cryptographic audit trails, policy-gated merges, compliance-ready evidence rings, remote-Valkey for distributed teams.

Compute — Now

Native Inference Runtime

MLX + Core ML hybrid execution, IOSurface FP16 arena allocation, deterministic decoding, KV cache engine. 48-layer qualification on Gemma 4 12B. Application-copy-free tensor handoff.

Compute — Next

Evidence Plane Qualification

Full claim-candidate runs across all workloads. Core ML state qualification. Tokio streaming with cancellation. MLP placement benchmarks. Frozen dataset release for archival-grade claims.

How Tribunus Was Born.

It started as an OpenCode fork. Not because OpenCode was bad — because the little frictions add up. I started by writing my own agent profiles. Then my own custom tools to make agents more effective. Then my own coordination scripts. The agents were reading and writing handoff logs in ad-hoc JSON files scattered across disk. I built what was essentially a database out of a pile of scripts on the filesystem. Every day, another friction. Another workaround. Another script that almost worked.

Eventually I forked the experimental branch of OpenCode on GitHub and started looking inside. I needed to find the root of all the little annoyances that were driving me insane. What I found was that the architecture of a single-agent tool simply cannot be retrofitted into a multi-agent system. The coordination layer has to be native. So I started building.

I yanked SQLite and transplanted PGlite + DuckDB + Valkey to build a high-throughput coordination layer. I turned my custom agents into actual state machines. Each spawned subagent became its own mini runtime with conditional state machines — forcing agents to actually follow the orchestration path I designed, consistently. I enabled shared worktrees and background agents so the main agent became a dispatcher that summons team leads, each coordinating a different portion of the development process. I deprecated the TUI-first approach and made every capability a first-class implementation in both the agent operation UX and the user-facing UI.

I built this with love, tears, and frustration. I firmly believe that letting an agent roam unsupervised through any system with shell access is the engineering equivalent of leaving a toddler alone with a loaded nail gun. Useful tool, wrong supervision model. But the right supervision model didn't exist, so I had to build my own. I'm sharing it here so you don't have to go through that pain too.

Today, a single developer can orchestrate hundreds of concurrent agents and watch their changes to the codebase in real time. The macOS app (Apple Silicon today, with an Intel build coming soon) lets you spawn agent cohorts, define missions, set scope boundaries, and watch every diff stream live. Nothing happens in the dark.

The end goal is first-class real-time collaboration where teams of software engineers work together, each spawning hundreds of concurrent agents, all in the same repository at the same time. The real-time coordination is so smooth that a senior engineer can see what juniors and rookies are doing in real time. Teams should spend more time building the product than screaming at each other in meetings about someone deleting the production database because they sent the wrong prompt.

From the Workbench.

The 42-Stage Pipeline: Why We Instrument Every Microsecond

From worker_launch to worker_teardown — the taxonomy behind Tribunus inference performance measurement.

Coming soon

Zero-Copy Tensors Across MLX and Core ML

How IOSurface-backed FP16 arenas eliminate application-level copies at the hybrid compute boundary.

Coming soon

Negative Results Are Immortal

Why the Evidence Plane preserves rejected optimizations forever — and how that saves engineering years.

Coming soon
Roadmap

Where This Is Going.

The next phase of Tribunus is peer-to-peer serverless team coordination. No central server, no cloud dependency, no single point of failure. Every engineer will see every agent's work in real time — senior engineers watching junior cohorts, rookies learning from how the system operates under constraint.

Senior engineers will define mission parameters, set review gates, and control which agents have access to which parts of the codebase. Junior engineers will operate within sandboxes — learning by observing how the cohort executes under constraint. The days of screaming at each other in meetings about someone deleting the production database because they sent the wrong prompt will be over.

The result will be a software factory where every contributor works at the top of their capability, no one is blocked waiting for review, and every merge meets the bar.

Frequently Asked.

What is Tribunus?

Tribunus is a control plane for agentic engineering. It coordinates hundreds of coding agents working concurrently in a single repository with real-time visibility and scoped authority.

Is this just another AI coding tool?

No. Tribunus does not generate code. It coordinates the agents that do. It handles the orchestration, visibility, gate enforcement, and evidence tracking so you don't have to manage a hundred agent sessions manually.

Do I need to run my own LLM?

No. Tribunus works with whatever LLM backend you already use — OpenAI, Anthropic, local models via Ollama, or enterprise providers. The control plane is model-agnostic.

Is this free?

The macOS developer preview is free. Pricing for team and enterprise tiers will be announced when those features ship. There will always be a generous free tier for individual developers.

Can agents break my codebase?

Agents operate within scoped authority defined by you. They can only write to files you explicitly allow. Senior engineers set the perimeter. Every change is visible in real time. Every merge is gated.
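A write-scope check of this kind might look like the sketch below. It assumes scope is expressed as a list of allowed repository directories, which is one possible reading of "files you explicitly allow"; `may_write` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def may_write(path: str, allowed_roots: list[str]) -> bool:
    """True if a repo-relative path sits under one of the allowed directories."""
    target = PurePosixPath(path)
    # Reject absolute paths and parent-directory hops that could escape scope.
    if target.is_absolute() or ".." in target.parts:
        return False
    return any(target.is_relative_to(root) for root in allowed_roots)


scope = ["src/parser", "tests/parser"]
print(may_write("src/parser/lexer.py", scope))         # True
print(may_write("src/billing/invoice.py", scope))      # False
print(may_write("src/parser/../billing/x.py", scope))  # False
```

The comparison works on whole path components, so a sibling directory such as `src/parser_extra` does not slip through on a shared prefix.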

How is this different from running multiple Claude Code instances?

Running multiple agents without coordination means merge conflicts, race conditions, duplicated work, and zero visibility into what each agent is doing. Tribunus provides the coordination fabric that makes multi-agent development safe, visible, and productive.

Get in Touch.

Email hello@tribunus.dev

GitHub github.com/juliantorr-es/opencode

Built by a Venezuelan engineer in San Francisco, California — with love, tears, and frustration.

Get Tribunus.

macOS Apple Silicon

Native ARM64 binary. M1/M2/M3/M4 and later.

v0.1.0 — Developer Preview Download

macOS Intel

x86-64 binary for Intel-based Macs.

Coming Soon — Intel Build