Gladiator

Two zero-human AI companies. Same product. Different strategies. One winner.

Gladiator pits Blitz (growth-first, 4 agents) against Craft (quality-first, 5 agents) to maximize GitHub stars on identical starter repos. Nine Claude Sonnet agents run autonomously via Paperclip orchestration and Hermes Agent workers. A live dashboard visualizes the battle in real time.

Built for the Nous Research Hermes Agent Hackathon (March 2026). Demo video.

What you’ll see

Landing page: the product (llm-judge), the rules and the two rival companies
Live dashboard: scoreboard, task boards, code comparison, audit trail, learning evidence, merge controls
Competition: 9 agents complete 10 tasks in ~6 minutes, creating skills, growing memory and writing code
Winner announcement: auto-detects completion, declares winner by projected GitHub stars
Merge: companies unite, skills transfer across teams, proving Hermes learning is real

Architecture

Component	Tech	Port
Paperclip	Node.js + PostgreSQL 16	3100
Hermes Agent	Python CLI + Anthropic API	-
Dashboard	FastAPI + SSE + vanilla JS	4000
Evidence DB	SQLite (WAL mode)	-
Watcher	Python daemon	-
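
As a rough illustration of the SSE leg of the dashboard, the sketch below formats one Server-Sent Events frame in plain Python. The wire format (`event:` and `data:` lines ended by a blank line) is standard SSE; the function name `format_sse`, the `scoreboard` event name and the payload fields are hypothetical and not taken from the project.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame (hypothetical helper)."""
    lines = []
    if event is not None:
        # Optional event name lets the browser dispatch to a named listener.
        lines.append(f"event: {event}")
    # json.dumps without indent emits a single line, so one data: field suffices.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"
```

A streaming endpoint would yield frames like this one at a time, and the vanilla JS side would read them with `EventSource`.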

Hermes features used

Feature	How it’s used	Evidence
Skills	Agents create reusable SKILL.md files after complex tasks	`skill_snapshots` table, version diffs
Memory	Agents save strategies and learnings to MEMORY.md	`memory_snapshots` table, char growth
Sessions	Session IDs chain across heartbeats	`heartbeat_metrics.session_id`
Skill usage	Agents reference and apply their learned skills	`skill_usage_events` table
Cross-agent learning	Post-merge: agents use skills from rival team	`learning_milestones` type=cross_agent

Paperclip orchestration

Companies define budget and organizational structure
Agents are autonomous workers with roles, models and heartbeat intervals
Issues are tasks assigned to agents (checkout, work, mark done)
Heartbeats trigger agent execution at configured intervals
The `hermes_local` adapter spawns `hermes chat -q "prompt" -Q` as a subprocess
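
The subprocess call in the last item could look roughly like the sketch below. It is not Paperclip's adapter code: only the command line itself comes from this README, while the function names `build_hermes_command` and `run_heartbeat`, the timeout value and the error handling are assumptions.

```python
import subprocess


def build_hermes_command(prompt: str) -> list[str]:
    """Build the argv for one agent run, mirroring: hermes chat -q "prompt" -Q."""
    # An argv list (rather than a shell string) keeps quotes in the prompt intact.
    return ["hermes", "chat", "-q", prompt, "-Q"]


def run_heartbeat(prompt: str, timeout_s: float = 600.0) -> str:
    """Hypothetical helper: run one heartbeat and return the agent's stdout."""
    result = subprocess.run(
        build_hermes_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # assumed limit, not a documented value
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Building the argument list separately from running it makes the command easy to inspect or log without launching an agent.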

The merger

After the competition, a merge script combines both companies into “Gladiator United.” It deduplicates Hermes skill libraries, keeps the highest-versioned copy of each skill and copies everything into a shared HERMES_HOME. A post-merge task assigns a former Blitz engineer to load and apply a Craft-authored skill, producing a verifiable cross-agent learning event on the dashboard.
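
The deduplication rule (keep the highest-versioned copy of each skill) can be sketched as follows. This is an illustration under assumptions, not the merge script itself: the `Skill` record, integer versions, the `team` field and the tie-break (first copy seen wins) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    version: int  # assumed to be a comparable integer
    team: str     # originating company, e.g. "blitz" or "craft"


def merge_skill_libraries(*libraries: list[Skill]) -> dict[str, Skill]:
    """Combine skill libraries, keeping the highest version of each skill name."""
    merged: dict[str, Skill] = {}
    for library in libraries:
        for skill in library:
            current = merged.get(skill.name)
            # Strict '>' means the first copy seen wins on a version tie.
            if current is None or skill.version > current.version:
                merged[skill.name] = skill
    return merged
```

Keeping the originating team on each record is what would let a later step recognize that a former Blitz engineer applied a Craft-authored skill.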

Evidence and tracing

A Python watcher daemon polls Paperclip’s heartbeat API every 10 seconds and snapshots all Hermes learning artifacts into SQLite. Five tables track skill creation and versioning, memory growth per heartbeat, token usage and cost per agent, skill usage events and auto-detected learning milestones. The evidence database is the single source of truth for the dashboard control plane, where every panel (scoreboard, task boards, learning evidence, cost tracking) queries it directly. A typical run produces 19 skills, 49 learning milestones and 387 evidence records.
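
A minimal sketch of the snapshot loop, assuming a simplified schema: the table name `memory_snapshots`, WAL mode and the 10-second interval come from this README, but the column names, the `read_memories` callback (standing in for the Paperclip heartbeat poll) and the function names are hypothetical.

```python
import sqlite3
import time
from typing import Callable, Optional


def open_evidence_db(path: str) -> sqlite3.Connection:
    """Open the evidence database in WAL mode and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memory_snapshots (
               agent_id    TEXT    NOT NULL,
               captured_at REAL    NOT NULL,
               char_count  INTEGER NOT NULL
           )"""
    )
    return conn


def snapshot_memory(conn: sqlite3.Connection, agent_id: str, memory_text: str) -> int:
    """Store the current MEMORY.md size; return growth since the last snapshot."""
    row = conn.execute(
        "SELECT char_count FROM memory_snapshots "
        "WHERE agent_id = ? ORDER BY rowid DESC LIMIT 1",
        (agent_id,),
    ).fetchone()
    previous = row[0] if row else 0
    conn.execute(
        "INSERT INTO memory_snapshots VALUES (?, ?, ?)",
        (agent_id, time.time(), len(memory_text)),
    )
    conn.commit()
    return len(memory_text) - previous


def watch(
    conn: sqlite3.Connection,
    read_memories: Callable[[], dict[str, str]],  # hypothetical: agent_id -> MEMORY.md text
    interval_s: float = 10.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll at a fixed interval and snapshot every agent's memory."""
    polls = 0
    while max_polls is None or polls < max_polls:
        for agent_id, text in read_memories().items():
            snapshot_memory(conn, agent_id, text)
        polls += 1
        time.sleep(interval_s)
```

WAL mode suits this layout because the watcher can keep writing while dashboard panels read from the same file.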

Total API cost: $5-6 per competition.