Two zero-human AI companies. Same product. Different strategies. One winner.
Gladiator pits Blitz (growth-first, 4 agents) against Craft (quality-first, 5 agents) to maximize GitHub stars on identical starter repos. Nine Claude Sonnet agents run autonomously via Paperclip orchestration and Hermes Agent workers. A live dashboard visualizes the battle in real time.
Built for the Nous Research Hermes Agent Hackathon (March 2026). Demo video.
What you’ll see
- Landing page: the product (llm-judge), the rules and the two rival companies
- Live dashboard: scoreboard, task boards, code comparison, audit trail, learning evidence, merge controls
- Competition: 9 agents complete 10 tasks in ~6 minutes, creating skills, growing memory and writing code
- Winner announcement: auto-detects completion, declares winner by projected GitHub stars
- Merge: companies unite, skills transfer across teams, proving Hermes learning is real
Architecture
| Component | Tech | Port |
|---|---|---|
| Paperclip | Node.js + PostgreSQL 16 | 3100 |
| Hermes Agent | Python CLI + Anthropic API | - |
| Dashboard | FastAPI + SSE + vanilla JS | 4000 |
| Evidence DB | SQLite (WAL mode) | - |
| Watcher | Python daemon | - |
Hermes features used
| Feature | How it’s used | Evidence |
|---|---|---|
| Skills | Agents create reusable SKILL.md files after complex tasks | skill_snapshots table, version diffs |
| Memory | Agents save strategies and learnings to MEMORY.md | memory_snapshots table, char growth |
| Sessions | Session IDs chain across heartbeats | heartbeat_metrics.session_id |
| Skill usage | Agents reference and apply their learned skills | skill_usage_events table |
| Cross-agent learning | Post-merge: agents use skills from rival team | learning_milestones type=cross_agent |
Paperclip orchestration
- Companies define budget and organizational structure
- Agents are autonomous workers with roles, models and heartbeat intervals
- Issues are tasks assigned to agents (checkout, work, mark done)
- Heartbeats trigger agent execution at configured intervals
- The hermes_local adapter spawns `hermes chat -q "prompt" -Q` as a subprocess
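As a rough illustration of the last step, an adapter could wrap the quoted command in a subprocess call along these lines. This is a minimal sketch, not the actual hermes_local code: the function names, the timeout, and the injectable `runner` parameter are assumptions made for the example, and only the `hermes chat -q "prompt" -Q` invocation comes from this README.

```python
import subprocess
from typing import Callable


def build_hermes_command(prompt: str) -> list[str]:
    """Build the argv for the invocation quoted above: hermes chat -q "prompt" -Q."""
    # Passing a list (not a shell string) avoids quoting problems in the prompt.
    return ["hermes", "chat", "-q", prompt, "-Q"]


def run_heartbeat(
    prompt: str,
    timeout_s: float = 300.0,  # hypothetical limit, not taken from the project
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run one agent turn as a subprocess and return its stdout."""
    result = runner(
        build_hermes_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(f"hermes exited with {result.returncode}: {result.stderr}")
    return result.stdout
```

Keeping `runner` injectable lets the call be exercised in tests without the Hermes CLI installed.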
The merger
After the competition, a merge script combines both companies into “Gladiator United.” It deduplicates Hermes skill libraries, keeps the highest-versioned copy of each skill and copies everything into a shared HERMES_HOME. A post-merge task assigns a former Blitz engineer to load and apply a Craft-authored skill, producing a verifiable cross-agent learning event on the dashboard.
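The deduplication rule described above (keep the highest-versioned copy of each skill) can be sketched as follows. The `Skill` record and its fields are hypothetical stand-ins for whatever the merge script reads from each team's skill library; copying files into the shared HERMES_HOME is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Skill:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    version: int
    author_team: str
    body: str


def merge_skill_libraries(*libraries: Iterable[Skill]) -> dict[str, Skill]:
    """Return one skill per name, keeping the copy with the highest version."""
    merged: dict[str, Skill] = {}
    for library in libraries:
        for skill in library:
            current = merged.get(skill.name)
            # On a version tie, the copy seen first is kept.
            if current is None or skill.version > current.version:
                merged[skill.name] = skill
    return merged


blitz = [Skill("write-tests", 2, "Blitz", "..."), Skill("launch-post", 1, "Blitz", "...")]
craft = [Skill("write-tests", 3, "Craft", "..."), Skill("api-review", 1, "Craft", "...")]
united = merge_skill_libraries(blitz, craft)
```

In this example `united["write-tests"]` is the Craft copy, which is the kind of skill a former Blitz engineer would then load in the post-merge task.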
Evidence and tracing
A Python watcher daemon polls Paperclip’s heartbeat API every 10 seconds and snapshots all Hermes learning artifacts into SQLite. Five tables track skill creation and versioning, memory growth per heartbeat, token usage and cost per agent, skill usage events and auto-detected learning milestones. The evidence database is the single source of truth for the dashboard control plane, where every panel (scoreboard, task boards, learning evidence, cost tracking) queries it directly. A typical run produces 19 skills, 49 learning milestones and 387 evidence records.
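To make the snapshot step concrete, here is a minimal sketch of how memory growth per heartbeat might be recorded in SQLite. The `memory_snapshots` table name comes from this README, but the column layout and function names are assumptions, and the 10-second polling loop against Paperclip's heartbeat API is omitted.

```python
import sqlite3

# Assumed schema; the real memory_snapshots columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memory_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent TEXT NOT NULL,
    heartbeat_id TEXT NOT NULL,
    char_count INTEGER NOT NULL,
    captured_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_evidence_db(path: str) -> sqlite3.Connection:
    """Open the evidence database and request WAL journaling."""
    conn = sqlite3.connect(path)
    # WAL lets dashboard reads proceed while the watcher writes.
    # For an in-memory database this pragma has no effect.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(SCHEMA)
    return conn


def record_memory_snapshot(
    conn: sqlite3.Connection, agent: str, heartbeat_id: str, memory_text: str
) -> int:
    """Store a MEMORY.md snapshot and return character growth since the last one."""
    row = conn.execute(
        "SELECT char_count FROM memory_snapshots "
        "WHERE agent = ? ORDER BY id DESC LIMIT 1",
        (agent,),
    ).fetchone()
    previous = row[0] if row else 0
    count = len(memory_text)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO memory_snapshots (agent, heartbeat_id, char_count) "
            "VALUES (?, ?, ?)",
            (agent, heartbeat_id, count),
        )
    return count - previous
```

A watcher loop would call `record_memory_snapshot` once per agent per poll; the returned delta is the sort of value a "char growth" panel could chart.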
Total API cost: $5-6 per competition.