v2.0 — Self-improving AI
Open Aya uses a CAISI-inspired evaluation framework to measure capability, cost, latency, auditability, and workflow lift across baseline models, the Aya Pipeline, and reasoner routes. The goal is not to claim AGI; the goal is to prove whether an AI operating layer completes organizational work better than fragmented AI tools.
28 integrated apps. Voice-native, vision-aware, local-first with optional cloud sync. Multi-agent routing across planner, executor, memory, verifier, and critic strategies — every step auditable through a public benchmark harness, not a brochure.
Open Aya OS
Live system facts. Every number on this card is generated at request time from the runtime registry or the public eval database — there is no separate marketing source to drift.
Model layer
Agent layer
6 routed strategies: planner, executor, memory_retriever, verifier, router, self_critic
Strategy-Auction routing implemented as system-prompt routing rules
Memory layer
Supabase + browser IndexedDB (local-first)
Kinds: short-term turn cache · long-term Auto-Dream consolidation · GraphRAG knowledge edges
Tool layer
6 built-in tools across 28 apps
Web search · Code execution (Code Lab) · File store (Spatial Files) · Calendar / Notes / Word Processor · …
Local-first status
Yes — runs in-browser; data stays on device by default
Cloud sync status
Optional — Supabase auth + persistence when signed in
Apps in registry
28
Generated from lib/app-registry.ts
Routed agents
6
Strategy-auction policies, system-prompt routed
Eval score (avg)
—
Across 0 completed runs
Last eval run
no runs yet
UTC server time
Avg latency / task
—
Wall-clock, includes network hop
Audit mode
Public — every eval result writes a reasoning trace to /api/aya/inspect and aggregates to /api/aya/audit
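The two audit endpoints named above can be read programmatically. A minimal sketch, assuming only the endpoint paths from this card; the `?run=` query parameter and the response shape (`runs`, `passRate`) are hypothetical, not a documented contract.

```typescript
// Build the audit/inspect URLs named on the card. Pure and testable.
function auditUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/$/, "")}/api/aya/audit`;
}

// NOTE: the ?run= parameter is an assumption for illustration.
function inspectUrl(baseUrl: string, runId: string): string {
  return `${auditUrl(baseUrl).replace(/audit$/, "inspect")}?run=${encodeURIComponent(runId)}`;
}

// Fetch the public aggregate; the typed shape is assumed, not guaranteed.
async function fetchAuditSummary(baseUrl: string): Promise<{ runs: number; passRate: number }> {
  const res = await fetch(auditUrl(baseUrl));
  if (!res.ok) throw new Error(`audit fetch failed: ${res.status}`);
  return (await res.json()) as { runs: number; passRate: number };
}
```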
A/B comparison — pass rate by route
baseline
—
Claude Sonnet 4.6, no spine (control)
aya_pipeline
—
Claude Sonnet 4.6 + 7-stage cognitive spine
aya_reasoner
—
Claude Opus 4.6, extended thinking (10k)
Pass rate compares the Claude Sonnet 4.6 baseline, Aya's 7-stage cognitive pipeline on anthropic/claude-sonnet-4.6, and the anthropic/claude-opus-4.6 reasoner (extended thinking, 10k budget) across all completed runs. The baseline runs the same conversation tier without the cognitive spine, so the A/B delta isolates the wrapping, not a model upgrade.
Demo · Multi-agent routing
The hardest question in any wrapper-over-LLM product is: does the wrapping actually help? This demo answers it the only honest way: by running the same task through the raw baseline and through Aya's pipeline, and showing the receipts side by side.
For each card below, set the route selector at the top to match the card's caption, then click Run. The receipt at the bottom of each card shows agents_used, latency_ms, cost_estimate, and confidence.
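The receipt fields named above can be modeled as a small type and diffed across routes. A sketch only: the field names come from this page, but the type itself and `receiptDelta` are hypothetical, not Aya's actual source.

```typescript
// Hypothetical shape of the per-run receipt described above.
interface RunReceipt {
  agents_used: string[];   // e.g. ["executor"] on the baseline route
  latency_ms: number;      // wall-clock, includes network hop
  cost_estimate: number;   // estimated cost for the run
  confidence: number;      // 0..1 self-reported confidence
}

// Compare two receipts for the same seed: the pipeline should win on
// confidence while the baseline should win on latency.
function receiptDelta(baseline: RunReceipt, pipeline: RunReceipt) {
  return {
    extraLatencyMs: pipeline.latency_ms - baseline.latency_ms,
    confidenceLift: pipeline.confidence - baseline.confidence,
    extraAgents: pipeline.agents_used.filter(
      (a) => !baseline.agents_used.includes(a),
    ),
  };
}
```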
Step 1
Baseline route — Claude Sonnet 4.6 (control)
Expected behaviour: agents_used should be ['executor'] only. No planner, no verifier. Useful as the control.
seed: plan.recipe.1
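The expected behaviour above can be checked mechanically against the receipt. A minimal sketch, assuming the receipt exposes agents_used as a string array; the function name is hypothetical.

```typescript
// Baseline control: executor only — no planner, no verifier.
function isBaselineReceipt(agentsUsed: string[]): boolean {
  return agentsUsed.length === 1 && agentsUsed[0] === "executor";
}
```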
Step 2
Aya pipeline route — Claude Sonnet 4.6 + 7-stage cognitive spine
Expected behaviour: agents_used should include planner, executor, verifier (and possibly memory_retriever). Latency is higher; confidence and pass rate should be higher too.
seed: plan.recipe.1
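The pipeline expectation above translates to a required set plus one optional agent. A sketch under the same assumptions as the baseline check; the helper is hypothetical.

```typescript
// Pipeline route: planner, executor, verifier required;
// memory_retriever allowed but optional, per the expected behaviour above.
function isPipelineReceipt(agentsUsed: string[]): boolean {
  const required = ["planner", "executor", "verifier"];
  const allowed = new Set([...required, "memory_retriever"]);
  return (
    required.every((a) => agentsUsed.includes(a)) &&
    agentsUsed.every((a) => allowed.has(a))
  );
}
```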
Step 3
Aya reasoner route — Claude Opus 4.6, extended thinking (10k)
Expected behaviour: includes self_critic. Highest cost, typically highest quality. Use this when accuracy matters more than latency.
seed: plan.recipe.1
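Taken together, the three cards imply a route can be recovered from agents_used alone. This classification heuristic is an assumption drawn from the expected behaviours above, not a documented rule.

```typescript
// Classify a receipt by route: self_critic marks the reasoner route,
// planner marks the pipeline route, anything else is the baseline.
function classifyRoute(agentsUsed: string[]): "baseline" | "aya_pipeline" | "aya_reasoner" {
  if (agentsUsed.includes("self_critic")) return "aya_reasoner";
  if (agentsUsed.includes("planner")) return "aya_pipeline";
  return "baseline";
}
```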