v2.0 — Self-improving AI
Open Aya uses a CAISI-inspired evaluation framework to measure capability, cost, latency, auditability, and workflow lift across baseline models, the Aya Pipeline, and reasoner routes. The goal is not to claim AGI; the goal is to prove whether an AI operating layer completes organizational work better than fragmented AI tools.
28 integrated apps. Voice-native, vision-aware, local-first with optional cloud sync. Multi-agent routing across planner, executor, memory, verifier, and critic strategies — every step auditable through a public benchmark harness, not a brochure.
Open Aya OS
Live system facts. Every number on this card is generated at request time from the runtime registry or the public eval database — there is no separate marketing source to drift.
Model layer
Agent layer
6 routed strategies: planner, executor, memory_retriever, verifier, router, self_critic
Strategy-Auction routing implemented as system-prompt routing rules
Memory layer
Supabase + browser IndexedDB (local-first)
Kinds: short-term turn cache · long-term Auto-Dream consolidation · GraphRAG knowledge edges
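The three memory kinds above can be sketched as a discriminated union. This is a hypothetical shape for illustration only; the real schema lives in Supabase and browser IndexedDB and is not published here.

```typescript
// Hypothetical record shapes for the three memory kinds listed above;
// the real schema (Supabase / IndexedDB) is an assumption, not published.
type MemoryRecord =
  | { kind: "short_term"; turn: number; text: string }                  // turn cache
  | { kind: "long_term"; summary: string; consolidatedFrom: number[] }  // Auto-Dream output
  | { kind: "graph_edge"; from: string; relation: string; to: string }; // GraphRAG edge

// Short-term entries expire with the session; the other kinds persist.
function persists(r: MemoryRecord): boolean {
  return r.kind !== "short_term";
}
```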
Tool layer
6 built-in tools across 28 apps
Web search · Code execution (Code Lab) · File store (Spatial Files) · Calendar / Notes / Word Processor · …
Local-first status
Yes — runs in-browser; data stays on device by default
Cloud sync status
Optional — Supabase auth + persistence when signed in
Apps in registry
28
Generated from lib/app-registry.ts
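The "no separate marketing source to drift" claim can be sketched like this: stats are derived from the registry at request time rather than hard-coded. The entry shape and field names here are assumptions; the real source of truth is lib/app-registry.ts.

```typescript
// Hypothetical registry shape; the real source of truth is lib/app-registry.ts.
interface AppEntry { id: string; name: string; category: string }

const appRegistry: AppEntry[] = [
  { id: "code-lab", name: "Code Lab", category: "tools" },
  { id: "spatial-files", name: "Spatial Files", category: "files" },
  // the real registry holds 28 entries
];

// Derived at request time, so the card can never drift from the registry.
function liveFacts(registry: AppEntry[]) {
  return {
    appsInRegistry: registry.length,
    categories: new Set(registry.map(a => a.category)).size,
  };
}
```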
Routed agents
6
Strategy-auction policies, system-prompt routed
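A minimal sketch of the strategy-auction idea: each of the six strategies bids on the incoming turn and the highest bid wins. In Open Aya the auction is expressed as system-prompt routing rules inside the model; the keyword heuristics and bid values below are invented for illustration.

```typescript
// Hypothetical strategy auction. The real routing is done by system-prompt
// rules; these keyword bids only mimic the mechanism.
type Strategy =
  | "planner" | "executor" | "memory_retriever"
  | "verifier" | "router" | "self_critic";

const bidders: Record<Strategy, (turn: string) => number> = {
  planner:          t => /plan|steps|break down/i.test(t)        ? 0.9  : 0.1,
  executor:         t => /run|execute|do it/i.test(t)            ? 0.9  : 0.1,
  memory_retriever: t => /remember|recall|my favorite/i.test(t)  ? 0.95 : 0.0,
  verifier:         t => /check|verify|is this right/i.test(t)   ? 0.8  : 0.1,
  router:           () => 0.15, // fallback bid when nothing else fires
  self_critic:      t => /critique|review my/i.test(t)           ? 0.7  : 0.05,
};

// Highest bid wins the turn.
function auction(turn: string): Strategy {
  let winner: Strategy = "router";
  let best = -Infinity;
  for (const name of Object.keys(bidders) as Strategy[]) {
    const b = bidders[name](turn);
    if (b > best) { best = b; winner = name; }
  }
  return winner;
}
```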
Eval score (avg)
—
Across 0 completed runs
Last eval run
no runs yet
UTC server time
Avg latency / task
—
Wall-clock, includes network hop
Audit mode
Public — every eval result writes a reasoning trace to /api/aya/inspect and aggregates to /api/aya/audit
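The aggregation step from /api/aya/inspect traces to the /api/aya/audit rollup might look like the sketch below. The `EvalResult` field names are assumptions about the trace shape, not the published schema.

```typescript
// Assumed shape of one eval result as written to /api/aya/inspect.
interface EvalResult { route: string; passed: boolean; latencyMs: number }

// Per-route pass rate and mean latency, as /api/aya/audit might compute them.
function aggregate(results: EvalResult[]) {
  const byRoute = new Map<string, { runs: number; passes: number; latency: number }>();
  for (const r of results) {
    const row = byRoute.get(r.route) ?? { runs: 0, passes: 0, latency: 0 };
    row.runs++;
    row.latency += r.latencyMs;
    if (r.passed) row.passes++;
    byRoute.set(r.route, row);
  }
  return [...byRoute].map(([route, s]) => ({
    route,
    passRate: s.passes / s.runs,
    avgLatencyMs: s.latency / s.runs,
  }));
}
```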
A/B comparison — pass rate by route
baseline
—
Claude Sonnet 4.6, no spine (control)
aya_pipeline
—
Claude Sonnet 4.6 + 7-stage cognitive spine
aya_reasoner
—
Claude Opus 4.6, extended thinking (10k)
Routes compared: the Claude Sonnet 4.6 baseline, Aya's 7-stage cognitive pipeline on anthropic/claude-sonnet-4.6, and the anthropic/claude-opus-4.6 reasoner (extended thinking, 10k budget), across all completed runs. The baseline runs the same conversation tier without the cognitive spine, so the A/B delta isolates the wrapping, not a model upgrade.
Demo · Memory
The public eval endpoint keeps no state between requests, so memory continuity in this demo is shown via the reasoning trace: which stages handled the stash, which stages fired during recall, and whether memory_used flipped to true.
For the full session-persistent memory experience, sign in to the OS and use the Memory app — that uses Auto-Dream consolidation against your private Supabase row, RLS-scoped.
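The pass condition for the trace-based demo can be sketched as a pure check over the three turn receipts. The `Receipt` shape and stage names here are assumptions about what the trace returns.

```typescript
// Hypothetical receipt shape for one demo turn; the real trace comes back
// from the public eval endpoint and may name stages differently.
interface Receipt { stages: string[]; memory_used: boolean }

// The demo passes when the stash turn shows a memory write stage, the
// neutral turn fires no retriever, and the recall turn flips memory_used.
function checkDemo(stash: Receipt, neutral: Receipt, recall: Receipt): boolean {
  return (
    stash.stages.some(s => s.includes("memory")) &&
    !neutral.stages.includes("memory_retriever") &&
    recall.stages.includes("memory_retriever") &&
    recall.memory_used
  );
}
```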
Step 1 · Stash a fact
Expected behaviour: Aya should acknowledge the fact and the trace should show a memory write stage.
Please remember this for me: my favorite color is teal, and my dog's name is Pixel. Acknowledge that you've stored it.
Step 2
Expected behaviour: A direct answer. The memory_retriever stage should NOT fire (this turn has no recall need).
What is the capital of France? Answer in one word.
Step 3
Expected behaviour: Aya should retrieve 'teal' and 'Pixel' verbatim. The memory_retriever stage should fire and the receipt's memory_used flag should be true.
Without re-reading earlier messages, what is my favorite color and what is my dog's name?
Step 4
Expected behaviour: Curated, reproducible memory test from the seed corpus. Same shape as above, with a published expected answer for scoring.
seed: mem.recall.1
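Scoring a seeded recall case against its published expected answer could be as simple as a verbatim containment check. This is a sketch under an assumption the harness does not confirm: that a run passes when the answer contains every expected token, case-insensitively.

```typescript
// Assumed scoring rule for a seeded recall case: the answer must contain
// every expected token verbatim, case-insensitively.
function scoreRecall(answer: string, expected: string[]): boolean {
  const lower = answer.toLowerCase();
  return expected.every(tok => lower.includes(tok.toLowerCase()));
}
```

For the demo above, the expected tokens would be the stashed facts themselves, e.g. `["teal", "Pixel"]`.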