chimera shrew Parity Matrix

Source baseline: research/shrew/SPEC.md (Apr 2026), upstream small-model coding agent source-tree walk.
Updated: wave-5 ship.
Legend: GREEN = shipped / at parity (or superset); YELLOW = partial; RED = deferred or out of scope.
Trademark hygiene. Throughout this document the upstream project is referred to as “the small-model coding agent” or “the upstream”. Filesystem path mentions like ~/.shrew/skills/ are kept because they are facts about directories shrew can read on disk, not brand claims. See security-and-trademarks.md.
Top-level surfaces

The upstream ships a single coding-agent CLI tuned for small local models, with three high-leverage moves: a curated skill set, MoE serving tricks, and a benchmark harness. Shrew mirrors all three on top of weasel.
| Upstream surface | Shrew status | File | Notes |
|---|---|---|---|
| One-shot CLI (-p) | GREEN | chimera/shrew/cli.py | Inherits weasel print mode. |
| Interactive REPL | GREEN | chimera/shrew/repl.py | Inherits weasel REPL. |
| RPC mode (stdio) | GREEN | chimera/shrew/cli.py | Late-binds to weasel’s RPC server. |
| SDK | YELLOW | n/a | Embed via from chimera.weasel.sdk import Agent (sketched after this table); no shrew-specific SDK module yet. |
| --list-models | GREEN | chimera/shrew/cli.py | Provider-driven catalogue. |
| Sessions list / show | GREEN | chimera/shrew/sessions.py | Reads ~/.chimera/eventlog/shrew-*/. |
| Curated skill set | GREEN | chimera/shrew/skills/ | 11 markdowns across knowledge / protocols / tools. |
| Small-model extensions | GREEN | chimera/shrew/extensions/ | moe_offload, scaffold_fit, tool_filter. |
| Local-first provider chain | GREEN | chimera/shrew/providers.py | llama.cpp → Ollama → cloud. |
| Benchmark harness | GREEN | chimera/shrew/benchmarks/ | Aider Polyglot + GAIA wired. |
| Terminal-bench adapter | RED | n/a | Reserved by parser; not yet wired. |
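
The SDK row is YELLOW because shrew has no dedicated SDK module yet; embedding goes through the weasel SDK named above. A minimal sketch: only the import path comes from the table, while the keyword arguments and the run() call are assumptions about the weasel SDK surface.

```python
# Hedged sketch: only the import path is documented; the constructor
# arguments and run() are assumptions, not a confirmed API.
from chimera.weasel.sdk import Agent

agent = Agent(
    model="qwen3.6-35b-a3b",                          # shrew's default model id
    allowed_tools=["Read", "Write", "Edit", "Bash"],  # shrew's default tool set
    max_steps=30,                                     # shrew's smaller step budget
)
print(agent.run("Rename the helper in chimera/shrew/cli.py and update callers"))
```
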
CLI flags

The upstream’s flag surface is small by design. Shrew mirrors it and adds the small-model-specific knobs (--vram-gb, --no-skills); how the VRAM budget is resolved is sketched after the flag table.
| Upstream flag | Shrew status | Shrew equivalent | Notes |
|---|---|---|---|
| -p / --print | GREEN | -p / --print | Identical; inherits weasel. |
| --json | GREEN | --json | Single JSON blob on stdout. |
| --mode <m> | GREEN | --mode interactive\|print\|rpc\|sdk | sdk is import-only. |
| --model | GREEN | --model | Same syntax; default qwen3.6-35b-a3b. |
| --list-models | GREEN | --list-models | Provider-driven. |
| --cwd | GREEN | --cwd | Same. |
| --max-steps | GREEN | --max-steps | Default 30 (smaller than weasel’s). |
| --allowed-tools | GREEN | --allowed-tools | Default Read,Write,Edit,Bash. |
| --vram-gb | GREEN | --vram-gb / $SHREW_VRAM_GB | Drives moe_offload. |
| --no-skills | GREEN | --no-skills | Skip bundled skill set. |
| --no-color | GREEN | --no-color | Plain output handler. |
| --verbose | GREEN | --verbose | Inherits weasel. |
| --resume <id> | GREEN | --resume <id> | Inherits weasel. |
| --api-key | YELLOW | env vars preferred | Inline flag deferred for security. |
| --login | RED | n/a | OAuth flow deferred; chimera auth login covers it. |
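
The --vram-gb flag and $SHREW_VRAM_GB feed the same budget into moe_offload. A tiny sketch of a plausible precedence (an explicit flag wins over the environment); how cli.py actually arbitrates the two is not specified here.

```python
import os

def resolve_vram_gb(flag_value: float | None) -> float | None:
    """Explicit --vram-gb wins; otherwise fall back to $SHREW_VRAM_GB, else None."""
    if flag_value is not None:
        return flag_value
    env = os.environ.get("SHREW_VRAM_GB")
    return float(env) if env else None
```
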
Provider chain

Shrew inverts weasel’s hosted-first ordering: a reachable local server wins over any cloud key. This is the single biggest divergence from weasel and a deliberate design choice — shrew exists to prove that small local models are good enough for real coding work. The probe order is sketched after the table.
| Upstream provider | Shrew status | Notes |
|---|---|---|
| llama.cpp HTTP | GREEN | Default. Probed at $LLAMACPP_BASE_URL. |
| Ollama daemon | GREEN | Probed at $OLLAMA_BASE_URL; routed via OpenAI-compat shim. |
| OpenRouter (vendor/name) | GREEN | Routed via OpenAI-compat against openrouter.ai. |
| Anthropic | GREEN | Cloud fallback. |
| OpenAI | GREEN | Cloud fallback. |
| | GREEN | Inherits weasel. |
| Modal-hosted | YELLOW | Programmatic only (no auto-detection). |
| Custom | GREEN | register_provider("name", factory). |
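
The local-first ordering above boils down to a probe-then-fall-back loop. The sketch below is illustrative only: the env-var names come from the table, but the default ports and the cloud-key checks are assumptions, and the real chimera/shrew/providers.py (of which only register_provider("name", factory) is named here) almost certainly structures this differently.

```python
import os
import urllib.error
import urllib.request

def _reachable(base_url: str) -> bool:
    """True if anything answers HTTP at base_url, even with an error status."""
    try:
        urllib.request.urlopen(base_url, timeout=1)
        return True
    except urllib.error.HTTPError:
        return True       # server is up, it just disliked the bare GET
    except OSError:
        return False      # connection refused, timeout, DNS failure, ...

def pick_provider() -> str:
    """Local-first: llama.cpp, then Ollama, then cloud keys."""
    llamacpp = os.environ.get("LLAMACPP_BASE_URL", "http://127.0.0.1:8080")
    ollama = os.environ.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434")
    if _reachable(llamacpp):
        return "llama.cpp"
    if _reachable(ollama):
        return "ollama"
    for key, name in [("OPENROUTER_API_KEY", "openrouter"),
                      ("ANTHROPIC_API_KEY", "anthropic"),
                      ("OPENAI_API_KEY", "openai")]:
        if os.environ.get(key):
            return name
    raise RuntimeError("no local server reachable and no cloud API key set")
```
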
Catalog (built-in model ids):
| Model id | Backend | Context | MoE? |
|---|---|---|---|
| qwen3.6-35b-a3b | llama.cpp | 32k | yes (35B / 3B-active) |
| qwen3.5-9b | llama.cpp | 32k | no (9.7B dense) |
| qwen3.5:cloud | Ollama | 262k | no |
| qwen3.5 | Ollama | 32k | no |
Plus the MoEModelProfile catalog in
chimera/shrew/extensions/moe_offload.py adds
deepseek-coder-v2-lite-16b (16B / A2.4B MoE) for context-window
sizing.
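
The arithmetic behind MoE-aware context sizing is simple: whatever VRAM is left after the (possibly offloaded) weights goes to KV cache. The sketch below illustrates the idea only; the field names, the 4-bit weight estimate, and the per-token KV figure are all assumptions, not the real MoEModelProfile or compute_optimal_context_window().

```python
from dataclasses import dataclass

@dataclass
class ModelProfileSketch:
    """Illustrative stand-in for the real MoEModelProfile catalog entries."""
    total_params_b: float    # total parameters, billions
    active_params_b: float   # parameters routed per token, billions
    kv_bytes_per_token: int  # KV-cache bytes needed per context token

def estimate_context_window(profile: ModelProfileSketch, vram_gb: float) -> int:
    """VRAM minus (roughly 4-bit-quantised) weights, spent entirely on KV cache."""
    weight_bytes = profile.total_params_b * 1e9 * 0.5
    budget = vram_gb * 1e9 - weight_bytes
    if budget <= 0:
        return 0  # weights alone overflow VRAM; expert offload to CPU is required
    return int(budget // profile.kv_bytes_per_token)

# e.g. a 35B-total / 3B-active MoE with a guessed 96 KiB of KV per token, on 24 GB:
print(estimate_context_window(ModelProfileSketch(35.0, 3.0, 98_304), vram_gb=24))
```
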
Skills

The upstream’s skill catalogue is the single biggest small-model adjustment. Shrew ships a curated subset; an overlay-discovery sketch follows the table.
| Upstream skill category | Shrew status | Files |
|---|---|---|
| knowledge/ | GREEN | 4 files: scaffold_model_fit, context_window_discipline, escalation_signals, python_idioms |
| protocols/ | GREEN | 4 files: edit_before_write, error_recovery, one_focused_question, test_first_python |
| tools/ | GREEN | 3 files: core_tools, grep_vs_ls, multi_file_edits |
| Per-language idiom skills (Rust, Go, TS) | YELLOW | Python-only today; pattern is in place to add others. |
| User overlay (~/.shrew/skills/) | GREEN | Honored in discover_shrew_skills(extra_search_paths=...). |
| Project overlay (.shrew/skills/) | GREEN | Honored when shrew is run from a project that ships them. |
| Frontmatter triggers field | GREEN | Parsed; not yet auto-fired (LLM is shown the index, not the trigger phrases). |
| Per-skill kill-switch flag | RED | --no-skills is all-or-nothing today. |
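
An overlay-discovery sketch, assuming discover_shrew_skills takes extra_search_paths as the table states and returns an iterable of discovered skills; the import path and the explicit overlay paths are illustrative (the CLI may well add the overlays automatically).

```python
from pathlib import Path

# Hypothetical import location: the table names the function and its
# extra_search_paths parameter, but not the module it lives in.
from chimera.shrew.skills import discover_shrew_skills

skills = discover_shrew_skills(
    extra_search_paths=[
        Path.home() / ".shrew" / "skills",  # user overlay
        Path.cwd() / ".shrew" / "skills",   # project overlay
    ]
)
for skill in skills:
    print(skill)  # whatever record type the function returns
```
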
Extensions (small-model adjustments)

The upstream’s small-model adjustments map to three named extensions in shrew. Each is independent and pure-functional; a composition sketch follows the table.
| Upstream concept | Shrew status | File | Notes |
|---|---|---|---|
| MoE-aware context sizing | GREEN | extensions/moe_offload.py | MoEModelProfile + compute_optimal_context_window(). |
| Scaffold-model-fit prompt wrapping | GREEN | extensions/scaffold_fit.py | wrap_for_small_model(); threshold 13B. |
| Tool-list filtering for tiny models | GREEN | extensions/tool_filter.py | filter_tools_for_model(); threshold 9B. |
| --no-scaffold flag | RED | n/a | Disable today by running a frontier model; a dedicated flag is deferred. |
| Catalogue-driven model size lookup | GREEN | extensions/tool_filter.py | model_size_billions() in tool_filter + MoE catalog. |
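
Because the extensions are pure functions, an embedder can compose them per turn. The function names below come from the parity table; their signatures, and the idea that model size gates both transforms, are assumptions (the table only gives the 13B and 9B thresholds).

```python
# Hedged composition sketch; the real call signatures may differ.
from chimera.shrew.extensions.scaffold_fit import wrap_for_small_model
from chimera.shrew.extensions.tool_filter import filter_tools_for_model, model_size_billions

def prepare_turn(model_id: str, system_prompt: str, tools: list[str]) -> tuple[str, list[str]]:
    """Apply the small-model adjustments only when the model is small enough."""
    size_b = model_size_billions(model_id)
    if size_b is not None and size_b < 13:   # scaffold-fit threshold from the table
        system_prompt = wrap_for_small_model(system_prompt)
    if size_b is not None and size_b < 9:    # tool-filter threshold from the table
        tools = filter_tools_for_model(tools, model_id)
    return system_prompt, tools
```
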
Benchmarks

The upstream ships a Python harness for several benchmarks. Shrew ports the two most useful for small-model coding; the scorer’s answer normalisation is sketched after the table.
| Upstream benchmark | Shrew status | File / notes |
|---|---|---|
| Aider Polyglot | GREEN | benchmarks/aider_polyglot.py |
| GAIA | GREEN | benchmarks/gaia.py |
| Terminal-bench | RED | Reserved by parser; not yet wired. |
| Harbor / SWE-bench Lite | RED | Out of scope for the small-model focus. |
| Setup-hint on missing dataset | GREEN | Both adapters; exit code 3. |
| Per-language filter (polyglot) | GREEN | --language python etc. |
| Per-level filter (GAIA) | GREEN | --level 1 etc. |
| Built-in scorer (GAIA-style normalisation) | GREEN | Re-implemented locally; stdlib-only. |
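
The built-in scorer row refers to the usual GAIA-style trick: normalise both strings before an exact-match comparison. A stdlib-only sketch of that idea; the real benchmarks/gaia.py may normalise differently (for example around numbers and list answers).

```python
import re
import string

def normalise(answer: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = answer.lower().strip()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(predicted: str, gold: str) -> bool:
    return normalise(predicted) == normalise(gold)

assert exact_match("The Eiffel Tower.", "eiffel tower")
```
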
Sessions / eventlog

Shrew reuses Chimera’s event-sourced session store directly; the layout can be inspected with a few lines of stdlib code, sketched after the table.
| Capability | Shrew status | Notes |
|---|---|---|
| ~/.chimera/eventlog/shrew-<id>/ directory layout | GREEN | Same shape as weasel-, otter-, mink-, ferret-. |
| summary.json per run | GREEN | Inherits weasel. |
| event-*.json event stream | GREEN | Inherits weasel. |
| sessions list | GREEN | chimera shrew sessions list. |
| sessions show <id> | GREEN | chimera shrew sessions show <id>. |
| Session resume | GREEN | --resume <id> (inherits weasel). |
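
Because the layout is the shared Chimera shape, a shrew session can be inspected with nothing but the stdlib. The sketch below relies only on the paths and filenames listed above.

```python
import json
from pathlib import Path

eventlog = Path.home() / ".chimera" / "eventlog"

# List shrew sessions with their event counts and summary keys (if present).
for session_dir in sorted(eventlog.glob("shrew-*")):
    summary_path = session_dir / "summary.json"
    summary = json.loads(summary_path.read_text()) if summary_path.exists() else {}
    n_events = len(list(session_dir.glob("event-*.json")))
    print(f"{session_dir.name}: {n_events} events, summary keys: {sorted(summary)}")
```
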
Counts

- Surfaces: 9 GREEN, 1 YELLOW, 1 RED of 11.
- CLI flags: 13 GREEN, 1 YELLOW, 1 RED of 15.
- Providers: 6 GREEN, 1 YELLOW, 0 RED of 7.
- Skills: 6 GREEN, 1 YELLOW, 1 RED of 8.
- Extensions: 4 GREEN, 0 YELLOW, 1 RED of 5.
- Benchmarks: 6 GREEN, 0 YELLOW, 2 RED of 8.
- Sessions: 6 GREEN of 6.
Chimera-only capabilities (do not regress)

Shrew inherits Chimera primitives the upstream does not have:

- Cooperative CancellationToken (true mid-turn cancel).
- MessageQueues for safe mid-turn steering.
- Loop detection (exact + pattern cycle).
- EventSourcedSession crash recovery + gap detection.
- FileAwareCompaction (file tracking across compaction).
- RedactionMiddleware for ten secret patterns.
- CostTracker with cache + reasoning-token breakdown.
- 26-event EventBus with middleware.
- Multi-CLI sibling stack (mink, otter, ferret, weasel) shares the same eventlog format, so a shrew-* session can be inspected by the same chimera otter sessions show tooling.
Follow-up issues to file

- --no-scaffold flag for scaffold_fit opt-out.
- --skills <selector> for per-skill enable / disable.
- terminal-bench adapter under benchmarks/.
- Per-language idiom skills beyond Python (Rust, Go, TypeScript).
- Shrew-specific SDK module (chimera/shrew/sdk.py) so embedders get the small-model defaults without re-deriving them.
- Auto-fire skill bodies based on the frontmatter triggers field, instead of relying on the model recalling skills by name.
How to use this matrix

Shrew is a delta on weasel; if a behaviour isn’t called out here, it’s because it’s identical to weasel. Read docs/weasel/parity-matrix.md for the underlying surface and assume shrew inherits unless this page says otherwise.
GREEN rows are expected to behave in lockstep with the upstream small-model coding agent at the black-box level. YELLOW rows degrade gracefully and emit a hint where the gap is user-visible. RED rows are not implemented, by design or by deferral; the table makes the reason explicit.
See also

- quickstart.md — five-minute tour.
- small-model-setup.md — llama.cpp + GGUF + MoE serving incantation.
- skills.md — bundled skill catalogue.
- extensions.md — moe_offload, scaffold_fit, tool_filter.
- benchmarks.md — Aider Polyglot + GAIA.
- security-and-trademarks.md — trademark hygiene + security posture.
- docs/weasel/parity-matrix.md — the underlying weasel surface shrew layers on.