
# Shrew Parity Matrix

Source baseline: research/shrew/SPEC.md (Apr 2026) and a source-tree walk of the upstream small-model coding agent. Updated: wave-5 ship. Legend: GREEN = shipped / at parity (or superset); YELLOW = partial; RED = deferred or out of scope.

Trademark hygiene. Throughout this document the upstream project is referred to as “the small-model coding agent” or “the upstream”. Filesystem path mentions like ~/.shrew/skills/ are kept because they are facts about directories shrew can read on disk, not brand claims. See security-and-trademarks.md.

The upstream ships a single coding-agent CLI tuned for small local models, with three high-leverage moves: a curated skill set, MoE serving tricks, and a benchmark harness. Shrew mirrors all three on top of weasel.

| Upstream surface | Shrew status | File | Notes |
| --- | --- | --- | --- |
| One-shot CLI (-p) | GREEN | chimera/shrew/cli.py | Inherits weasel print mode. |
| Interactive REPL | GREEN | chimera/shrew/repl.py | Inherits weasel REPL. |
| RPC mode (stdio) | GREEN | chimera/shrew/cli.py | Late-binds to weasel’s RPC server. |
| SDK | YELLOW | n/a | Embed via from chimera.weasel.sdk import Agent; no shrew-specific SDK module yet. |
| --list-models | GREEN | chimera/shrew/cli.py | Provider-driven catalogue. |
| Sessions list / show | GREEN | chimera/shrew/sessions.py | Reads ~/.chimera/eventlog/shrew-*/. |
| Curated skill set | GREEN | chimera/shrew/skills/ | 11 markdowns across knowledge / protocols / tools. |
| Small-model extensions | GREEN | chimera/shrew/extensions/ | moe_offload, scaffold_fit, tool_filter. |
| Local-first provider chain | GREEN | chimera/shrew/providers.py | llama.cpp → Ollama → cloud. |
| Benchmark harness | GREEN | chimera/shrew/benchmarks/ | Aider Polyglot + GAIA wired. |
| Terminal-bench adapter | RED | n/a | Reserved by parser; not yet wired. |

The upstream’s flag surface is small by design. Shrew mirrors it and adds the small-model-specific knobs (--vram-gb, --no-skills).

| Upstream flag | Shrew status | Shrew equivalent | Notes |
| --- | --- | --- | --- |
| -p / --print | GREEN | -p / --print | Identical; inherits weasel. |
| --json | GREEN | --json | Single JSON blob on stdout. |
| --mode <m> | GREEN | --mode interactive\|print\|rpc\|sdk | sdk is import-only. |
| --model | GREEN | --model | Same syntax; default qwen3.6-35b-a3b. |
| --list-models | GREEN | --list-models | Provider-driven. |
| --cwd | GREEN | --cwd | Same. |
| --max-steps | GREEN | --max-steps | Default 30 (smaller than weasel’s). |
| --allowed-tools | GREEN | --allowed-tools | Default Read,Write,Edit,Bash. |
| --vram-gb | GREEN | --vram-gb / $SHREW_VRAM_GB | Drives moe_offload. |
| --no-skills | GREEN | --no-skills | Skip bundled skill set. |
| --no-color | GREEN | --no-color | Plain output handler. |
| --verbose | GREEN | --verbose | Inherits weasel. |
| --resume <id> | GREEN | --resume <id> | Inherits weasel. |
| --api-key | YELLOW | env vars preferred | Inline flag deferred for security. |
| --login | RED | n/a | OAuth flow deferred; chimera auth login covers it. |
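The flag surface above can be sketched as argparse wiring. Names and defaults are taken straight from the table; the parser structure itself is illustrative — the real one lives in chimera/shrew/cli.py and may differ in detail.

```python
# Illustrative sketch of shrew's flag surface as argparse wiring.
# Flag names and defaults come from the parity table; everything
# else (prog name, dest names) is an assumption.
import argparse
import os

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="chimera shrew")
    p.add_argument("-p", "--print", dest="print_mode", action="store_true")
    p.add_argument("--json", action="store_true")
    p.add_argument("--mode", choices=["interactive", "print", "rpc", "sdk"],
                   default="interactive")
    p.add_argument("--model", default="qwen3.6-35b-a3b")
    p.add_argument("--list-models", action="store_true")
    p.add_argument("--cwd", default=".")
    p.add_argument("--max-steps", type=int, default=30)  # smaller than weasel's
    p.add_argument("--allowed-tools", default="Read,Write,Edit,Bash")
    p.add_argument("--vram-gb", type=float,
                   default=float(os.environ.get("SHREW_VRAM_GB", 0) or 0))
    p.add_argument("--no-skills", action="store_true")
    p.add_argument("--no-color", action="store_true")
    p.add_argument("--verbose", action="store_true")
    p.add_argument("--resume", metavar="<id>")
    return p

args = build_parser().parse_args(["-p", "--vram-gb", "12"])
print(args.model, args.max_steps, args.vram_gb)  # → qwen3.6-35b-a3b 30 12.0
```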

Shrew inverts weasel’s hosted-first ordering: a reachable local server wins over any cloud key. This is the single biggest divergence from weasel and a deliberate design choice — shrew exists to prove that small local models are good enough for real coding work.

| Upstream provider | Shrew status | Notes |
| --- | --- | --- |
| llama.cpp HTTP | GREEN | Default. Probed at $LLAMACPP_BASE_URL. |
| Ollama daemon | GREEN | Probed at $OLLAMA_BASE_URL; routed via OpenAI-compat shim. |
| OpenRouter (vendor/name) | GREEN | Routed via OpenAI-compat against openrouter.ai. |
| Anthropic | GREEN | Cloud fallback. |
| OpenAI | GREEN | Cloud fallback. |
| Google | GREEN | Inherits weasel. |
| Modal-hosted | YELLOW | Programmatic only (no auto-detection). |
| Custom | GREEN | register_provider("name", factory). |
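The local-first ordering can be sketched as a first-match chain. This is a minimal stand-in, assuming a reachability probe and the env vars named above; the real chain lives in chimera/shrew/providers.py, and the fallback URLs here are only the conventional llama.cpp / Ollama ports.

```python
# Sketch of shrew's local-first provider ordering: a reachable local
# server beats any cloud key. Probe mechanics and default URLs are
# assumptions; only the ordering is taken from the table above.
from typing import Callable

def pick_provider(is_reachable: Callable[[str], bool], env: dict) -> str:
    """Return the first provider whose precondition holds."""
    chain = [
        ("llama.cpp",  lambda: is_reachable(
            env.get("LLAMACPP_BASE_URL", "http://127.0.0.1:8080"))),
        ("ollama",     lambda: is_reachable(
            env.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434"))),
        ("openrouter", lambda: "OPENROUTER_API_KEY" in env),
        ("anthropic",  lambda: "ANTHROPIC_API_KEY" in env),
        ("openai",     lambda: "OPENAI_API_KEY" in env),
    ]
    for name, available in chain:
        if available():
            return name
    raise RuntimeError("no provider available")

# With no local server up, the chain falls through to cloud keys:
print(pick_provider(lambda url: False, {"ANTHROPIC_API_KEY": "…"}))  # → anthropic
# A reachable llama.cpp wins even when cloud keys are present:
print(pick_provider(lambda url: True, {"OPENAI_API_KEY": "…"}))      # → llama.cpp
```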

Catalog (built-in model ids):

| Model id | Backend | Context | MoE? |
| --- | --- | --- | --- |
| qwen3.6-35b-a3b | llama.cpp | 32k | yes (35B / 3B-active) |
| qwen3.5-9b | llama.cpp | 32k | no (9.7B dense) |
| qwen3.5:cloud | Ollama | 262k | no |
| qwen3.5 | Ollama | 32k | no |

In addition, the MoEModelProfile catalog in chimera/shrew/extensions/moe_offload.py adds deepseek-coder-v2-lite-16b (16B / 2.4B-active MoE) for context-window sizing.
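To make the MoE sizing concrete, here is a stand-in for compute_optimal_context_window(). The profile fields mirror the catalog above, but the byte-accounting constants (per-token KV cost, GB per billion active weights) are assumptions, not the shipped values from moe_offload.py.

```python
# Illustrative MoE-aware context sizing: keep active-expert weights on
# GPU, spend the rest of the VRAM budget on KV cache, clamp to the
# trained context limit. Constants below are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class MoEModelProfile:
    model_id: str
    total_params_b: float   # all experts
    active_params_b: float  # routed per token
    max_context: int        # trained context limit, in tokens

def compute_optimal_context_window(
    profile: MoEModelProfile,
    vram_gb: float,
    kv_bytes_per_token: int = 96 * 1024,  # assumed per-token KV cost
) -> int:
    # Assume ~0.6 GB per billion active params (q4-ish quantisation);
    # inactive experts are treated as offloaded to host RAM.
    weights_gb = profile.active_params_b * 0.6
    budget_bytes = max(vram_gb - weights_gb, 0.0) * (1024 ** 3)
    tokens = int(budget_bytes // kv_bytes_per_token)
    # Round down to a 1k multiple and clamp to the trained limit.
    return min(max(tokens // 1024, 1) * 1024, profile.max_context)

qwen = MoEModelProfile("qwen3.6-35b-a3b", 35.0, 3.0, 32 * 1024)
print(compute_optimal_context_window(qwen, vram_gb=8))  # → 32768 (clamped)
```

The point of the MoE-aware path is visible in the numbers: sizing against 3B active params instead of 35B total leaves most of an 8 GB card free for KV cache.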

The upstream’s skill catalogue is the single biggest small-model adjustment. Shrew ships a curated subset.

| Upstream skill category | Shrew status | Files |
| --- | --- | --- |
| knowledge/ | GREEN | 4 files: scaffold_model_fit, context_window_discipline, escalation_signals, python_idioms |
| protocols/ | GREEN | 4 files: edit_before_write, error_recovery, one_focused_question, test_first_python |
| tools/ | GREEN | 3 files: core_tools, grep_vs_ls, multi_file_edits |
| Per-language idiom skills (Rust, Go, TS) | YELLOW | Python-only today; pattern is in place to add others. |
| User overlay (~/.shrew/skills/) | GREEN | Honored in discover_shrew_skills(extra_search_paths=...). |
| Project overlay (.shrew/skills/) | GREEN | Honored when shrew is run from a project that ships them. |
| Frontmatter triggers field | GREEN | Parsed; not yet auto-fired (LLM is shown the index, not the trigger phrases). |
| Per-skill kill-switch flag | RED | --no-skills is all-or-nothing today. |
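The two overlay rows can be illustrated with a toy merge, assuming later search paths win. discover_shrew_skills(extra_search_paths=...) is the real entry point; this standalone sketch only shows the precedence idea.

```python
# Overlay precedence sketch: bundled skills < user overlay
# (~/.shrew/skills/) < project overlay (.shrew/skills/).
# Merge order is the assumption being illustrated here.
from pathlib import Path
import tempfile

def discover_skills(*search_paths: Path) -> dict:
    """Map skill name -> markdown file; later roots override earlier ones."""
    skills: dict = {}
    for root in search_paths:
        if not root.is_dir():
            continue
        for md in sorted(root.rglob("*.md")):
            skills[md.stem] = md  # same stem in a later root wins
    return skills

with tempfile.TemporaryDirectory() as tmp:
    bundled, user = Path(tmp, "bundled"), Path(tmp, "user")
    bundled.mkdir()
    user.mkdir()
    (bundled / "error_recovery.md").write_text("bundled version")
    (user / "error_recovery.md").write_text("user override")
    found = discover_skills(bundled, user)
    resolved = found["error_recovery"].read_text()
    print(resolved)  # → user override
```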

The upstream’s small-model adjustments map to three named extensions in shrew. Each is independent and pure-functional.

| Upstream concept | Shrew status | File | Notes |
| --- | --- | --- | --- |
| MoE-aware context sizing | GREEN | extensions/moe_offload.py | MoEModelProfile + compute_optimal_context_window(). |
| Scaffold-model-fit prompt wrapping | GREEN | extensions/scaffold_fit.py | wrap_for_small_model(); threshold 13B. |
| Tool-list filtering for tiny models | GREEN | extensions/tool_filter.py | filter_tools_for_model(); threshold 9B. |
| --no-scaffold flag | RED | n/a | Disable via running a frontier model; flag deferred. |
| Catalogue-driven model size lookup | GREEN | | model_size_billions() in tool_filter + MoE catalog. |
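A rough sketch of the two size-gated extensions and the id-based size lookup. The 13B and 9B thresholds come from the table; the parsing regex, the wrapper text, and the strict-less-than comparisons are assumptions, not the shipped code.

```python
# Size-gated extension sketch. Thresholds are from the parity table;
# everything else here is illustrative. The real code may also judge
# MoE models by active params via the MoE catalog.
import re
from typing import Optional

def model_size_billions(model_id: str) -> Optional[float]:
    """Parse a size like '35b' out of ids such as 'qwen3.6-35b-a3b'."""
    m = re.search(r"-(\d+(?:\.\d+)?)b\b", model_id)
    return float(m.group(1)) if m else None

def wrap_for_small_model(prompt: str, model_id: str) -> str:
    """Prepend extra scaffolding below an assumed 13B cutoff."""
    size = model_size_billions(model_id)
    if size is not None and size < 13:
        return "Keep edits small and verify each step.\n\n" + prompt
    return prompt

def filter_tools_for_model(tools: list, model_id: str) -> list:
    """Below an assumed 9B cutoff, keep only the core tool set."""
    size = model_size_billions(model_id)
    if size is not None and size < 9:
        core = {"Read", "Write", "Edit", "Bash"}
        return [t for t in tools if t in core]
    return list(tools)

print(model_size_billions("qwen3.6-35b-a3b"))  # → 35.0
```

Both helpers are pure functions of the prompt/tool list and the model id, which matches the doc's claim that each extension is independent and pure-functional.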

The upstream ships a Python harness for several benchmarks. Shrew ports the two most useful for small-model coding.

| Upstream benchmark | Shrew status | File / notes |
| --- | --- | --- |
| Aider Polyglot | GREEN | benchmarks/aider_polyglot.py |
| GAIA | GREEN | benchmarks/gaia.py |
| Terminal-bench | RED | Reserved by parser; not yet wired. |
| Harbor / SWE-bench Lite | RED | Out of scope for the small-model focus. |
| Setup-hint on missing dataset | GREEN | Both adapters; exit code 3. |
| Per-language filter (polyglot) | GREEN | --language python etc. |
| Per-level filter (GAIA) | GREEN | --level 1 etc. |
| Built-in scorer (GAIA-style normalisation) | GREEN | Re-implemented locally; stdlib-only. |
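GAIA-style normalisation of the kind the built-in scorer re-implements typically covers case, articles, punctuation, whitespace, and numeric equality. This stdlib-only sketch shows those rules; the exact rules shipped in benchmarks/gaia.py may differ in detail.

```python
# Sketch of GAIA-style answer normalisation: numeric answers compare
# numerically, string answers compare after stripping case, articles,
# punctuation, and extra whitespace. Rules here are the common ones,
# not necessarily shrew's exact set.
import re
import string

def normalize_answer(text: str) -> str:
    t = text.strip().lower()
    # Bare numbers compare numerically, so "1,000" matches "1000.0".
    try:
        return repr(float(t.replace(",", "")))
    except ValueError:
        pass
    t = t.translate(str.maketrans("", "", string.punctuation))
    t = re.sub(r"\b(a|an|the)\b", " ", t)
    return " ".join(t.split())

def score(predicted: str, expected: str) -> bool:
    return normalize_answer(predicted) == normalize_answer(expected)

print(score("The Eiffel Tower.", "eiffel tower"))  # → True
print(score("1,000", "1000.0"))                    # → True
```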

Shrew reuses Chimera’s event-sourced session store directly.

| Capability | Shrew status | Notes |
| --- | --- | --- |
| ~/.chimera/eventlog/shrew-<id>/ directory layout | GREEN | Same shape as weasel-, otter-, mink-, ferret-. |
| summary.json per run | GREEN | Inherits weasel. |
| event-*.json event stream | GREEN | Inherits weasel. |
| sessions list | GREEN | chimera shrew sessions list. |
| sessions show <id> | GREEN | chimera shrew sessions show <id>. |
| Session resume | GREEN | --resume <id> (inherits weasel). |
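Because the layout is plain directories and JSON, it is easy to walk directly. This sketch assumes only the directory shape and file names from the table; the fields inside summary.json are illustrative.

```python
# Walk an eventlog root the way a sessions-list command could:
# one <prefix><id>/ directory per run, each holding a summary.json
# and an event-*.json stream. summary.json fields are assumptions.
import json
import tempfile
from pathlib import Path

def list_sessions(eventlog_root: Path, prefix: str = "shrew-") -> list:
    sessions = []
    for run_dir in sorted(eventlog_root.glob(prefix + "*")):
        summary_file = run_dir / "summary.json"
        if summary_file.is_file():
            summary = json.loads(summary_file.read_text())
            summary["events"] = len(list(run_dir.glob("event-*.json")))
            sessions.append(summary)
    return sessions

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    run = root / "shrew-abc123"
    run.mkdir()
    (run / "summary.json").write_text(json.dumps({"id": "abc123"}))
    (run / "event-0001.json").write_text("{}")
    listing = list_sessions(root)
    print(listing)  # one session with one event
```

The same walk works for the sibling prefixes (weasel-, otter-, mink-, ferret-), which is what lets one CLI's tooling inspect another's sessions.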
Totals by section:

- Surfaces: 9 GREEN, 1 YELLOW, 1 RED of 11.
- CLI flags: 13 GREEN, 1 YELLOW, 1 RED of 15.
- Providers: 7 GREEN, 1 YELLOW, 0 RED of 8.
- Skills: 6 GREEN, 1 YELLOW, 1 RED of 8.
- Extensions: 4 GREEN, 0 YELLOW, 1 RED of 5.
- Benchmarks: 6 GREEN, 0 YELLOW, 2 RED of 8.
- Sessions: 6 GREEN of 6.

## Chimera-only capabilities (do not regress)


Shrew inherits Chimera primitives the upstream does not have:

- Cooperative CancellationToken (true mid-turn cancel).
- MessageQueues for safe mid-turn steering.
- Loop detection (exact + pattern cycle).
- EventSourcedSession crash recovery + gap detection.
- FileAwareCompaction (file tracking across compaction).
- RedactionMiddleware for ten secret patterns.
- CostTracker with cache + reasoning-token breakdown.
- 26-event EventBus with middleware.
- Multi-CLI sibling stack (mink, otter, ferret, weasel) shares the same eventlog format, so a shrew-* session can be inspected by the same chimera otter sessions show tooling.
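As an illustration of the redaction idea: the shipped RedactionMiddleware covers ten secret patterns; the two regexes below are examples only, not the real pattern set.

```python
# Toy redactor in the spirit of RedactionMiddleware. The real
# middleware covers ten patterns; these two are illustrative.
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                   # API-key-shaped tokens
    re.compile(r"(?i)aws_secret_access_key\s*=\s*\S+"),   # AWS-style assignments
]

def redact(text: str, replacement: str = "[REDACTED]") -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("export OPENAI_API_KEY=sk-abcdefghijklmnopqrstuvwx"))
# → export OPENAI_API_KEY=[REDACTED]
```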
Open items tracked from the tables above:

  1. --no-scaffold flag for scaffold_fit opt-out.
  2. --skills <selector> for per-skill enable / disable.
  3. terminal-bench adapter under benchmarks/.
  4. Per-language idiom skills beyond Python (Rust, Go, TypeScript).
  5. Shrew-specific SDK module (chimera/shrew/sdk.py) so embedders get the small-model defaults without re-deriving them.
  6. Auto-fire skill bodies based on the frontmatter triggers field, instead of relying on the model recalling skills by name.

Shrew is a delta on weasel; if a behaviour isn’t called out here, it’s because it’s identical to weasel. Read docs/weasel/parity-matrix.md for the underlying surface and assume shrew inherits unless this page says otherwise.

GREEN rows are expected to behave in lockstep with the upstream small-model coding agent at the black-box level. YELLOW rows degrade gracefully and emit a hint where the gap is user-visible. RED rows are not implemented, by design or by deferral; the table makes the reason explicit.