# Claude Logs Structure and Processing Documentation

This document describes Claude log files (JSONL format), their structure, and how StackUnderflow processes them to generate analytics while handling complex edge cases.
## Table of Contents

- Log File Structure
- Entry Types and Fields
- Message Formats
- Tool Usage
- Special Cases
- Processing Pipeline
- Deduplication and Tool Counting
- Storage
- Legacy Format
- Known Issues and Solutions
## Log File Structure

### File Location

Modern Claude Code (January 2026 and later) writes one JSONL file per session, organised by project:

```text
~/.claude/projects/{project-path-slug}/{session-id}.jsonl
```

The slug is the absolute project path with path separators replaced by hyphens:

```text
/Users/example/.claude/projects/-Users-example-dev-myproject/08fce8c2-8453-42da-a52c-e03472c24e0f.jsonl
```

`ClaudeAdapter.enumerate()` walks `~/.claude/projects/`, yields a `SessionRef` for every `.jsonl` file it finds, and falls back to `~/.claude/history.jsonl` for project directories that predate the per-project format (see Legacy Format).
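The slug rule can be sketched in a couple of lines (illustrative only; the adapter's actual helper may handle additional edge cases):

```python
def slug_for(project_path: str) -> str:
    """Turn an absolute project path into the directory slug used under
    ~/.claude/projects/ by replacing path separators with hyphens."""
    return project_path.replace("/", "-")
```

For `/Users/example/dev/myproject` this yields `-Users-example-dev-myproject`, matching the example above.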
### Important: Multiple Sessions Per File

While JSONL files are named after a primary session ID, they can contain log entries from multiple sessions:
- Conversation Continuation: When a conversation is continued after compaction or restart
- Cross-Session References: When Claude references work from another session
- Session Merging: When multiple related sessions are logged together
Best Practice: The adapter reads the sessionId field from each JSONL line and stores it on the Record. Filename stems are used as a fallback only when sessionId is absent.
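That practice can be sketched as follows (a hypothetical helper, not the adapter's actual code):

```python
import json
from pathlib import Path

def session_id_for_line(line: str, file_path: Path) -> str:
    """Prefer the sessionId embedded in the JSONL line; fall back to the
    filename stem only when the field is absent or the line is malformed."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return file_path.stem
    return obj.get("sessionId") or file_path.stem
```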
## Entry Types and Fields

### Entry Types

- `summary` — Session or conversation summary
- `user` — User messages (includes tool results)
- `assistant` — Claude’s responses
Important: The root `type` field indicates the log entry type, NOT necessarily the message role.
### Common Fields

#### All Entries

- `type` (string): Type of the entry
- `timestamp` (ISO 8601): When the entry was created
- `uuid` (string): Unique identifier for this entry
#### User/Assistant Entries

- `sessionId` (string): Session identifier
- `parentUuid` (string|null): UUID of the parent message
- `isSidechain` (boolean): Whether this is a side conversation (e.g., Task tool)
- `userType` (string): Type of user (e.g., “external”)
- `cwd` (string): Current working directory
- `version` (string): Claude version
- `message` (object): The actual message content
#### Assistant-Specific Fields

- `requestId` (string): API request identifier
- `message.id` (string): Unique message ID (important for streaming)
#### User-Specific Fields

- `toolUseResult` (object|string): Detailed tool execution results
- `isCompactSummary` (boolean): True for conversation summaries
## Message Formats

### User Messages

```json
{
  "type": "user",
  "message": {
    "role": "user",
    "content": [
      { "type": "text", "text": "User's message text" },
      {
        "type": "tool_result",
        "tool_use_id": "tool_id",
        "content": "Tool execution result"
      }
    ]
  }
}
```

### Assistant Messages

```json
{
  "type": "assistant",
  "message": {
    "id": "msg_id",
    "type": "message",
    "role": "assistant",
    "model": "claude-opus-4-20250514",
    "content": [
      { "type": "text", "text": "Claude's response text" },
      {
        "type": "tool_use",
        "id": "toolu_xxxxx",
        "name": "ToolName",
        "input": { "parameter": "value" }
      }
    ],
    "stop_reason": "tool_use",
    "usage": {
      "input_tokens": 1234,
      "output_tokens": 567,
      "cache_creation_input_tokens": 890,
      "cache_read_input_tokens": 123
    }
  }
}
```

### Summary Entries

```json
{
  "type": "summary",
  "summary": "Brief description of the conversation",
  "leafUuid": "uuid-of-last-message"
}
```

`summary` and `compact_summary` entries are skipped by the adapter (`_role_from()` returns `None` for them) — they are not inserted into the `messages` table.
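Given the assistant message format above, pulling token counts out of the usage block is mechanical. A minimal sketch (the field handling in the real adapter may differ):

```python
def token_counts(message: dict) -> dict[str, int]:
    """Extract the four token counters from an assistant message's
    usage block, defaulting to 0 for absent fields."""
    usage = message.get("usage") or {}
    return {
        key: usage.get(key, 0)
        for key in (
            "input_tokens",
            "output_tokens",
            "cache_creation_input_tokens",
            "cache_read_input_tokens",
        )
    }
```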
## Tool Usage

### Common Tools

- File Operations: `Read`, `Write`, `Edit`, `MultiEdit`
- System: `Bash`, `Grep`, `Glob`, `LS`
- Task Management: `TodoWrite`, `TodoRead`
- Special: `Task` (launches sub-agents), `WebFetch`, `WebSearch`
- Jupyter: `NotebookRead`, `NotebookEdit`
### Tool Results

Tool results appear in subsequent user messages:

```json
{
  "type": "tool_result",
  "tool_use_id": "toolu_xxxxx",
  "content": "Result of tool execution",
  "is_error": true
}
```

### Tool Names on Records
Section titled “Tool Names on Records”ClaudeAdapter._tools_from() walks the message.content array and collects every block whose type is "tool_use". The resulting tuple of names is stored in the Record.tools field and serialised as tools_json in the messages table.
### Task Tool Limitations

Critical: Task tool operations are NOT individually logged:
- Only the Task invocation and final result appear in logs
- Internal tool operations by sub-agents are invisible
- Token usage by sub-agents is NOT tracked
- This causes apparent “missing” tool counts in analytics
## Special Cases

### Streaming Responses

Claude logs streaming responses as multiple entries with the same message ID:

```json
// Entry 1: Text response
{
  "type": "assistant",
  "message": {
    "id": "msg_01Y9yWFraRY5ptb3Bqbvpmqx",
    "content": [{ "type": "text", "text": "I'll implement..." }]
  }
}

// Entry 2: Tool use (same message ID)
{
  "type": "assistant",
  "message": {
    "id": "msg_01Y9yWFraRY5ptb3Bqbvpmqx",
    "content": [{ "type": "tool_use", "name": "Write", ... }]
  }
}
```

### Conversation Compaction
When conversations approach context limits, Claude Code creates comprehensive summaries:

```json
{
  "type": "user",
  "isCompactSummary": true,
  "message": {
    "role": "user",
    "content": [{
      "type": "text",
      "text": "This session is being continued from a previous conversation..."
    }]
  }
}
```

### Error Types
#### User Rejection (Before Execution)

```json
{
  "type": "tool_result",
  "content": "The user doesn't want to proceed with this tool use...",
  "is_error": true
}
```

#### User Interruption (During Execution)

Appears as both an error AND a user message:

```json
// As error
{
  "type": "tool_result",
  "content": "[Request interrupted by user for tool use]",
  "is_error": true
}

// As user message
{
  "type": "user",
  "message": {
    "content": [{ "text": "[Request interrupted by user for tool use]no, don't..." }]
  }
}
```

## Processing Pipeline
### Overview

```text
~/.claude/
    |
    v
ClaudeAdapter (stackunderflow/adapters/claude.py)
    enumerate() -> SessionRef[]
    read(ref)   -> Record[]
    |
    v
ingest/writer (stackunderflow/ingest/writer.py)
    ingest_file() -- one transaction per file,
    mtime + byte-offset tracking via ingest_log table
    |
    v
SQLite store (~/.stackunderflow/store.db)
    projects / sessions / messages / ingest_log tables
    |
    v
store/queries (stackunderflow/store/queries.py)
    get_project_stats() -- reconstructs RawEntry objects from raw_json,
    feeds the stats chain below
    |
    v
stats chain (stackunderflow/stats/)
    classifier -> enricher -> aggregator -> formatter
    |
    v
API routes (stackunderflow/routes/)
```

### Incremental Ingest
Section titled “Incremental Ingest”ingest/writer.run_ingest() compares each SessionRef’s (mtime, size) against the ingest_log table:
- Unchanged (same mtime and size): skip entirely — no read, no transaction.
- Appended (larger size, same or newer mtime): seek to `processed_offset` and read only the new bytes.
- Truncated / rotated (size shrank): delete the `ingest_log` row and reparse from byte 0.
This means large projects pay for a filesystem stat check only, not a full reparse, on every poll.
### Record Normalisation

`ClaudeAdapter._parse_line()` converts a raw JSONL object into a `Record` dataclass:

```python
# Role assignment
base_type = obj['type']  # 'user' | 'assistant' | 'summary' | ...
if base_type == 'user':
    role = 'user'
elif base_type == 'assistant':
    role = 'assistant'
elif base_type in ('summary', 'compact_summary'):
    return None  # skip — not a conversational record
```

Token counts come from `message.usage`; tool names from every `"tool_use"` block in `message.content`; the entire raw dict is preserved in `Record.raw` and written to `messages.raw_json`.
### Timezone Handling

All timestamps are stored in UTC but displayed in the user’s local timezone:

- Frontend detects the timezone offset: `new Date().getTimezoneOffset()`
- Backend converts UTC to local time for grouping
- Charts display dates in the user’s local timezone
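For the backend step, one way to turn a stored UTC timestamp plus the JS-style offset into a local grouping date (a sketch under those assumptions; the real grouping code is not shown in this document):

```python
from datetime import datetime, timedelta

def local_day(utc_iso: str, tz_offset_minutes: int) -> str:
    """Local calendar date for a UTC timestamp. tz_offset_minutes follows
    JS getTimezoneOffset(): minutes *behind* UTC (300 for UTC-5), so it is
    subtracted to move from UTC to local time."""
    ts = datetime.fromisoformat(utc_iso.replace("Z", "+00:00"))
    return (ts - timedelta(minutes=tz_offset_minutes)).date().isoformat()
```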
## Deduplication and Tool Counting

### The Problem

When Claude Code crashes and restarts with `--continue`:
- Duplicate messages appear in multiple files
- Same message shows inconsistent tool counts
- Incomplete assistant responses
- Missing tool execution logs
### Solution: stats/classifier Deduplication

The `stackunderflow/stats/classifier.py` module receives a list of `RawEntry` objects (reconstructed from `messages.raw_json`) and performs two-phase deduplication:

1. Phase 1 — ID-based merge: Merges entries sharing the same `message.id` (keeping the longer content variant). This handles streaming responses where Claude emits multiple entries for the same message.
2. Phase 2 — Exact duplicate drop: Drops exact duplicates by hashing timestamp + content + UUID. This handles entries duplicated across files after crash/continue scenarios.
The deduplication logic that was previously in pipeline/dedup.py now lives inside the stats chain at stackunderflow/stats/classifier.py. The on-disk records themselves are stored with duplicates intact — dedup is a query-time operation so the raw JSONL is always faithfully preserved.
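In outline, the two phases look roughly like this (a simplified sketch operating on plain dicts; the real classifier works on `RawEntry` objects and merges content variants rather than simply keeping one entry):

```python
import hashlib

def dedupe(entries: list[dict]) -> list[dict]:
    """Two-phase dedup: merge by message id, then drop exact duplicates."""
    # Phase 1: merge entries sharing message.id, keeping the longer content.
    by_id: dict[str, dict] = {}
    passthrough: list[dict] = []
    for e in entries:
        mid = (e.get("message") or {}).get("id")
        if mid is None:
            passthrough.append(e)
        elif mid not in by_id or len(str(e["message"].get("content", ""))) > len(
            str(by_id[mid]["message"].get("content", ""))
        ):
            by_id[mid] = e
    merged = passthrough + list(by_id.values())

    # Phase 2: drop exact duplicates by hashing timestamp + content + uuid.
    seen: set[str] = set()
    out = []
    for e in merged:
        key = hashlib.sha256(
            f'{e.get("timestamp")}|{e.get("message")}|{e.get("uuid")}'.encode()
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(e)
    return out
```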
### Edge Cases Handled

- Split Interactions: User message in file A, assistant response in file B
- Incomplete Tool Executions: Crash during tool execution
- Compact Summary Continuations: Sessions starting with summaries
- Missing Tool Logs: Tools used but not logged
- Streaming Response Merging: Multiple entries with same message ID
- Task Tool Sidechains: Sub-agent operations not logged
## Storage

### Database Location

```text
~/.stackunderflow/store.db
```

### Schema

The authoritative schema lives in `stackunderflow/store/migrations/v001_initial.sql`. Key tables:
| Table | Purpose |
|---|---|
| `projects` | One row per (provider, slug) pair |
| `sessions` | One row per session UUID, FK to `projects` |
| `messages` | One row per parsed line, FK to `sessions` |
| `ingest_log` | One row per source file; tracks mtime, size, `processed_offset` |
messages is the central table. Rows are keyed on (session_fk, seq) where seq is the byte offset of the line within its source file. Every row carries a raw_json column containing the full original JSONL object, so nothing is ever discarded during ingest — downstream consumers reconstruct whatever they need from the raw payload.
Selected messages columns:
- `seq` (INTEGER) — byte offset used as a stable, monotonically increasing sequence number
- `role` (TEXT) — `"user"` or `"assistant"`
- `model` (TEXT) — model identifier when present in the source line
- `input_tokens`, `output_tokens`, `cache_create_tokens`, `cache_read_tokens` (INTEGER)
- `tools_json` (TEXT) — JSON array of tool names called in this message
- `raw_json` (TEXT) — the complete original JSONL object
- `is_sidechain` (INTEGER 0/1) — set when `isSidechain` is true in the source
- `uuid`, `parent_uuid` (TEXT) — message threading fields from the JSONL
All typed query helpers that read from the store live in stackunderflow/store/queries.py. Application code imports helpers from there rather than writing raw SQL.
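As an example of the style of helper this module might contain (the `messages` columns come from the schema above; the function itself is hypothetical and the real helpers may differ):

```python
import json
import sqlite3
from collections import Counter

def tool_usage_counts(db_path: str, session_fk: int) -> Counter:
    """Count tool invocations for one session by expanding tools_json."""
    counts: Counter = Counter()
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT tools_json FROM messages "
            "WHERE session_fk = ? AND tools_json IS NOT NULL",
            (session_fk,),
        )
        for (tools_json,) in rows:
            counts.update(json.loads(tools_json))
    return counts
```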
## Legacy Format

Before January 2026, Claude Code did not write per-project JSONL files. Instead, all prompts were appended to a single centralised file:

```text
~/.claude/history.jsonl
```

Each line in that file has a different shape from modern per-project JSONL — notably it uses `"project"` (an absolute path string) and `"timestamp"` (milliseconds since epoch) rather than the nested `"message"` object modern sessions use.
ClaudeAdapter handles both formats transparently:
- `enumerate()` checks each project directory for `.jsonl` files. If none are found but a `.continuation_cache.json` exists, it treats the project as legacy and yields a single synthetic `SessionRef` whose `session_id` starts with `"legacy-"` and whose `file_path` points at `~/.claude/history.jsonl`.
- `read()` detects the `"legacy-"` prefix and calls `_read_history()` instead of `_read_jsonl()`.
- `_read_history()` filters lines by `_slug_for(obj["project"])`, converts the millisecond timestamp to ISO 8601, and yields minimal `Record` objects (role `"user"`, no token counts, no tools) — one per matching history line.
This means analytics for pre-January-2026 projects will show user prompts but no token counts or model information, since the legacy format does not record those fields.
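The millisecond-to-ISO-8601 conversion is the only transformation legacy lines need beyond filtering; a sketch (the real `_read_history()` may format the string differently):

```python
from datetime import datetime, timezone

def ms_to_iso(ms: int) -> str:
    """Convert a legacy millisecond epoch timestamp to ISO 8601 (UTC)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()
```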
## Known Issues and Solutions

### Issue 1: Duplicate Commands in Table

Cause: Same user message in multiple files after crash/continue
Solution: Two-phase deduplication in stats/classifier.py at query time; raw records are preserved intact in the store
### Issue 2: Wrong Tool Counts

Cause: Incomplete logging, Task tool limitations, streaming issues

Solution: Tool count reconciliation across all interaction versions during the classify → enrich chain
### Issue 3: Missing Model Names

Cause: Incomplete assistant messages from crashes
Solution: Preserve model info during interaction merging in the stats chain; MAX(CASE WHEN model IS NOT NULL …) aggregation in get_session_stats()
### Issue 4: Overview Refresh Intermittent

Status: Documented in TODO

Workaround: Refresh individual project dashboards first
## Success Metrics

- Accuracy: No duplicate messages, correct type classification
- Performance: Incremental ingest — only new bytes read per poll cycle
- Completeness: All tools counted accurately; raw JSONL always preserved
- Timezone Support: Correct local time display
- Reliability: Graceful handling of crashes and continuations