# EvidenceRecorder: Capture Tool Calls for Richer Scoring

> `EvidenceRecorder` is a context manager for manually capturing tool calls, tool outputs, and LLM calls so that Critiqor receives `fully_instrumented` evidence.

`EvidenceRecorder` is the SDK-level instrumentation primitive for agents that do not run on OpenClaw or a supported framework adapter. Wrapping an execution block inside a `monitor()` context upgrades the evidence quality from `response_only` to `fully_instrumented`, which raises `evaluation_confidence` and unlocks more accurate failure cause detection in the resulting `CritiqorResult`.

Use `EvidenceRecorder` when you are calling a custom agent, a bare LLM client, or any tool-using pipeline where automatic framework detection does not apply.

## Import

```python theme={null}
from critiqor import monitor, EvidenceRecorder
```

***

## `monitor()`

```python theme={null}
monitor(prompt="") -> EvidenceRecorder
```

Module-level factory function that creates an `EvidenceRecorder` context manager scoped to a single agent execution. The `prompt` parameter is optional at construction time and can be supplied later when calling `finish()`.

| Parameter | Type  | Description                                                                              |
| --------- | ----- | ---------------------------------------------------------------------------------------- |
| `prompt`  | `str` | The user prompt for this execution. Optional here; `finish()` can supply or override it. |

**Returns:** `EvidenceRecorder` — a context manager that begins capturing on `__enter__` and closes on `__exit__`.

***

## Usage

```python theme={null}
from critiqor import Critiqor, monitor

agent = Critiqor(your_agent)

with monitor("What is 2 + 2?") as recorder:
    recorder.record_tool_call("calculator", {"expression": "2 + 2"})
    result = your_agent.run("What is 2 + 2?")
    recorder.record_tool_output("calculator", "4")
    evidence = recorder.finish(result, "What is 2 + 2?")

critiqor_result = agent.evaluate(
    prompt="What is 2 + 2?",
    response=result,
    tool_calls=evidence.tool_calls,
    tool_outputs=evidence.tool_outputs,
    # Reuse the level reported by finish(), which is "fully_instrumented".
    evidence_level=evidence.evidence_level,
)
```

The `with` block automatically records `agent_start` and `agent_finish` trace events, captures any unhandled exception as an `error` event, and resets the recorder context variable when the block exits. You can also use the recorder without a `with` block in synchronous code: call its methods directly, then call `finish()` yourself when the run is done.

***

## Methods

### `record_tool_call()`

```python theme={null}
recorder.record_tool_call(tool, args=None, call_id=None)
```

Records a tool invocation. Appends a `ToolCall` to the recorder's internal list and emits a `tool_start` trace event.

| Parameter | Type           | Description                                                              |
| --------- | -------------- | ------------------------------------------------------------------------ |
| `tool`    | `str`          | Name of the tool being called.                                           |
| `args`    | `dict \| None` | Arguments passed to the tool. Defaults to an empty dict if not provided. |
| `call_id` | `str \| None`  | Optional identifier used to correlate this call with its output.         |

***

### `record_tool_output()`

```python theme={null}
recorder.record_tool_output(tool, output, call_id=None, error=None)
```

Records a tool result. Appends a `ToolOutput` and emits a `tool_end` trace event. If `error` is provided, it is also appended to the recorder's error list.

| Parameter | Type          | Description                                        |
| --------- | ------------- | -------------------------------------------------- |
| `tool`    | `str`         | Name of the tool that produced the output.         |
| `output`  | `Any`         | The tool's return value.                           |
| `call_id` | `str \| None` | Correlates with a prior `record_tool_call()` call. |
| `error`   | `str \| None` | Error message string if the tool call failed.      |
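A `call_id` matters most when calls to the same tool overlap or finish out of order, because tool name and position alone no longer identify which output belongs to which call. The sketch below shows the pairing idea with plain dicts; `start_call`, `end_call`, and the record shapes are hypothetical stand-ins, not the SDK's `ToolCall` and `ToolOutput` types.

```python
import uuid

# Hypothetical in-memory records standing in for recorded calls and outputs.
calls = []
outputs = []


def start_call(tool, args):
    """Record a call and return the id later used to match its output."""
    call_id = uuid.uuid4().hex
    calls.append({"tool": tool, "args": args, "call_id": call_id})
    return call_id


def end_call(tool, output, call_id):
    """Record an output tagged with the id of the call that produced it."""
    outputs.append({"tool": tool, "output": output, "call_id": call_id})


# Two calls to the same tool complete in reverse order.
first = start_call("search", {"q": "alpha"})
second = start_call("search", {"q": "beta"})
end_call("search", "beta results", second)
end_call("search", "alpha results", first)

# Pair each call with its output by id rather than by position.
by_id = {o["call_id"]: o["output"] for o in outputs}
paired = [(c["args"]["q"], by_id[c["call_id"]]) for c in calls]
```

With the real recorder, the same id would be passed as `call_id` to both `record_tool_call()` and `record_tool_output()`.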

***

### `record_llm_call()`

```python theme={null}
recorder.record_llm_call(model=None, token_usage=None)
```

Records an LLM invocation for token counting and cost analysis. Token usage data is merged into the recorder's `token_usage` dict and propagated to `RuntimeMetrics` when `finish()` is called.

| Parameter     | Type           | Description                                                                                    |
| ------------- | -------------- | ---------------------------------------------------------------------------------------------- |
| `model`       | `str \| None`  | Model identifier (e.g. `"gpt-4o"`, `"llama3.2"`).                                              |
| `token_usage` | `dict \| None` | Token usage dict, e.g. `{"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200}`. |
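One plausible reading of "merged" is that counts from successive LLM calls are summed per key. The sketch below assumes that behaviour; the page does not specify the merge rule, and `merge_token_usage` is a hypothetical helper rather than part of the SDK.

```python
from collections import Counter


def merge_token_usage(total, usage):
    """Return a new dict with the counts in `usage` added to `total`.

    Assumption: merging means per-key summation; `None` contributes nothing.
    """
    merged = Counter(total)
    merged.update(usage or {})
    return dict(merged)


total = {}
total = merge_token_usage(
    total, {"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200}
)
total = merge_token_usage(
    total, {"prompt_tokens": 40, "completion_tokens": 10, "total_tokens": 50}
)
total = merge_token_usage(total, None)  # a call that reported no usage
```

If the recorder overwrites values instead of summing them, record cumulative totals yourself before passing them in.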

***

### `record_event()`

```python theme={null}
recorder.record_event(name, **payload)
```

Records a generic named event with an arbitrary keyword-argument payload. The event is timestamped automatically and appended to the trace. Use this for framework-specific events that don't fit the tool call or LLM call shapes.

| Parameter   | Type  | Description                                                   |
| ----------- | ----- | ------------------------------------------------------------- |
| `name`      | `str` | Event name (e.g. `"state_transition"`, `"decision_made"`).    |
| `**payload` | `Any` | Arbitrary keyword arguments included in the trace event dict. |

***

### `wrap_tool()`

```python theme={null}
recorder.wrap_tool(name, func) -> callable
```

Returns an instrumented wrapper around a callable tool. The wrapper calls `record_tool_call()` and `record_tool_output()` automatically on every invocation. If the underlying function raises an exception, the error is recorded and the exception is re-raised.

| Parameter | Type       | Description                              |
| --------- | ---------- | ---------------------------------------- |
| `name`    | `str`      | The tool name used in recorded evidence. |
| `func`    | `callable` | The tool function to wrap.               |

**Returns:** A new callable with the same signature as `func`.

```python theme={null}
calculator = recorder.wrap_tool("calculator", raw_calculator_fn)
result = calculator("2 + 2")  # automatically recorded
```
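To make the wrapping behaviour concrete, here is a rough sketch of how such a wrapper could be built. `ToyRecorder` is a hypothetical stand-in with simplified record shapes; how the real `wrap_tool()` serialises arguments and errors is an assumption here.

```python
import functools


class ToyRecorder:
    """Hypothetical recorder; only the pieces needed to show wrapping."""

    def __init__(self):
        self.tool_calls = []
        self.tool_outputs = []
        self.errors = []

    def record_tool_call(self, tool, args=None, call_id=None):
        self.tool_calls.append({"tool": tool, "args": args or {}, "call_id": call_id})

    def record_tool_output(self, tool, output, call_id=None, error=None):
        self.tool_outputs.append(
            {"tool": tool, "output": output, "call_id": call_id, "error": error}
        )
        if error is not None:
            self.errors.append(error)

    def wrap_tool(self, name, func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            # Assumed argument encoding: positional and keyword args side by side.
            self.record_tool_call(name, {"args": args, "kwargs": kwargs})
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Record the failure, then let the caller see the exception.
                self.record_tool_output(name, None, error=str(exc))
                raise
            self.record_tool_output(name, result)
            return result

        return wrapper


def divide(a, b):
    return a / b


rec = ToyRecorder()
safe_divide = rec.wrap_tool("divide", divide)
quotient = safe_divide(6, 3)
try:
    safe_divide(1, 0)
except ZeroDivisionError:
    pass
```

After these two invocations the toy recorder holds two calls, two outputs, and one error string, and the `ZeroDivisionError` still reached the caller.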

***

### `finish()`

```python theme={null}
recorder.finish(response="", prompt=None) -> EvaluationEvidence
```

Closes the recorder and assembles the collected tool calls, outputs, trace events, and runtime metrics into an `EvaluationEvidence` object. Wall-clock latency is measured from the time the context manager was entered. The returned evidence always has `evidence_level="fully_instrumented"`.

| Parameter  | Type          | Description                                    |
| ---------- | ------------- | ---------------------------------------------- |
| `response` | `str`         | The agent's final response string.             |
| `prompt`   | `str \| None` | Overrides the prompt set at construction time. |

**Returns:** [`EvaluationEvidence`](#evaluationevidence-fields)

***

## `EvaluationEvidence` Fields

`EvaluationEvidence` is a frozen dataclass returned by `finish()` and also accessible as `CritiqorResult.evidence`. It holds the complete normalized evidence snapshot used during evaluation.

| Field            | Type               | Description                                                                                                  |
| ---------------- | ------------------ | ------------------------------------------------------------------------------------------------------------ |
| `prompt`         | `str`              | The input prompt for the evaluated run.                                                                      |
| `response`       | `str`              | The agent's response.                                                                                        |
| `tool_calls`     | `list[ToolCall]`   | All captured tool calls in order.                                                                            |
| `tool_outputs`   | `list[ToolOutput]` | All captured tool outputs in order.                                                                          |
| `trace`          | `list[dict]`       | Full event trace, including `agent_start`, `tool_start`, `tool_end`, LLM calls, and `agent_finish` events.   |
| `metrics`        | `RuntimeMetrics`   | Wall-clock latency, token usage, retry count, and error strings.                                             |
| `evidence_level` | `EvidenceLevel`    | `"response_only"`, `"trace_available"`, or `"fully_instrumented"`. Inferred automatically if not overridden. |
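Because the dataclass is frozen, its fields cannot be reassigned after construction. The sketch below shows what that means in practice using a small stand-in class; `EvidenceSketch` and its reduced field set are hypothetical, and only the standard `dataclasses` behaviour is being demonstrated.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class EvidenceSketch:
    """Hypothetical, cut-down stand-in for a frozen evidence record."""

    prompt: str
    response: str
    tool_calls: tuple = ()
    evidence_level: str = "response_only"


ev = EvidenceSketch(prompt="What is 2 + 2?", response="4")

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    ev.response = "5"
    mutated = True
except FrozenInstanceError:
    mutated = False

# To change a field, build a modified copy instead.
upgraded = replace(ev, evidence_level="fully_instrumented")
```

The original instance is left unchanged, so evidence snapshots can be shared between components without defensive copying.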
