# Critiqor Data Types: Public API Type Reference Guide

> Reference for all Critiqor public data types: ToolCall, ToolOutput, FailureCause, EvaluationRecord, PolicyCheckResult, TrendAnalysis, and more.

All Critiqor data types are frozen dataclasses — immutable once constructed. Every type exposes a `to_dict()` method that returns a JSON-serializable `dict`. Import any type directly from the top-level `critiqor` package:

```python
from critiqor import ToolCall, ToolOutput, FailureCause, EvaluationRecord, PolicyCheckResult, TrendAnalysis
```
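The sketch below shows both guarantees on a minimal stand-in class. It is not Critiqor's source. `FrozenToolCall` is a hypothetical name, and the sketch assumes `to_dict()` behaves like `dataclasses.asdict()`, which may not match Critiqor's implementation for every type.

```python
import dataclasses
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class FrozenToolCall:
    """Hypothetical stand-in mirroring the documented ToolCall fields."""

    tool: str
    args: dict = field(default_factory=dict)
    id: Optional[str] = None
    timestamp: Optional[float] = None

    def to_dict(self) -> dict[str, Any]:
        # asdict() copies fields (recursing into nested dataclasses) into a plain dict.
        return dataclasses.asdict(self)


call = FrozenToolCall(tool="search", args={"query": "critiqor"}, id="call-1")

try:
    call.tool = "other"  # frozen dataclasses reject attribute assignment
except dataclasses.FrozenInstanceError:
    print("ToolCall is immutable")

# Freezing is shallow: call.args is still a mutable dict, so treat it as read-only.
print(json.dumps(call.to_dict(), sort_keys=True))
```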

***

## Type Aliases

```python
from typing import Literal

TrustLevel               = Literal["High", "Moderate", "Low"]
EvidenceLevel            = Literal["response_only", "trace_available", "fully_instrumented"]
FailureSeverity          = Literal["low", "medium", "high"]
DeploymentRecommendation = Literal["safe_to_deploy", "review_recommended", "unsafe_for_production"]
AgentType                = Literal["coding", "research", "customer_support", "general"]
CertificationLevel       = Literal["none", "bronze", "silver", "gold", "platinum"]
TrendDirection           = Literal["improving", "stable", "declining", "insufficient_data"]
```
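Each alias is a `typing.Literal`, so its allowed values can be read back at runtime with `typing.get_args()`. The sketch below re-declares two aliases so that it runs on its own. `validate_literal` is a hypothetical helper and is not part of Critiqor. It shows how a caller might check a raw string, such as one loaded from JSON, before relying on it.

```python
from typing import Literal, get_args

# Re-declared locally so the sketch is self-contained; mirrors the aliases above.
TrustLevel = Literal["High", "Moderate", "Low"]
TrendDirection = Literal["improving", "stable", "declining", "insufficient_data"]


def validate_literal(value: str, alias: object) -> str:
    """Return value unchanged if alias allows it, else raise ValueError."""
    allowed = get_args(alias)  # tuple of permitted string values
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value


print(validate_literal("stable", TrendDirection))
try:
    validate_literal("medium", TrustLevel)  # "medium" is a FailureSeverity value, not a TrustLevel
except ValueError as exc:
    print(exc)
```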

***

## `ToolCall`

Represents a single observed tool invocation. Produced by `EvidenceRecorder.record_tool_call()` and collected in `EvaluationEvidence.tool_calls`.

| Field       | Type            | Description                                             |
| ----------- | --------------- | ------------------------------------------------------- |
| `tool`      | `str`           | Tool name.                                              |
| `args`      | `dict`          | Arguments passed to the tool.                           |
| `id`        | `str \| None`   | Optional call ID used to correlate with a `ToolOutput`. |
| `timestamp` | `float \| None` | Unix timestamp of the call.                             |

***

## `ToolOutput`

Represents the result of a single tool invocation. Produced by `EvidenceRecorder.record_tool_output()` and collected in `EvaluationEvidence.tool_outputs`.

| Field       | Type            | Description                                               |
| ----------- | --------------- | --------------------------------------------------------- |
| `tool`      | `str`           | Tool name.                                                |
| `output`    | `Any`           | The tool's result.                                        |
| `call_id`   | `str \| None`   | Correlates with the `id` of a prior `ToolCall`.           |
| `error`     | `str \| None`   | Error message if the tool call failed; `None` on success. |
| `timestamp` | `float \| None` | Unix timestamp of the output.                             |

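A `ToolOutput` is linked to its `ToolCall` when its `call_id` matches the call's `id`. The sketch below pairs them with a dictionary lookup. It uses hypothetical stand-in classes (`Call`, `Output`) that carry only the fields it needs. It also assumes call IDs are unique within a run; if two outputs share a `call_id`, the later one overwrites the earlier.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Call:  # hypothetical stand-in for ToolCall
    tool: str
    args: dict
    id: Optional[str] = None


@dataclass(frozen=True)
class Output:  # hypothetical stand-in for ToolOutput
    tool: str
    output: Any
    call_id: Optional[str] = None
    error: Optional[str] = None


def pair_calls(calls, outputs):
    """Map each call id to (call, output); output is None if never observed."""
    by_call_id = {o.call_id: o for o in outputs if o.call_id is not None}
    return {
        c.id: (c, by_call_id.get(c.id))
        for c in calls
        if c.id is not None  # calls without an id cannot be correlated
    }


calls = [Call("search", {"q": "a"}, id="1"), Call("fetch", {"url": "x"}, id="2")]
outputs = [
    Output("search", ["hit"], call_id="1"),
    Output("fetch", None, call_id="2", error="timeout"),
]

for call_id, (call, out) in pair_calls(calls, outputs).items():
    status = "missing" if out is None else (out.error or "ok")
    print(call_id, call.tool, status)
```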
***

## `RuntimeMetrics`

Aggregated runtime statistics for a single agent execution. Populated automatically by `EvidenceRecorder.finish()` or supplied directly to `Critiqor.evaluate()`.

| Field         | Type            | Description                                                                                         |
| ------------- | --------------- | --------------------------------------------------------------------------------------------------- |
| `latency`     | `float \| None` | Wall-clock duration of the execution in seconds.                                                    |
| `token_usage` | `dict`          | Token usage breakdown, e.g. `{"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200}`. |
| `retries`     | `int`           | Number of retry events observed during the run.                                                     |
| `errors`      | `list[str]`     | List of error message strings captured during the run.                                              |

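The `token_usage` keys above are only an example, so code that reads them should tolerate missing keys. In this hypothetical sketch, `Metrics` is a stand-in and not Critiqor's class. The sketch uses `total_tokens` when it is present and otherwise sums prompt and completion tokens. It then derives throughput from `latency`, which is in seconds.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Metrics:  # hypothetical stand-in mirroring RuntimeMetrics
    latency: Optional[float] = None
    token_usage: dict = field(default_factory=dict)
    retries: int = 0
    errors: list = field(default_factory=list)


def total_tokens(metrics: Metrics) -> int:
    """Prefer an explicit total_tokens; otherwise sum prompt + completion."""
    usage = metrics.token_usage
    if "total_tokens" in usage:
        return int(usage["total_tokens"])
    return int(usage.get("prompt_tokens", 0)) + int(usage.get("completion_tokens", 0))


def tokens_per_second(metrics: Metrics) -> Optional[float]:
    # latency may be None (not measured) or 0; guard both before dividing.
    if not metrics.latency:
        return None
    return total_tokens(metrics) / metrics.latency


m = Metrics(latency=2.5, token_usage={"prompt_tokens": 120, "completion_tokens": 80}, retries=1)
print(total_tokens(m), tokens_per_second(m), len(m.errors))
```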
***

## `FailureCause`

A structured explanation for a trust-score penalty. Failure causes are detected deterministically by `detect_failure_causes()` and returned in `CritiqorResult.failure_causes`.

| Field            | Type                | Description                                                                                                                                                                     |
| ---------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `type`           | `str`               | Failure type identifier, e.g. `"infinite_tool_loop"`, `"ignored_tool_output"`, `"unsupported_claims"`, `"redundant_tool_calls"`, `"runtime_failures"`, `"confidence_mismatch"`. |
| `severity`       | `FailureSeverity`   | `"low"`, `"medium"`, or `"high"`.                                                                                                                                               |
| `impact`         | `int`               | Trust-score penalty applied by this cause (negative integer).                                                                                                                   |
| `description`    | `str`               | Human-readable description of what was observed.                                                                                                                                |
| `root_cause`     | `RootCause \| None` | Optional deeper root cause enrichment.                                                                                                                                          |
| `recommendation` | `str`               | Suggested remediation. Empty string if none is available.                                                                                                                       |

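`severity` is categorical and `impact` is a negative integer. That suggests a simple triage order: highest severity first, then largest penalty (most negative `impact`). The sketch below uses a hypothetical `Cause` stand-in with a subset of the fields, and the impact values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Literal

FailureSeverity = Literal["low", "medium", "high"]
# Numeric rank so severities can be compared; higher means more severe.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Cause:  # hypothetical stand-in with a subset of FailureCause fields
    type: str
    severity: FailureSeverity
    impact: int  # negative; more negative means a larger penalty
    recommendation: str = ""


def triage(causes: list[Cause]) -> list[Cause]:
    """Sort by severity (high first), then by penalty (most negative first)."""
    return sorted(causes, key=lambda c: (-SEVERITY_RANK[c.severity], c.impact))


causes = [
    Cause("redundant_tool_calls", "low", -3),
    Cause("unsupported_claims", "high", -15, "Ground claims in tool outputs."),
    Cause("ignored_tool_output", "high", -20),
]
for cause in triage(causes):
    # An empty recommendation string means none is available.
    hint = cause.recommendation or "(no recommendation)"
    print(cause.severity, cause.impact, cause.type, hint)
```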
***

## `RootCause`

Optional enrichment nested inside a `FailureCause`. Provides a deeper causal explanation and a concrete fix recommendation.

| Field             | Type  | Description                                           |
| ----------------- | ----- | ----------------------------------------------------- |
| `description`     | `str` | Explanation of the underlying cause.                  |
| `impact`          | `str` | Human-readable description of the downstream impact.  |
| `trust_penalty`   | `int` | Trust-score deduction contributed by this root cause. |
| `recommended_fix` | `str` | Concrete remediation suggestion.                      |

***

## `EvaluationRecord`

A persisted representation of one Critiqor evaluation. Returned by `save_evaluation()` and loaded back by `load_evaluations()`. Also produced by `CritiqorResult.to_record()`.

| Field                       | Type                       | Description                                               |
| --------------------------- | -------------------------- | --------------------------------------------------------- |
| `run_id`                    | `str`                      | Unique run identifier (UUID).                             |
| `agent_id`                  | `str`                      | Agent identifier.                                         |
| `timestamp`                 | `str`                      | ISO 8601 UTC timestamp of the evaluation.                 |
| `scores`                    | `dict`                     | Per-dimension reliability scores keyed by dimension name. |
| `failure_causes`            | `list[FailureCause]`       | All failure causes detected for this run.                 |
| `trust_score`               | `int`                      | Overall trust score (0–100).                              |
| `evidence_level`            | `EvidenceLevel`            | Evidence quality used for this evaluation.                |
| `evaluation_confidence`     | `int`                      | Critiqor's self-confidence in the evaluation (0–100).     |
| `deployment_recommendation` | `DeploymentRecommendation` | The deployment gate result for this run.                  |

### `EvaluationRecord.to_dict()`

Returns a JSON-serializable dict. Failure causes are serialized via their own `to_dict()` methods.

### `EvaluationRecord.from_dict()`

```python
EvaluationRecord.from_dict(payload: dict) -> EvaluationRecord
```

Reconstructs an `EvaluationRecord` from a previously serialized dict. Unknown or invalid field values are coerced to safe defaults.

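The round trip through `to_dict()`, JSON, and `from_dict()` can be sketched with a stand-in class. `Record` is a hypothetical stand-in that covers a subset of the fields. The documentation says only that invalid values become "safe defaults". The specific choices in this sketch are therefore assumptions: `trust_score` is clamped to 0–100, unknown evidence levels fall back to `"response_only"`, and unknown keys are ignored.

```python
import json
from dataclasses import asdict, dataclass
from typing import Literal, get_args

EvidenceLevel = Literal["response_only", "trace_available", "fully_instrumented"]


@dataclass(frozen=True)
class Record:  # hypothetical stand-in with a subset of EvaluationRecord fields
    run_id: str
    agent_id: str
    trust_score: int
    evidence_level: EvidenceLevel

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, payload: dict) -> "Record":
        # The defaults chosen here are assumptions, not Critiqor's documented ones.
        try:
            score = int(payload.get("trust_score", 0))
        except (TypeError, ValueError):
            score = 0
        level = payload.get("evidence_level")
        if level not in get_args(EvidenceLevel):
            level = "response_only"
        return cls(
            run_id=str(payload.get("run_id", "")),
            agent_id=str(payload.get("agent_id", "")),
            trust_score=max(0, min(100, score)),  # clamp to the documented 0-100 range
            evidence_level=level,
        )


original = Record("run-1", "agent-a", 87, "trace_available")
restored = Record.from_dict(json.loads(json.dumps(original.to_dict())))
print(restored == original)

# Unknown keys such as "extra" are simply ignored.
messy = Record.from_dict({"run_id": "run-2", "trust_score": "250", "evidence_level": "???", "extra": 1})
print(messy)
```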
***

## `PolicyCheckResult`

Returned by `check_policy()`. Represents a CI/CD deployment gate decision for a given agent run.

| Field                       | Type                       | Description                                                                                        |
| --------------------------- | -------------------------- | -------------------------------------------------------------------------------------------------- |
| `passed`                    | `bool`                     | `True` if the run met all configured policy thresholds.                                            |
| `deployment_recommendation` | `DeploymentRecommendation` | The deployment decision: `"safe_to_deploy"`, `"review_recommended"`, or `"unsafe_for_production"`. |
| `messages`                  | `list[str]`                | Human-readable messages explaining the gate result — which thresholds passed or failed.            |

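In a CI/CD job, the gate result is usually turned into a process exit code. The sketch below uses a hypothetical `GateResult` stand-in and a hypothetical `exit_code_for` helper. The message text in the demo is made up, because the exact wording of `messages` is not documented here.

```python
from dataclasses import dataclass, field
from typing import Literal

DeploymentRecommendation = Literal["safe_to_deploy", "review_recommended", "unsafe_for_production"]


@dataclass(frozen=True)
class GateResult:  # hypothetical stand-in mirroring PolicyCheckResult
    passed: bool
    deployment_recommendation: DeploymentRecommendation
    messages: list = field(default_factory=list)


def exit_code_for(result: GateResult, block_on_review: bool = False) -> int:
    """Translate a gate result into an exit code: 0 continues, 1 fails the job."""
    for message in result.messages:
        print(message)
    if not result.passed or result.deployment_recommendation == "unsafe_for_production":
        return 1
    if block_on_review and result.deployment_recommendation == "review_recommended":
        return 1
    return 0


demo = GateResult(
    passed=False,
    deployment_recommendation="unsafe_for_production",
    messages=["example message: trust score below configured minimum"],
)
print("exit code:", exit_code_for(demo))
```

In a pipeline script, pass the returned value to `sys.exit()` so that a non-zero code fails the job. Blocking on `"review_recommended"` is opt-in because whether a review should halt deployment is a team policy decision.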
***

## `TrendAnalysis`

Returned by `analyze_trends()`. Summarizes the direction and magnitude of reliability change across multiple historical runs for a single agent.

| Field                     | Type             | Description                                                                                  |
| ------------------------- | ---------------- | -------------------------------------------------------------------------------------------- |
| `trust_trend`             | `TrendDirection` | Overall trend direction: `"improving"`, `"stable"`, `"declining"`, or `"insufficient_data"`. |
| `trust_change`            | `int`            | Average change in trust score per run (positive = improving).                                |
| `hallucination_change`    | `int`            | Average change in the hallucination score per run.                                           |
| `tool_reliability_change` | `int`            | Average change in the tool reliability score per run.                                        |
| `reasoning_change`        | `int`            | Average change in the reasoning score per run.                                               |
| `summary`                 | `str`            | Human-readable narrative of the trend.                                                       |

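One reading of "average change per run" is the mean of run-to-run differences. That mean telescopes to (last - first) / (runs - 1). The sketch below computes it and maps the result to a `TrendDirection`. The interpretation, the rounding to `int`, and the `min_runs` and `stable_band` thresholds are all assumptions, not Critiqor's documented algorithm.

```python
from typing import Literal

TrendDirection = Literal["improving", "stable", "declining", "insufficient_data"]


def average_change(scores: list[int]) -> int:
    """Mean of run-to-run differences, rounded to an int.

    The consecutive differences telescope, so this equals
    (last - first) / (number of runs - 1).
    """
    if len(scores) < 2:
        return 0
    diffs = [b - a for a, b in zip(scores, scores[1:])]
    return round(sum(diffs) / len(diffs))


def classify(scores: list[int], min_runs: int = 3, stable_band: int = 1) -> TrendDirection:
    # min_runs and stable_band are hypothetical tuning knobs, not Critiqor settings.
    if len(scores) < min_runs:
        return "insufficient_data"
    change = average_change(scores)
    if change > stable_band:
        return "improving"
    if change < -stable_band:
        return "declining"
    return "stable"


history = [62, 68, 71, 80]  # trust scores, oldest first
print(average_change(history), classify(history))
```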
***

## `ReliabilityCertification`

Returned by `certify_run()`. Encodes a standardized certification level for a run or benchmark suite result.

| Field                 | Type                 | Description                                                                      |
| --------------------- | -------------------- | -------------------------------------------------------------------------------- |
| `certification_level` | `CertificationLevel` | `"none"`, `"bronze"`, `"silver"`, `"gold"`, or `"platinum"`.                     |
| `trust_score`         | `int`                | Trust score used to determine the certification level.                           |
| `percentile`          | `int`                | Percentile rank among historical runs.                                           |
| `markdown_badge`      | `str`                | Ready-to-embed Markdown badge string for README files.                           |
| `criteria`            | `dict`               | The threshold criteria that were evaluated to arrive at the certification level. |

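The `CertificationLevel` values are listed from lowest to highest, so a minimum-level check can compare positions in that order. The sketch assumes the declaration order reflects rank. `meets_minimum` is a hypothetical helper.

```python
from typing import Literal, get_args

CertificationLevel = Literal["none", "bronze", "silver", "gold", "platinum"]
# Assumes declaration order is rank order, lowest to highest.
LEVEL_ORDER = get_args(CertificationLevel)


def meets_minimum(level: str, minimum: str) -> bool:
    """True if level ranks at or above minimum.

    tuple.index raises ValueError for a value outside the alias.
    """
    return LEVEL_ORDER.index(level) >= LEVEL_ORDER.index(minimum)


print(meets_minimum("gold", "silver"))
print(meets_minimum("bronze", "silver"))
```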
***

## `AgentProfile`

Registered identity for an agent, used for cross-agent ranking and leaderboard participation.

| Field      | Type   | Description                                                                     |
| ---------- | ------ | ------------------------------------------------------------------------------- |
| `agent_id` | `str`  | Unique agent identifier.                                                        |
| `name`     | `str`  | Display name. Defaults to `agent_id` if not set.                                |
| `category` | `str`  | Agent category: `"coding"`, `"research"`, `"customer_support"`, or `"general"`. |
| `metadata` | `dict` | Arbitrary additional metadata.                                                  |

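A default such as "`name` falls back to `agent_id`" needs one workaround on a frozen dataclass, because normal attribute assignment is blocked. The stand-in below (`Profile`, hypothetical) shows the standard `__post_init__` plus `object.__setattr__` pattern. Two details are assumptions: an empty string counts as "not set", and `category` defaults to `"general"`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Profile:  # hypothetical stand-in mirroring AgentProfile
    agent_id: str
    name: Optional[str] = None
    category: str = "general"  # assumed default; the docs do not state one
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Frozen dataclasses reject self.name = ..., so bypass the guard with
        # object.__setattr__; None and "" are both treated as "not set".
        if not self.name:
            object.__setattr__(self, "name", self.agent_id)


print(Profile("agent-7").name)
print(Profile("agent-7", name="Code Helper", category="coding").name)
```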
***

## `BenchmarkResult`

Returned by `benchmark_run()`. Aggregates scores across all prompts in a benchmark suite.

| Field         | Type                   | Description                                        |
| ------------- | ---------------------- | -------------------------------------------------- |
| `name`        | `str`                  | Benchmark suite name.                              |
| `agent_type`  | `AgentType`            | Agent category used for percentile ranking.        |
| `trust_score` | `int`                  | Average trust score across all benchmark runs.     |
| `percentile`  | `int`                  | Percentile rank among agents in the same category. |
| `run_count`   | `int`                  | Number of prompts evaluated.                       |
| `scores`      | `dict[str, int]`       | Average per-dimension scores across all runs.      |
| `results`     | `list[CritiqorResult]` | Individual results for each benchmark prompt.      |

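Averaging per-dimension scores across runs is a small dictionary exercise. In this hypothetical sketch the dimension names are illustrative, and results are rounded to `int` to match the `dict[str, int]` type. A dimension missing from some runs is averaged over the runs that report it. How Critiqor itself handles missing dimensions is not documented here.

```python
from statistics import mean


def average_scores(per_run: list[dict[str, int]]) -> dict[str, int]:
    """Average each dimension across runs, rounded to int.

    Dimensions missing from a run are averaged over the runs that report them.
    """
    collected: dict[str, list[int]] = {}
    for scores in per_run:
        for dimension, value in scores.items():
            collected.setdefault(dimension, []).append(value)
    return {dim: round(mean(values)) for dim, values in collected.items()}


runs = [
    {"hallucination": 90, "tool_reliability": 70},
    {"hallucination": 80, "tool_reliability": 76},
    {"hallucination": 85},
]
print(average_scores(runs))
```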
***

## `CausalGraph`

A structured causal graph for a single failure event. Returned by `build_causal_graph()`.

| Field           | Type                    | Description                                             |
| --------------- | ----------------------- | ------------------------------------------------------- |
| `failure_event` | `str`                   | The root failure type (e.g. `"infinite_tool_loop"`).    |
| `causal_graph`  | `list[CausalGraphEdge]` | Ordered list of directed causal edges.                  |
| `run_id`        | `str \| None`           | Run identifier this graph was built from, if available. |

### `CausalGraph.explain()`

Returns the causal chain as a human-readable string, e.g.:
`"Prompt was ambiguous -> Agent selected incorrect tool -> Evidence was missing -> Final answer hallucinated"`

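`explain()` renders the edge list as an arrow-separated chain. The fields of `CausalGraphEdge` are not documented on this page, so the sketch assumes hypothetical `cause` and `effect` fields. It also assumes a linear chain in which each edge's effect is the next edge's cause. `Edge` and `Graph` are stand-ins, not Critiqor's classes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Edge:  # hypothetical stand-in; CausalGraphEdge's real fields are not documented here
    cause: str
    effect: str


@dataclass(frozen=True)
class Graph:  # hypothetical stand-in mirroring CausalGraph
    failure_event: str
    causal_graph: list = field(default_factory=list)
    run_id: Optional[str] = None

    def explain(self) -> str:
        """Render an ordered chain of edges as 'A -> B -> C'."""
        if not self.causal_graph:
            return self.failure_event
        nodes = [self.causal_graph[0].cause]
        nodes += [edge.effect for edge in self.causal_graph]
        return " -> ".join(nodes)


graph = Graph(
    failure_event="unsupported_claims",
    causal_graph=[
        Edge("Prompt was ambiguous", "Agent selected incorrect tool"),
        Edge("Agent selected incorrect tool", "Evidence was missing"),
        Edge("Evidence was missing", "Final answer hallucinated"),
    ],
)
print(graph.explain())
```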
***

## `ReliabilityInsight`

An executive summary generated by `generate_insights()` from historical reliability data.

| Field             | Type        | Description                                                 |
| ----------------- | ----------- | ----------------------------------------------------------- |
| `summary`         | `str`       | High-level narrative of agent reliability trends.           |
| `primary_drivers` | `list[str]` | The top contributing factors to recent reliability changes. |
