Architecture

Design Principles

  • Deterministic code first, LLM second — prefer algorithmic solutions over language model calls.
  • Bounded autonomy — agents operate within predefined action sets, not free-form reasoning.
  • Typed contracts — all data flows use validated schemas (pydantic).
  • Full provenance — every action is recorded with inputs, outputs, versions, and costs.
  • Human-in-the-loop governance — expensive compute and wet-lab actions require approval.

Phase 0 — Current Architecture

Phase 0 establishes the foundational infrastructure:

Carmel.py                    # CLI entrypoint (repo root)
carmel/
├── __init__.py          # Package root, version export
├── version.py           # Single source of truth for version string
├── config.py            # Pydantic config models, YAML loading and validation
├── paths.py             # Path utilities, workspace directory initialization
└── logger.py            # Centralized logger configuration

Config System

Configuration uses pydantic BaseModel subclasses with extra="forbid", so unknown keys are rejected rather than silently ignored. The CarmelConfig model validates workspace settings loaded from YAML files. Validators normalize the casing of the logging level and expand a leading tilde (~) in paths.
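The validation behavior described above can be sketched as follows. This is a minimal illustration assuming pydantic v2; the field names (workspace, logging, level, file) and the LoggingConfig model are hypothetical and may not match the real CarmelConfig. The input dict stands in for the result of parsing a YAML file.

```python
from pathlib import Path
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator

VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


class LoggingConfig(BaseModel):
    """Hypothetical logging section; field names are assumptions."""

    model_config = ConfigDict(extra="forbid")  # reject unknown keys

    level: str = "INFO"
    file: Optional[Path] = None

    @field_validator("level")
    @classmethod
    def normalize_level(cls, value: str) -> str:
        # Accept "debug", "Debug", ... and store the upper-case form.
        level = value.upper()
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown logging level: {value!r}")
        return level

    @field_validator("file")
    @classmethod
    def expand_file(cls, value: Optional[Path]) -> Optional[Path]:
        # Expand a leading "~" to the user's home directory.
        return value.expanduser() if value is not None else None


class CarmelConfig(BaseModel):
    """Hypothetical top-level config; the real model may have other fields."""

    model_config = ConfigDict(extra="forbid")

    workspace: Path
    logging: LoggingConfig = Field(default_factory=LoggingConfig)

    @field_validator("workspace")
    @classmethod
    def expand_workspace(cls, value: Path) -> Path:
        return value.expanduser()


# A dict such as a YAML parser would return for the config file.
raw = {"workspace": "~/carmel_ws", "logging": {"level": "debug"}}
config = CarmelConfig.model_validate(raw)

try:
    CarmelConfig.model_validate({"workspace": "/tmp/ws", "typo_key": 1})
except ValidationError as err:
    print(f"rejected: {err.error_count()} error(s)")
```

With extra="forbid", a misspelled key in the YAML file fails validation immediately instead of being dropped.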

Workspace Layout

init_workspace() creates a standard directory structure for campaign data:

Directory      Purpose
benchmarks/    Curated benchmark bundles and credence records
evidence/      Literature memos, extracted records, source links
models/        Generated mechanism versions and diffs
provenance/    Hashes, versions, tool settings, costs
reports/       Final and intermediate reports
runs/          Executed tool runs and statuses
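A minimal sketch of how init_workspace() could create this layout, using only the standard library. The signature, return value, and the WORKSPACE_DIRS constant are assumptions for illustration; the actual function in paths.py may differ.

```python
from pathlib import Path

# Directory names taken from the layout listed above.
WORKSPACE_DIRS = (
    "benchmarks",
    "evidence",
    "models",
    "provenance",
    "reports",
    "runs",
)


def init_workspace(root):
    """Create the standard workspace directories under root.

    Hypothetical signature: accepts a str or Path and returns the list of
    directories that now exist. Calling it again on the same root is a no-op.
    """
    root_path = Path(root).expanduser().resolve()
    created = []
    for name in WORKSPACE_DIRS:
        directory = root_path / name
        # parents=True creates the root if needed; exist_ok=True makes
        # repeated initialization safe.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Making the call idempotent means an existing campaign workspace can be re-initialized without touching its contents.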

Logging

Logging is centralized through a logger in the carmel namespace, with a configurable level and optional file output. Child loggers obtained via get_logger() inherit this configuration.
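The inheritance relies on the standard library's dotted logger hierarchy. The sketch below shows one way it could work; setup_logging() and its parameters are hypothetical names, and only get_logger() is mentioned in this document (its real signature may differ).

```python
import logging

ROOT_LOGGER_NAME = "carmel"


def setup_logging(level="INFO", log_file=None):
    """Configure the namespace logger once (hypothetical helper)."""
    logger = logging.getLogger(ROOT_LOGGER_NAME)
    logger.setLevel(level.upper())  # setLevel accepts level names as strings
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    stream_handler = logging.StreamHandler()
    stream_handler.setFormatter(formatter)
    logger.addHandler(stream_handler)

    if log_file is not None:
        # Optional file output alongside the console handler.
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


def get_logger(name=None):
    """Return the namespace logger or a child such as 'carmel.config'."""
    if not name:
        return logging.getLogger(ROOT_LOGGER_NAME)
    return logging.getLogger(f"{ROOT_LOGGER_NAME}.{name}")


setup_logging(level="debug")
log = get_logger("config")
log.debug("child loggers propagate records to the carmel handlers")
```

Child loggers set no level or handlers of their own, so changing the configuration of the carmel logger affects all of them.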

CLI

The CLI entrypoint is Carmel.py at the repo root, following the ARC/T3 convention. It is argparse-based and provides three subcommands: version, validate-config, and init-workspace. The main() function accepts an optional argv list so that tests can call it directly without patching sys.argv.
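A sketch of the CLI structure under stated assumptions: the subcommand names come from this document, while the positional arguments (config_path, workspace_path), the placeholder version string, and the stub handler bodies are hypothetical.

```python
import argparse

__version__ = "0.0.0"  # placeholder; the real value lives in carmel/version.py


def build_parser():
    parser = argparse.ArgumentParser(prog="Carmel.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    subparsers.add_parser("version", help="print the package version")

    validate = subparsers.add_parser("validate-config", help="validate a YAML config")
    validate.add_argument("config_path")  # hypothetical argument name

    init = subparsers.add_parser("init-workspace", help="create workspace directories")
    init.add_argument("workspace_path")  # hypothetical argument name

    return parser


def main(argv=None):
    """Run the CLI; argv=None makes argparse fall back to sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    if args.command == "version":
        print(__version__)
    elif args.command == "validate-config":
        print(f"would validate {args.config_path}")  # stub handler
    elif args.command == "init-workspace":
        print(f"would initialize {args.workspace_path}")  # stub handler
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

In a test, main(["version"]) exercises the same code path as running the script from a shell.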

Phase 1+ — Planned Architecture

Agent Ensemble

Agent                   Role
Planner                 Campaign state, budget management, action ranking
Literature Agent        RAG over curated corpus, typed evidence memos
Data Agent              Benchmark normalization, credence scoring
Revision Router         Discrepancy-to-action mapping (bounded action set)
X-Design Agent          Deterministic experiment design generation
Execution Controller    Job submission, provenance, approval enforcement
Reporting Agent         Plan summaries, approval memos, reports
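Since this phase is only planned, the following is purely illustrative. It sketches what a bounded action set for the Revision Router might look like: a closed enum plus a fixed lookup table, with no language model involved. All action names and discrepancy categories are hypothetical.

```python
from enum import Enum


class RevisionAction(Enum):
    """Closed set of actions the router may emit (names are hypothetical)."""

    REFINE_THERMO = "refine_thermo"
    REFINE_KINETICS = "refine_kinetics"
    EXPAND_MECHANISM = "expand_mechanism"
    REQUEST_HUMAN_REVIEW = "request_human_review"


# Hypothetical discrepancy categories mapped to exactly one allowed action.
ROUTING_TABLE = {
    "equilibrium_mismatch": RevisionAction.REFINE_THERMO,
    "ignition_delay_mismatch": RevisionAction.REFINE_KINETICS,
    "missing_species": RevisionAction.EXPAND_MECHANISM,
}


def route(discrepancy_kind):
    """Map a discrepancy category to an action from the bounded set.

    Unknown categories fall back to human review instead of producing
    a new, unvetted action.
    """
    return ROUTING_TABLE.get(discrepancy_kind, RevisionAction.REQUEST_HUMAN_REVIEW)
```

Because the return type is an enum, downstream code cannot receive an action outside the predefined set.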

External Tool Integration

Tool       Trust Level
T3         Trusted
RMG        Trusted with caution
ARC        Trusted
Cantera    Trusted
TCKDB      Trusted

Campaign Artifacts

Each campaign workspace will additionally contain:

  • campaign.yaml — canonical structured input
  • campaign.md — human-readable summary
  • preferences.md — approval thresholds and policies
  • plan.md — current proposed actions
  • approvals.md — human approvals and notes
  • decision_log.jsonl — append-only decision stream
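As an illustration of the append-only decision stream, the sketch below writes one JSON object per line and only ever opens the file in append mode. The record fields (timestamp, agent, action, rationale) and both function names are assumptions; the planned schema is not specified in this document.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_decision(log_path, agent, action, rationale):
    """Append one decision record as a single JSON line (hypothetical schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "action": action,
        "rationale": rationale,
    }
    # Mode "a" never truncates: existing lines are left untouched.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_decisions(log_path):
    """Read the full decision stream back in the order it was written."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

JSON Lines keeps each record independent, so a partially written final line cannot corrupt earlier entries.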

Hard Constraints

  • One top-level planner only
  • No dynamic agent spawning
  • No free-form tool invocation
  • Typed schemas for every tool call
  • Literature never writes directly into the model
  • All expensive actions gated by budget checks
  • All high-stakes actions gated by HITL policy
  • Append-only decision log
  • Full provenance for every action
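The budget and HITL gates could be enforced by a small deterministic check before any action is executed. The sketch below is one possible shape, not the planned implementation; ProposedAction, its fields, the Gate outcomes, and the gate() function are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    ALLOWED = "allowed"
    OVER_BUDGET = "over_budget"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical typed description of an action awaiting execution."""

    name: str
    estimated_cost: float
    high_stakes: bool


def gate(action, remaining_budget, approved_names):
    """Apply the budget check first, then the human-approval check."""
    if action.estimated_cost > remaining_budget:
        return Gate.OVER_BUDGET
    if action.high_stakes and action.name not in approved_names:
        return Gate.NEEDS_APPROVAL
    return Gate.ALLOWED


cheap = ProposedAction("rerun_simulation", estimated_cost=1.0, high_stakes=False)
costly = ProposedAction("high_level_calculation", estimated_cost=50.0, high_stakes=True)

print(gate(cheap, remaining_budget=10.0, approved_names=set()))
print(gate(costly, remaining_budget=10.0, approved_names=set()))
print(gate(costly, remaining_budget=100.0, approved_names=set()))
print(gate(costly, remaining_budget=100.0, approved_names={"high_level_calculation"}))
```

Keeping the gate as plain code rather than an agent decision follows the "deterministic code first" principle stated at the top of this page.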