Architecture
Design Principles
- Deterministic code first, LLM second — prefer algorithmic solutions over language model calls.
- Bounded autonomy — agents operate within predefined action sets, not free-form reasoning.
- Typed contracts — all data flows use validated schemas (pydantic).
- Full provenance — every action is recorded with inputs, outputs, versions, and costs.
- Human-in-the-loop governance — expensive compute and wet-lab actions require approval.
Phase 0 — Current Architecture
Phase 0 establishes the foundational infrastructure:
Carmel.py # CLI entrypoint (repo root)
carmel/
├── __init__.py # Package root, version export
├── version.py # Single source of truth for version string
├── config.py # Pydantic config models, YAML loading and validation
├── paths.py # Path utilities, workspace directory initialization
└── logger.py # Centralized logger configuration
Config System
Configuration uses pydantic BaseModel subclasses with extra="forbid", so unknown keys are rejected rather than silently ignored. The CarmelConfig model validates workspace settings loaded from YAML files. Validators normalize the casing of the logging level and expand a leading tilde (~) in paths.
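The validation behavior described above can be sketched as follows. This is a minimal illustration, not the actual CarmelConfig: it assumes the pydantic v2 API, the field names `workspace_root` and `log_level` are hypothetical, and a plain dict stands in for the parsed YAML.

```python
from pathlib import Path

from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class CarmelConfig(BaseModel):
    # extra="forbid" makes unknown keys a validation error.
    model_config = ConfigDict(extra="forbid")

    # Hypothetical field names, chosen for illustration only.
    workspace_root: Path
    log_level: str = "INFO"

    @field_validator("log_level")
    @classmethod
    def normalize_level(cls, value: str) -> str:
        # Accept "debug", "Debug", etc. and store the upper-case form.
        return value.upper()

    @field_validator("workspace_root")
    @classmethod
    def expand_tilde(cls, value: Path) -> Path:
        # Replace a leading "~" with the user's home directory.
        return value.expanduser()


# A dict standing in for the result of parsing a YAML file.
raw = {"workspace_root": "~/carmel_ws", "log_level": "debug"}
config = CarmelConfig.model_validate(raw)

# An unexpected key is rejected instead of being dropped.
try:
    CarmelConfig.model_validate({"workspace_root": "~/ws", "unknown_key": 1})
    rejected = False
except ValidationError:
    rejected = True
```

Forbidding extra keys means a misspelled setting fails loudly at load time instead of silently falling back to a default.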
Workspace Layout
init_workspace() creates a standard directory structure for campaign data:
| Directory | Purpose |
|---|---|
| benchmarks/ | Curated benchmark bundles and credence records |
| evidence/ | Literature memos, extracted records, source links |
| models/ | Generated mechanism versions and diffs |
| provenance/ | Hashes, versions, tool settings, costs |
| reports/ | Final and intermediate reports |
| runs/ | Executed tool runs and statuses |
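A minimal sketch of how such a layout could be created with pathlib. The directory names come from the table above; the signature and return value of `init_workspace` shown here are assumptions, not the actual implementation in paths.py.

```python
import tempfile
from pathlib import Path

# Directory names taken from the workspace layout table.
WORKSPACE_DIRS = ("benchmarks", "evidence", "models", "provenance", "reports", "runs")


def init_workspace(root: Path) -> list[Path]:
    """Create the standard subdirectories under root and return their paths.

    Assumed signature; the real function may differ.
    """
    created = []
    for name in WORKSPACE_DIRS:
        path = root / name
        # exist_ok=True makes repeated calls safe on an existing workspace.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Usage: initialize a throwaway workspace in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    made = init_workspace(Path(tmp) / "campaign_ws")
    all_exist = all(p.is_dir() for p in made)
    names = sorted(p.name for p in made)
```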
Logging
Logging is centralized through a logger in the carmel namespace, with a configurable level and optional file output. Child loggers obtained via get_logger() inherit this configuration.
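The inheritance relies on the standard library's dotted logger hierarchy. The sketch below shows one way this could look; `configure_logging`, its parameters, and the format string are hypothetical, and only the `get_logger()` name comes from this document.

```python
import logging
from pathlib import Path
from typing import Optional

ROOT_NAME = "carmel"


def configure_logging(level: str = "INFO", log_file: Optional[Path] = None) -> logging.Logger:
    """Configure the namespace logger once (hypothetical helper)."""
    root = logging.getLogger(ROOT_NAME)
    root.setLevel(level.upper())
    # Drop handlers from earlier calls so messages are not duplicated.
    root.handlers.clear()
    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    handlers: list[logging.Handler] = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file))
    for handler in handlers:
        handler.setFormatter(formatter)
        root.addHandler(handler)
    return root


def get_logger(name: str) -> logging.Logger:
    """Return a child of the namespace logger, e.g. 'carmel.config'."""
    return logging.getLogger(f"{ROOT_NAME}.{name}")


configure_logging(level="debug")
log = get_logger("config")
log.debug("Child loggers have no handlers of their own; records propagate to 'carmel'.")
```

Because the child logger sets no level itself, its effective level is whatever the carmel logger was configured with.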
CLI
The CLI entrypoint is Carmel.py at the repo root, following the ARC/T3 convention. It is argparse-based and provides three subcommands: version, validate-config, and init-workspace. The main() function accepts an optional argv list so that tests can call it directly without touching sys.argv.
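A minimal sketch of an argparse layout with these three subcommands. The subcommand names come from this document; the positional arguments, the `build_parser` helper, the placeholder version string, and the stub handlers are assumptions.

```python
import argparse
from typing import Optional, Sequence

VERSION = "0.0.0"  # Placeholder; the real value lives in carmel/version.py.


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="Carmel.py")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("version", help="Print the package version")
    validate = sub.add_parser("validate-config", help="Validate a YAML config file")
    validate.add_argument("config_path")  # assumed positional argument
    init = sub.add_parser("init-workspace", help="Create the workspace directories")
    init.add_argument("workspace_path")  # assumed positional argument
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    # argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    if args.command == "version":
        print(VERSION)
    elif args.command == "validate-config":
        print(f"would validate {args.config_path}")  # stub handler
    elif args.command == "init-workspace":
        print(f"would initialize {args.workspace_path}")  # stub handler
    return 0


# A test can drive the CLI without a subprocess:
exit_code = main(["version"])
```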
Phase 1+ — Planned Architecture
Agent Ensemble
| Agent | Role |
|---|---|
| Planner | Campaign state, budget management, action ranking |
| Literature Agent | RAG over curated corpus, typed evidence memos |
| Data Agent | Benchmark normalization, credence scoring |
| Revision Router | Discrepancy-to-action mapping (bounded action set) |
| X-Design Agent | Deterministic experiment design generation |
| Execution Controller | Job submission, provenance, approval enforcement |
| Reporting Agent | Plan summaries, approval memos, reports |
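To illustrate what a bounded action set could mean for the Revision Router, the sketch below maps each discrepancy type to a fixed tuple of permitted actions. Every enum member and the routing table are invented for illustration; the planned design may use different categories entirely.

```python
from enum import Enum


class Discrepancy(Enum):
    # Hypothetical discrepancy categories.
    OBSERVABLE_MISMATCH = "observable_mismatch"
    MISSING_SPECIES = "missing_species"
    UNCERTAIN_PARAMETER = "uncertain_parameter"


class Action(Enum):
    # Hypothetical members of the closed action set.
    RUN_SENSITIVITY_ANALYSIS = "run_sensitivity_analysis"
    REQUEST_LITERATURE_MEMO = "request_literature_memo"
    REFINE_PARAMETER = "refine_parameter"
    REGENERATE_MODEL = "regenerate_model"


# The router can only return actions listed here; nothing is free-form.
ROUTES: dict[Discrepancy, tuple[Action, ...]] = {
    Discrepancy.OBSERVABLE_MISMATCH: (Action.RUN_SENSITIVITY_ANALYSIS,),
    Discrepancy.MISSING_SPECIES: (Action.REQUEST_LITERATURE_MEMO, Action.REGENERATE_MODEL),
    Discrepancy.UNCERTAIN_PARAMETER: (Action.REFINE_PARAMETER,),
}


def route(discrepancy: Discrepancy) -> tuple[Action, ...]:
    """Look up the permitted actions; an unmapped discrepancy raises KeyError."""
    return ROUTES[discrepancy]


candidates = route(Discrepancy.MISSING_SPECIES)
```

Using enums on both sides keeps the mapping deterministic and easy to audit, in line with the "deterministic code first" principle.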
External Tool Integration
| Tool | Trust Level |
|---|---|
| T3 | Trusted |
| RMG | Trusted with caution |
| ARC | Trusted |
| Cantera | Trusted |
| TCKDB | Trusted |
Campaign Artifacts
Each campaign workspace will additionally contain:
- campaign.yaml: canonical structured input
- campaign.md: human-readable summary
- preferences.md: approval thresholds and policies
- plan.md: current proposed actions
- approvals.md: human approvals and notes
- decision_log.jsonl: append-only decision stream
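An append-only JSON Lines log such as decision_log.jsonl could be written as sketched below. The helper names and the record fields (`agent`, `action`, `timestamp`) are hypothetical; only the file name comes from this document.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_decision(log_path: Path, record: dict) -> None:
    """Append one decision as a single JSON line (hypothetical helper)."""
    entry = {"timestamp": datetime.now(timezone.utc).isoformat(), **record}
    # Mode "a" only ever adds to the end of the file; earlier lines are untouched.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")


def read_decisions(log_path: Path) -> list[dict]:
    """Read the whole stream back in the order it was written."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "decision_log.jsonl"
    append_decision(log_path, {"agent": "planner", "action": "propose_plan"})
    append_decision(log_path, {"agent": "human", "action": "approve_plan"})
    decisions = read_decisions(log_path)
```

One JSON object per line means a partially written final entry cannot corrupt the records before it.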
Hard Constraints
- One top-level planner only
- No dynamic agent spawning
- No free-form tool invocation
- Typed schemas for every tool call
- Literature never writes directly into the model
- All expensive actions gated by budget checks
- All high-stakes actions gated by HITL policy
- Append-only decision log
- Full provenance for every action
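The budget and HITL gates listed above could be enforced by a single check that runs before any action executes. This is a sketch under stated assumptions: `ProposedAction`, `GateError`, `check_gates`, and the cost units are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    # Hypothetical shape of an action awaiting execution.
    name: str
    estimated_cost: float  # in whatever unit the budget uses
    high_stakes: bool


class GateError(Exception):
    """Raised when an action fails a budget or approval gate."""


def check_gates(action: ProposedAction, remaining_budget: float, approved: set[str]) -> None:
    """Raise GateError unless the action passes both gates."""
    if action.estimated_cost > remaining_budget:
        raise GateError(f"{action.name}: cost exceeds remaining budget")
    if action.high_stakes and action.name not in approved:
        raise GateError(f"{action.name}: human approval required")


cheap = ProposedAction("parse_benchmark", estimated_cost=1.0, high_stakes=False)
risky = ProposedAction("submit_large_job", estimated_cost=50.0, high_stakes=True)

check_gates(cheap, remaining_budget=100.0, approved=set())  # passes silently

try:
    check_gates(risky, remaining_budget=100.0, approved=set())
    blocked = False
except GateError:
    blocked = True  # affordable, but not yet approved by a human
```

Raising an exception rather than returning a flag makes it harder for a caller to skip the check by accident.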