Code Execution
Let the model write code, run it in a sandbox, and use the output as the answer — no more trusting the LLM to compute in its head.
Intent & Description
🎯 Intent
Offload deterministic computation to an actual interpreter instead of hoping the model gets the math right.
📋 Context
Your agent does arithmetic, data wrangling, parsing, or other deterministic work. LLMs hallucinate on this stuff. You have a sandboxed Python or JS interpreter available.
💡 Solution
The agent emits a code block; a controlled sandbox (Python, JS VM, or container) runs it; stdout, stderr, and the return value flow back into the agent's context. The loop repeats until the task is done or a step budget runs out. The CodeAct approach treats code as the primary action language.
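A minimal sketch of the execution step, assuming Python. The names `run_snippet` and `ExecResult` are hypothetical, and a child process with a timeout stands in for the sandbox here. A plain subprocess is not a security boundary, so a real deployment would swap in a container or VM behind the same interface.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ExecResult:
    stdout: str
    stderr: str
    returncode: int
    timed_out: bool = False


def run_snippet(code: str, timeout_s: float = 5.0) -> ExecResult:
    """Run model-emitted Python in a child interpreter and capture its output.

    Assumption: the child process is only a stand-in for a real sandbox.
    """
    try:
        proc = subprocess.run(
            # -I runs the interpreter in isolated mode (ignores PYTHON* env vars
            # and the user site directory); it does not restrict what the code can do.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return ExecResult("", f"timed out after {timeout_s}s", -1, timed_out=True)
    return ExecResult(proc.stdout, proc.stderr, proc.returncode)


result = run_snippet("print(17 * 23)")
print(result.stdout.strip())  # 391
```

Returning stderr and the exit code alongside stdout matters: a traceback is often the most useful thing to hand back to the model on a failed attempt.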
Real-world Use Case
- The task involves calculations, parsing, or transformations that LLMs frequently get wrong when they compute the result directly.
- A controlled sandbox is available and trusted to run model-emitted code.
- Stdout, stderr, and return values can feed back into the agent loop.
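The feedback loop these conditions describe can be sketched as follows, again assuming Python. `scripted_model` is a hypothetical stand-in for an LLM call (it replays a buggy attempt, then a fix), and `execute` uses a child process in place of a real sandbox; neither reflects any particular framework's API.

```python
import subprocess
import sys


def execute(code: str, timeout_s: float = 5.0) -> dict:
    """Run code in a child interpreter and return a structured observation."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "timed out", "exit_code": None}
    return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}


def scripted_model(task: str, observations: list) -> tuple:
    """Hypothetical model stub: returns ("code", source) or ("final", answer)."""
    if not observations:
        return ("code", "print(sum(range(1, 101)) / 0)")  # deliberately buggy first try
    last = observations[-1]
    if last["exit_code"] != 0:
        return ("code", "print(sum(range(1, 101)))")      # retry after seeing the failure
    return ("final", last["stdout"].strip())


def agent_loop(task: str, model, max_steps: int = 4):
    """Alternate model turns and executions until a final answer or the budget runs out."""
    observations = []
    for _ in range(max_steps):
        kind, payload = model(task, observations)
        if kind == "final":
            return payload
        observations.append(execute(payload))
    return None  # step budget exhausted without a final answer


print(agent_loop("Sum the integers from 1 to 100.", scripted_model))  # 5050
```

The step budget is what keeps a model that never converges from looping forever; with `max_steps=2` the same run returns `None` because both steps are spent on code attempts.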
📌 TL;DR
Don’t ask the LLM to do math in its head. Let it write the code, run it in a sandbox, and take the answer from the interpreter’s output.
Advantages
- Deterministic compute on top of probabilistic intent — the right division of labor.
- Code is auditable and replayable; the same script can be rerun for debugging.
Disadvantages
- Sandbox security is its own serious engineering problem: a weak sandbox means arbitrary code execution on the host.
- A very flexible action space introduces more failure modes than a curated tool palette.