AI Agents 17.02.2026

GrepCodebaseTool and DatabaseSchemaTool: How Agents Understand the Codebase

How agents build codebase context — GrepCodebaseTool runs regex search with file type filters and truncates results at 6K chars, DatabaseSchemaTool introspects via SHOW TABLES + DESCRIBE for read-only schema knowledge, and why agents grep for callers before patching existing code.

Before an agent writes code, it needs to understand the existing codebase. GrepCodebaseTool finds patterns across files. DatabaseSchemaTool reveals table structures. Together, they give agents the context that prevents blind code generation.

GrepCodebaseTool

Regex-powered search across the codebase:

public function getInputSchema(): array
{
    return [
        'type' => 'object',
        'properties' => [
            'pattern' => ['type' => 'string', 'description' => 'Regex pattern to search for'],
            'file_type' => ['type' => 'string', 'description' => 'File extension filter: php, js, css'],
            'path' => ['type' => 'string', 'description' => 'Subdirectory to search in'],
            'max_results' => ['type' => 'integer', 'description' => 'Max matching lines (default 50)'],
        ],
        'required' => ['pattern'],
    ];
}

The tool walks the directory tree with PHP's RecursiveDirectoryIterator and matches with preg_match(). It is not as fast as ripgrep, but it has zero external dependencies.
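A minimal sketch of that approach, not the tool's actual source. The function name grepCodebase(), its parameter names, the line-by-line matching, and the assumption that the pattern arrives with its delimiters (for example '/foo/') are all illustrative.

```php
<?php
// Hypothetical sketch: grepCodebase() is an illustrative name, not the real tool.
// $pattern is assumed to be a complete PCRE pattern including delimiters.
function grepCodebase(string $root, string $pattern, ?string $fileType = null, int $maxResults = 50): array
{
    $matches = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        // Skip non-files and anything that fails the extension filter.
        if (!$file->isFile()) {
            continue;
        }
        if ($fileType !== null && $file->getExtension() !== $fileType) {
            continue;
        }

        $lines = @file($file->getPathname(), FILE_IGNORE_NEW_LINES);
        if ($lines === false) {
            continue; // unreadable file
        }

        foreach ($lines as $index => $line) {
            // preg_match() returns 1 on a match, 0 on no match, false on error.
            if (preg_match($pattern, $line) === 1) {
                $matches[] = [
                    'file' => $file->getPathname(),
                    'line' => $index + 1, // 1-based line number
                    'text' => trim($line),
                ];
                if (count($matches) >= $maxResults) {
                    return $matches; // stop early once the cap is reached
                }
            }
        }
    }

    return $matches;
}
```

Returning as soon as the cap is hit keeps the cost bounded on large trees, which matters more here than in ripgrep because every line goes through the PHP interpreter.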

Why Agents Grep

The most common agent workflow before patching:

1. "Fix the N+1 in InvoicesController" → agent reads the controller
2. Agent identifies the query that needs eager loading
3. Agent greps for all callers of the affected method
4. Agent verifies no other code depends on the current behavior
5. Agent generates the patch

Step 3 is critical. Without it, the agent fixes the N+1 but breaks a caller that expected lazy loading. The grep_codebase tool makes this possible.
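The caller search might look like the sketch below. The helper callerPattern(), the method name getInvoicesForClient, and the path app are made up for illustration, and whether the real tool expects regex delimiters in pattern is not stated here, so treat that as an assumption too.

```php
<?php
// Hypothetical helper: builds a regex that matches call sites of a method.
function callerPattern(string $method): string
{
    // Match instance (->, which also covers ?->) and static (::) calls
    // followed by an opening parenthesis.
    return '/(?:->|::)' . preg_quote($method, '/') . '\s*\(/';
}

// Tool input the agent might send; the keys come from the input schema,
// the values are illustrative.
$toolInput = [
    'pattern' => callerPattern('getInvoicesForClient'),
    'file_type' => 'php',
    'path' => 'app',
];
```

A pattern like this skips the method's own declaration and similarly named methods, but it will miss dynamic calls and callables passed as strings, so the agent still has to read the matches rather than trust the count.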

Truncation

Results are capped at ~6K characters (configured in AgentExecutor's smart truncation). Grep results are repetitive by nature — 50 matches of the same pattern show the same code structure. The first matches (6K worth) give enough context; the rest adds noise.
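The truncation logic lives in AgentExecutor and is not shown in this post. A rough sketch of how a line-aware cap could work, with the function name truncateToolOutput() and the marker text being assumptions:

```php
<?php
// Hypothetical sketch of line-aware truncation; not AgentExecutor's real code.
// strlen()/substr() count bytes, so the limit is approximate for multibyte text.
function truncateToolOutput(string $output, int $limit = 6000): string
{
    if (strlen($output) <= $limit) {
        return $output;
    }

    // Cut at the limit, then back up to the last complete line if there is one.
    $head = substr($output, 0, $limit);
    $lastNewline = strrpos($head, "\n");
    if ($lastNewline !== false) {
        $head = substr($head, 0, $lastNewline);
    }

    // Tell the model that output was dropped so it can narrow the search.
    $omitted = strlen($output) - strlen($head);
    return $head . "\n[truncated: {$omitted} more characters omitted]";
}
```

Cutting on a line boundary avoids handing the model half a match, and the trailing marker signals that a narrower pattern or path would surface the rest.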

DatabaseSchemaTool

Read-only database introspection:

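public function execute(array $params): array
{
    // Default to listing tables when no action is given.
    $action = $params['action'] ?? 'list_tables';

    // Only inspection actions are routed; anything else is rejected.
    // 'table' is required for describe and indexes.
    return match ($action) {
        'list_tables' => $this->listTables(),
        'describe' => $this->describeTable($params['table']),
        'indexes' => $this->showIndexes($params['table']),
        default => ['success' => false, 'error' => 'Unknown action'],
    };
}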

Three actions:

list_tables → SHOW TABLES (what tables exist)

describe → DESCRIBE table_name (columns, types, nullability, defaults)

indexes → SHOW INDEX FROM table_name (index structure)

Read-only. No CREATE, ALTER, DROP, INSERT, UPDATE, DELETE. The tool can't modify the database — it can only inspect it. Agents that need to write SQL (like KnjigAgent) use dedicated domain tools that enforce business rules.
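One detail worth sketching: DESCRIBE and SHOW INDEX cannot take the table name as a bound parameter, so the name has to be checked before it is interpolated into SQL. The post does not show how the tool does this. The version below assumes PDO, and assertSafeTableName() is a hypothetical helper.

```php
<?php
// Hypothetical sketch: the real tool's validation and database layer are not
// shown in the article. PDO is assumed here.
function assertSafeTableName(string $table): string
{
    // Allow only plain identifiers (letters, digits, underscore, up to
    // MySQL's 64-character limit) before the name is placed into SQL.
    if (preg_match('/^[A-Za-z0-9_]{1,64}$/', $table) !== 1) {
        throw new InvalidArgumentException("Invalid table name: {$table}");
    }
    return $table;
}

function describeTable(PDO $pdo, string $table): array
{
    $safe = assertSafeTableName($table);
    // Each row has the keys Field, Type, Null, Key, Default, Extra.
    $rows = $pdo->query("DESCRIBE `{$safe}`")->fetchAll(PDO::FETCH_ASSOC);
    return ['success' => true, 'columns' => $rows];
}
```

With the allow-list in place, a model-supplied table name cannot smuggle a second statement into the query, which is what keeps the read-only guarantee meaningful.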

Why Schema Context Matters

When VajbCoder writes a new controller that queries pm_subtasks, it needs to know:

What columns exist (to write correct SELECT)

Which are nullable (to handle null in PHP)

What types they are (to add correct type hints)

What indexes exist (to write queries that use them)

Without schema context, the agent guesses column names — and guesses wrong. DatabaseSchemaTool turns "I think there's a status column" into "There's a status VARCHAR(20) NOT NULL DEFAULT 'todo', indexed."
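As an illustration of how a DESCRIBE row could become that kind of compact line, here is a sketch. formatColumn() is a hypothetical helper; the array keys are the column names MySQL returns from DESCRIBE, and the sample row mirrors the status example above.

```php
<?php
// Hypothetical formatter: turns one DESCRIBE row into a compact line.
function formatColumn(array $row): string
{
    // Uppercase only the type name, leaving enum values and lengths untouched.
    $type = preg_replace_callback(
        '/^[a-z]+/',
        fn (array $m): string => strtoupper($m[0]),
        $row['Type']
    );

    $parts = [$row['Field'], $type];
    $parts[] = $row['Null'] === 'NO' ? 'NOT NULL' : 'NULL';

    if ($row['Default'] !== null) {
        $parts[] = "DEFAULT '" . $row['Default'] . "'";
    }

    $line = implode(' ', $parts);
    // A non-empty Key (PRI, UNI, MUL) marks a primary-key column or the
    // first column of an index.
    return $row['Key'] !== '' ? $line . ', indexed' : $line;
}

$row = [
    'Field' => 'status', 'Type' => 'varchar(20)', 'Null' => 'NO',
    'Key' => 'MUL', 'Default' => 'todo', 'Extra' => '',
];
echo formatColumn($row), "\n";
```

An empty Key does not prove the column is unindexed, since it may be a later column of a composite index, which is why the separate indexes action is still useful.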

The truncation limit for schema is 8K chars (higher than grep's 6K) because schema completeness matters more than search result completeness.

Up Next

Next up: GeneratePatchTool: The Crown Jewel — From AI Suggestion to Atomic File Write — the tool that turns AI suggestions into actual code changes, with PatchValidator's 6-step security check and PatchApplier's atomic write + git branch isolation.
