GrepCodebaseTool and DatabaseSchemaTool: How Agents Understand the Codebase

How agents build codebase context: GrepCodebaseTool runs regex searches with file-type filters and truncates results at about 6K characters, DatabaseSchemaTool gives read-only schema knowledge via SHOW TABLES and DESCRIBE, and agents grep for callers before patching existing code.

Before an agent writes code, it needs to understand the existing codebase. GrepCodebaseTool finds patterns across files. DatabaseSchemaTool reveals table structures. Together, they give agents the context that prevents blind code generation.


GrepCodebaseTool

Regex-powered search across the codebase:

public function getInputSchema(): array
{
    return [
        'type' => 'object',
        'properties' => [
            'pattern' => ['type' => 'string', 'description' => 'Regex pattern to search for'],
            'file_type' => ['type' => 'string', 'description' => 'File extension filter: php, js, css'],
            'path' => ['type' => 'string', 'description' => 'Subdirectory to search in'],
            'max_results' => ['type' => 'integer', 'description' => 'Max matching lines (default 50)'],
        ],
        'required' => ['pattern'],
    ];
}

The tool uses PHP's RecursiveDirectoryIterator + preg_match(). Not as fast as ripgrep, but zero external dependencies.
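
To make the approach concrete, here is a minimal sketch of a recursive regex search built on the same two primitives. It is not the tool's actual implementation: the function name grepCodebase, its signature, and the "path:line: text" output format are assumptions made for illustration. It also assumes the pattern does not already contain an escaped "~", since "~" is used as the regex delimiter.

```php
<?php
/**
 * Hypothetical sketch: collect lines under $root that match $pattern.
 * Each result is formatted as "path:line: text" (an assumed format).
 */
function grepCodebase(string $root, string $pattern, ?string $fileType = null, int $maxResults = 50): array
{
    // Wrap the raw pattern in delimiters; escape any bare "~" inside it.
    $regex = '~' . str_replace('~', '\~', $pattern) . '~';

    // Reject invalid patterns once, up front, instead of warning per line.
    if (@preg_match($regex, '') === false) {
        throw new InvalidArgumentException("Invalid regex: {$pattern}");
    }

    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    $results = [];
    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Optional extension filter, e.g. "php", "js", "css".
        if ($fileType !== null && $file->getExtension() !== $fileType) {
            continue;
        }
        $lines = @file($file->getPathname(), FILE_IGNORE_NEW_LINES);
        if ($lines === false) {
            continue; // unreadable file: skip it
        }
        foreach ($lines as $index => $line) {
            if (preg_match($regex, $line) === 1) {
                $results[] = sprintf('%s:%d: %s', $file->getPathname(), $index + 1, trim($line));
                if (count($results) >= $maxResults) {
                    return $results; // stop as soon as the cap is reached
                }
            }
        }
    }

    return $results;
}
```

Stopping at max_results inside the loop means a broad pattern never forces a scan of the whole tree once enough matches are collected.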

Why Agents Grep

The most common agent workflow before patching:

  1. "Fix the N+1 in InvoicesController" → agent reads the controller
  2. Agent identifies the query that needs eager loading
  3. Agent greps for all callers of the affected method
  4. Agent verifies no other code depends on the current behavior
  5. Agent generates the patch

Step 3 is critical. Without it, the agent fixes the N+1 but breaks a caller that expected lazy loading. The grep_codebase tool makes this possible.
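
As an illustration of step 3, the sketch below builds the kind of input an agent might send to find callers of a method. The helper callerPattern and the method name getInvoices are hypothetical; only the parameter names (pattern, file_type, max_results) come from the input schema shown earlier.

```php
<?php
/**
 * Hypothetical helper: regex for instance (->) or static (::) calls to $method.
 * It deliberately skips the method's own "function name(" definition.
 */
function callerPattern(string $method): string
{
    return '(->|::)' . preg_quote($method, '~') . '\s*\(';
}

// Tool input an agent could send before patching getInvoices().
$params = [
    'pattern' => callerPattern('getInvoices'),
    'file_type' => 'php',
    'max_results' => 50,
];
```

Requiring the opening parenthesis right after the name keeps longer names such as getInvoicesTotal out of the results.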

Truncation

Results are capped at ~6K characters (configured in AgentExecutor's smart truncation). Grep results are repetitive by nature — 50 matches of the same pattern show the same code structure. The first matches (6K worth) give enough context; the rest adds noise.
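
The post does not show how AgentExecutor truncates, so the following is only a sketch of one plausible approach: keep the first 6,000 bytes, cut back to the last complete line, and tell the model how much was dropped. The function name truncateToolResult and the notice text are assumptions. Note that strlen() counts bytes, not characters, so multibyte text would need mb_* functions instead.

```php
<?php
/**
 * Hypothetical sketch: cap tool output at $limit bytes on a line boundary.
 */
function truncateToolResult(string $text, int $limit = 6000): string
{
    if (strlen($text) <= $limit) {
        return $text; // short results pass through unchanged
    }

    $head = substr($text, 0, $limit);

    // Cut back to the last newline so no match is split mid-line.
    $lastNewline = strrpos($head, "\n");
    if ($lastNewline !== false) {
        $head = substr($head, 0, $lastNewline);
    }

    $omitted = strlen($text) - strlen($head);

    return $head . "\n[truncated: {$omitted} more bytes omitted]";
}
```

Appending a notice lets the agent know the list is incomplete, so it can narrow the pattern or path rather than assume it has seen every match.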


DatabaseSchemaTool

Read-only database introspection:

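public function execute(array $params): array
{
    // Default to listing tables when the caller gives no action.
    $action = $params['action'] ?? 'list_tables';
    $table = $params['table'] ?? null;

    // 'describe' and 'indexes' operate on a single table, so they need its name.
    if ($table === null && in_array($action, ['describe', 'indexes'], true)) {
        return ['success' => false, 'error' => "Action '{$action}' requires a table"];
    }

    return match ($action) {
        'list_tables' => $this->listTables(),
        'describe' => $this->describeTable($table),
        'indexes' => $this->showIndexes($table),
        default => ['success' => false, 'error' => 'Unknown action'],
    };
}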

Three actions:

  • list_tables: SHOW TABLES (what tables exist)

  • describe: DESCRIBE table_name (columns, types, nullability, defaults)

  • indexes: SHOW INDEX FROM table_name (index structure)


Read-only. No CREATE, ALTER, DROP, INSERT, UPDATE, DELETE. The tool can't modify the database — it can only inspect it. Agents that need to write SQL (like KnjigAgent) use dedicated domain tools that enforce business rules.
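
One detail worth sketching: DESCRIBE and SHOW INDEX take an identifier, and identifiers cannot be bound as prepared-statement parameters, so the table name ends up interpolated into the SQL string. The post does not show how the tool handles this. The helpers below (safeTableName, describeSql) are hypothetical and show one common way to keep an inspection query from carrying anything else along with it.

```php
<?php
/**
 * Hypothetical sketch: accept only plain identifiers, then backtick-quote them.
 */
function safeTableName(string $table): string
{
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    if (preg_match('/\A[A-Za-z0-9_]+\z/', $table) !== 1) {
        throw new InvalidArgumentException("Invalid table name: {$table}");
    }

    return '`' . $table . '`';
}

// Build the read-only statement for the 'describe' action.
function describeSql(string $table): string
{
    return 'DESCRIBE ' . safeTableName($table);
}
```

With this check, describeSql('pm_subtasks') yields DESCRIBE `pm_subtasks`, while an input such as "pm_subtasks; DROP TABLE users" throws before any SQL is built.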

Why Schema Context Matters

When VajbCoder writes a new controller that queries pm_subtasks, it needs to know:

  • What columns exist (to write correct SELECT)

  • Which are nullable (to handle null in PHP)

  • What types they are (to add correct type hints)

  • What indexes exist (to write queries that use them)


Without schema context, the agent guesses column names — and guesses wrong. DatabaseSchemaTool turns "I think there's a status column" into "There's a status VARCHAR(20) NOT NULL DEFAULT 'todo', indexed."
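
To show how raw DESCRIBE output can become that kind of one-line fact, here is a small sketch. The row keys (Field, Type, Null, Key, Default, Extra) are the columns MySQL returns for DESCRIBE. The function names, the output format, and the due_date column are assumptions for illustration, not the tool's actual output.

```php
<?php
/**
 * Hypothetical sketch: turn one DESCRIBE row into a compact one-line summary.
 */
function formatColumn(array $row): string
{
    $line = $row['Field'] . ' ' . $row['Type'];
    $line .= $row['Null'] === 'NO' ? ' NOT NULL' : ' NULL';

    if ($row['Default'] !== null) {
        $line .= " DEFAULT '" . $row['Default'] . "'";
    }
    // Key is PRI, UNI, or MUL when the column is part of an index, else ''.
    if ($row['Key'] !== '') {
        $line .= ', indexed (' . $row['Key'] . ')';
    }

    return $line;
}

// Summarize a whole table as an indented list of columns.
function formatSchema(string $table, array $rows): string
{
    return $table . ":\n  " . implode("\n  ", array_map('formatColumn', $rows));
}

$rows = [
    ['Field' => 'status', 'Type' => 'varchar(20)', 'Null' => 'NO', 'Key' => 'MUL', 'Default' => 'todo', 'Extra' => ''],
    ['Field' => 'due_date', 'Type' => 'date', 'Null' => 'YES', 'Key' => '', 'Default' => null, 'Extra' => ''],
];

echo formatSchema('pm_subtasks', $rows), "\n";
// pm_subtasks:
//   status varchar(20) NOT NULL DEFAULT 'todo', indexed (MUL)
//   due_date date NULL
```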

The truncation limit for schema is 8K chars (higher than grep's 6K) because schema completeness matters more than search result completeness.


Up Next

Next up: GeneratePatchTool: The Crown Jewel — From AI Suggestion to Atomic File Write — the tool that turns AI suggestions into actual code changes, with PatchValidator's 6-step security check and PatchApplier's atomic write + git branch isolation.
