BORNAsecurity: A Web Application Firewall in Pure PHP

Every HTTP request to Shinobi Apps passes through a zero-dependency PHP WAF: threat scoring with pre-compiled patterns, behavioral bot detection, honeypot traps, progressive IP blocking, and real-time webhook alerts — all before the controller sees the request.

The previous post covered the core security layer — CSRF, session fingerprinting, rate limiting, and authentication. That layer handles the mechanics of "is this a legitimate session making a legitimate request?"

BORNAsecurity sits above it and asks a different question: "is this request a threat?"

It's a web application firewall implemented as a PHP middleware addon. No external libraries, no cloud WAF subscription, no nginx rules — everything runs inside the application itself. The tradeoff is that it's not as fast as a network-layer filter, but it has access to application context that a network-layer filter doesn't: session state, authenticated user status, known admin routes, and the full request object.

Here's what it actually does.

The Middleware Pipeline

BornaMiddleware runs at priority 25 in the middleware stack, before the core RateLimiterMiddleware at priority 30. This ordering matters: BORNA checks the accumulated rate limit violation history from past requests and can block an IP before the current request even hits the rate limiter.

For every non-asset, non-whitelisted request, the middleware runs through six stages in sequence:

  1. Check if IP is currently blocked
  2. Honeypot detection
  3. Progressive blocking (based on accumulated violations)
  4. Behavior analysis (bot detection)
  5. GeoIP enrichment
  6. Threat scoring

Asset requests (.css, .js, .jpg, etc.) are skipped entirely. The CSP report endpoint is also whitelisted, since blocking your own violation reports would be counterproductive. Localhost and ::1 are always whitelisted. Configured IPs and database-whitelisted IPs are also skipped.
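
The skip rules could be sketched roughly as follows. This is a minimal illustration, not the addon's code: the function name shouldSkipRequest, the extension list, and the /csp-report path are assumptions.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: decides whether the WAF pipeline should be skipped.
// The extension list, CSP path, and parameter names are assumptions.
function shouldSkipRequest(string $path, string $ip, array $whitelistedIps = []): bool
{
    // Static assets never reach the scoring stages.
    $assetExtensions = ['css', 'js', 'jpg', 'jpeg', 'png', 'gif', 'svg', 'ico', 'woff2'];
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (in_array($extension, $assetExtensions, true)) {
        return true;
    }

    // The CSP report endpoint must stay reachable (assumed example path).
    if ($path === '/csp-report') {
        return true;
    }

    // Loopback addresses and configured or database-whitelisted IPs are trusted.
    return in_array($ip, ['127.0.0.1', '::1'], true)
        || in_array($ip, $whitelistedIps, true);
}
```

Running the cheap string checks before any database access keeps asset requests essentially free.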

The IP block check and violation count come from a single batch query:

$result = $this->db->queryAndFetchAssoc(
    "SELECT
        (SELECT COUNT(*) FROM security_borna_blocked_ips
         WHERE ip = ? AND (expires_at IS NULL OR expires_at > NOW())) as is_blocked,
        (SELECT COUNT(*) FROM security_borna_rate_violations
         WHERE ip = ? AND violation_time >= DATE_SUB(NOW(), INTERVAL 60 MINUTE)) as violation_count",
    [$ip, $ip]
);

One round-trip. Both pieces of information arrive together and are cached for the duration of the request.

ThreatScorer: Pattern Matching With a Short-Circuit

ThreatScorer is the engine that evaluates inbound requests against a library of attack signatures. It has seven built-in CORE_PATTERNS:

| Rule | Type | Score | Block |
|------|------|-------|-------|
| SQL Injection | parameter | 100 | yes |
| Command Injection | parameter | 95 | yes |
| Path Traversal | path | 90 | yes |
| File Inclusion | parameter | 88 | yes |
| XSS | parameter | 85 | yes |
| Malicious User-Agent | user_agent | 60 | no |
| Suspicious Headers | header | 40 | no |

Score 100 means the analysis stops immediately — there's no point evaluating further patterns once you've found a SQL injection attempt. The final score is capped at 100.

// Early termination for critical threats
if ($score >= 100) {
    break;
}
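
To show where that break sits, here is a sketch of a scoring loop. It assumes scores from matching rules are summed before the cap is applied; the rule array shape, the function name scoreValue, and the two toy patterns are illustrative and far simpler than the real CORE_PATTERNS.

```php
<?php
declare(strict_types=1);

// Illustrative only: rule shape and patterns are assumptions, not the real rules.
function scoreValue(string $value, array $rules): array
{
    $score = 0;
    $matched = [];

    foreach ($rules as $name => $rule) {
        if (preg_match($rule['pattern'], $value) === 1) {
            $score += $rule['score'];
            $matched[] = $name;
        }

        // Early termination for critical threats: the verdict cannot get worse.
        if ($score >= 100) {
            break;
        }
    }

    // The final score is capped at 100.
    return ['score' => min($score, 100), 'matched' => $matched];
}

$rules = [
    'sql_injection' => ['pattern' => '/\bunion\s+select\b/i', 'score' => 100],
    'xss'           => ['pattern' => '/<script\b/i',          'score' => 85],
];

// The SQL injection rule matches first, so the XSS rule is never evaluated.
$result = scoreValue('1 UNION SELECT password FROM users<script>', $rules);
```

Ordering the rules from highest score to lowest makes the short-circuit trigger as early as possible.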

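Beyond the built-in rules, operators can add custom rules via the database. These are loaded through a 5-minute cache held in a static class property shared across all ThreatScorer instances, so repeated lookups within the same process don't each hit the database for the same data. Static state only survives across requests under a long-running worker runtime; under classic PHP-FPM it is reset at the end of each request, so there the cache deduplicates lookups within a single request: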

private static ?array $staticRulesCache = null;
private static int $staticRulesCacheTime = 0;
private const RULES_CACHE_TTL = 300; // 5 minutes
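
A minimal sketch of how those three members could work together. The class name RulesCacheSketch, the getRules() method, and the injected loader callable are assumptions for illustration; the real ThreatScorer reads the rules from the database.

```php
<?php
declare(strict_types=1);

// Sketch of a static TTL cache; names other than the three members are hypothetical.
final class RulesCacheSketch
{
    private static ?array $staticRulesCache = null;
    private static int $staticRulesCacheTime = 0;
    private const RULES_CACHE_TTL = 300; // 5 minutes

    /** @param callable():array $loader fetches the rules, e.g. from the database */
    public static function getRules(callable $loader, ?int $now = null): array
    {
        $now ??= time();
        $expired = ($now - self::$staticRulesCacheTime) >= self::RULES_CACHE_TTL;

        if (self::$staticRulesCache === null || $expired) {
            self::$staticRulesCache = $loader();
            self::$staticRulesCacheTime = $now;
        }

        return self::$staticRulesCache;
    }
}

$loads = 0;
$loader = function () use (&$loads): array {
    $loads++;
    return [['name' => 'custom_rule', 'score' => 50]];
};

RulesCacheSketch::getRules($loader, 1000); // cache empty: loads
RulesCacheSketch::getRules($loader, 1100); // within TTL: served from cache
RulesCacheSketch::getRules($loader, 1300); // TTL elapsed: loads again
```

Passing the clock in as a parameter is only there to make the expiry behaviour easy to exercise.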

Known crawlers are whitelisted. Googlebot's user agent contains the string "bot". The malicious_agents rule pattern matches anything with "bot" that isn't Google/Bing/Yahoo/Baidu. Without the crawler exemption, Googlebot would score 60 on every single visit. The fix:

// In checkUserAgent()
foreach (self::KNOWN_CRAWLERS as $crawlerPattern) {
    if (@preg_match($crawlerPattern, $userAgent)) {
        return null; // Skip user_agent rules entirely for known crawlers
    }
}

Legitimate admin paths are also exempted from path-type rules. The path_traversal rule fires on patterns like ../, but it also fires on any URL containing /admin. If your application has a real /admin dashboard, that rule would flag every admin page load. The isLegitimateAdminPath() method reads a configurable list of regex patterns from the borna.legitimate_admin_paths config key and skips path-type rules for matching routes.
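
A sketch of what such a check might look like. In the addon the patterns come from configuration; here they are passed in directly, and the two example patterns are made up.

```php
<?php
declare(strict_types=1);

// Sketch only: the signature and the example patterns are assumptions.
function isLegitimateAdminPath(string $path, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $path) === 1) {
            return true; // path-type rules are skipped for this route
        }
    }
    return false;
}

// Anchored patterns, so only the intended routes are exempted.
$legitimateAdminPaths = ['#^/admin/dashboard(/|$)#', '#^/admin/users(/|$)#'];
```

Anchoring each pattern at the start of the path matters: an unanchored /admin pattern would also exempt traversal attempts that merely contain that string.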

Auth endpoints skip POST body scanning. Login forms contain passwords. Scanning POST bodies for injection patterns on login/register/password-reset endpoints would mean running regex patterns over password strings, which is both useless (the database layer uses prepared statements) and a potential source of false positives. borna.scan_post_body defaults to false; when true, it's overridden to false for auth endpoints regardless.
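
The override can be expressed as a small decision function. This is a sketch: the function name and the list of auth endpoints are assumptions, and the config value is passed in rather than read from borna.scan_post_body.

```php
<?php
declare(strict_types=1);

// Hypothetical helper; the auth endpoint list is an assumed example.
function shouldScanPostBody(string $method, string $path, bool $scanPostBodyConfig): bool
{
    $authEndpoints = ['/login', '/register', '/password-reset'];

    // Scanning is off unless explicitly enabled, and only applies to POST.
    if (strtoupper($method) !== 'POST' || !$scanPostBodyConfig) {
        return false;
    }

    // Auth endpoints are never scanned, even when scanning is enabled.
    return !in_array(rtrim($path, '/'), $authEndpoints, true);
}
```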

The auto-block thresholds are also context-sensitive: 75 for unauthenticated requests, 95 for authenticated ones. Someone who's already logged in via CSRF-protected form + session fingerprinting gets more benefit of the doubt than an anonymous request.

Honeypot Traps

HoneypotService maintains a list of URLs that no legitimate user should ever visit:

private const DEFAULT_HONEYPOTS = [
    '/wp-admin'     => 'WordPress Admin (fake)',
    '/wp-login.php' => 'WordPress Login (fake)',
    '/phpmyadmin'   => 'phpMyAdmin (fake)',
    '/admin'        => 'Admin Panel (fake)',
    '/.env'         => 'Environment File (fake)',
    '/.git/config'  => 'Git Config (fake)',
    '/config.php'   => 'Config File (fake)',
    '/backup.sql'   => 'SQL Backup (fake)',
    '/shell.php'    => 'Web Shell (fake)',
    '/xmlrpc.php'   => 'XML-RPC (fake)',
];

This application runs neither WordPress nor phpMyAdmin, and keeps no .env file under the web root. Any request to these paths comes from either a scanner or a bot working from a generic target list.

Custom traps can be added via the database and are merged with defaults at construction time.
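
A sketch of that merge and the resulting lookup. The class name, the constructor argument, and the path normalization are assumptions; the real service loads custom traps from the database.

```php
<?php
declare(strict_types=1);

// Sketch: defaults trimmed to two entries for brevity; names are hypothetical.
final class HoneypotListSketch
{
    private const DEFAULT_HONEYPOTS = [
        '/wp-admin' => 'WordPress Admin (fake)',
        '/.env'     => 'Environment File (fake)',
    ];

    /** @var array<string,string> path => description */
    private array $traps;

    public function __construct(array $customTraps = [])
    {
        // With string keys, later entries win, so custom traps override defaults.
        $this->traps = array_merge(self::DEFAULT_HONEYPOTS, $customTraps);
    }

    public function lookup(string $path): ?string
    {
        // Normalize case and trailing slash so '/WP-Admin/' still hits the trap.
        $key = strtolower(rtrim($path, '/'));
        return $this->traps[$key] ?? null;
    }
}

$honeypots = new HoneypotListSketch(['/old-backup.zip' => 'Backup Archive (fake)']);
```

Because the merged list is a plain associative array, each check is a single hash lookup rather than a pattern scan.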

The response to a honeypot hit is 404 Not Found, not 403 Forbidden. A 403 tells the requester something is there but blocked. A 404 reveals nothing.

Auto-blocking triggers on the second honeypot hit within an hour:

private function shouldAutoBlock(string $ip): bool
{
    $hitCount = HoneypotModel::getIPHitCount($ip, 1); // last 1 hour
    return $hitCount >= 2;
}

The first hit is logged. The second hit blocks the IP. This prevents false positives from misconfigured crawlers that might stumble onto one trap URL by accident.

BehaviorAnalyzer: Six Signals for Bot Detection

BehaviorAnalyzer tracks session behavior across requests and builds a bot score from six independent signals:

1. User-Agent analysis — empty UA scores 40, bot-pattern UA scores 30 per match, very short UA (< 20 chars) scores 20. Capped at 50. Known crawlers return 0 immediately.

2. Request timing — examines intervals between the last 5 requests. If more than 50% of intervals are below 100ms, that's +25. If the standard deviation of intervals is below 50ms (mechanical regularity), that's +20. Humans are irregular; bots are not.

// Check for robotic regularity (low standard deviation of intervals)
if (count($intervalValues) >= 3) {
    $stdDev = $this->calculateStdDev($intervalValues);
    if ($stdDev < 50) {
        $score += 20;
    }
}

3. Request rate — above 60 requests per minute, score increases proportionally (up to 30 points).

4. Navigation pattern — accessing many different pages without ever revisiting one is scanning behavior (+15). Accessing 2+ sensitive paths like /admin or /.git from a session is targeted probing (+20).

5. HTTP headers — missing both Accept-Language and Accept-Encoding is unusual for a browser (+15). Presence of X-Forwarded-For, X-Real-IP, or X-Scanner adds +10 each, capped at 30.

6. JavaScript execution — browsers execute JavaScript; bots typically don't. If a session has made 3+ requests but hasn't triggered the JS verification endpoint, it scores +15 to +20. After 10+ requests with no JS verification, the case is stronger.
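
As an illustration of one signal end to end, signal 2 (request timing) might be sketched like this. The scoreRequestTiming() function and the use of a population standard deviation are assumptions; the addon's calculateStdDev() may be implemented differently.

```php
<?php
declare(strict_types=1);

// Population standard deviation (an assumed choice, for illustration).
function calculateStdDev(array $values): float
{
    $n = count($values);
    if ($n === 0) {
        return 0.0;
    }
    $mean = array_sum($values) / $n;
    $sumSquares = 0.0;
    foreach ($values as $v) {
        $sumSquares += ($v - $mean) ** 2;
    }
    return sqrt($sumSquares / $n);
}

// Hypothetical scorer: $timestampsMs are request times in milliseconds.
function scoreRequestTiming(array $timestampsMs): int
{
    $recent = array_slice($timestampsMs, -5); // last 5 requests
    $intervals = [];
    for ($i = 1; $i < count($recent); $i++) {
        $intervals[] = $recent[$i] - $recent[$i - 1];
    }
    if ($intervals === []) {
        return 0;
    }

    $score = 0;
    $fast = count(array_filter($intervals, fn ($ms) => $ms < 100));
    if ($fast / count($intervals) > 0.5) {
        $score += 25; // more than half the gaps are under 100 ms
    }
    if (count($intervals) >= 3 && calculateStdDev($intervals) < 50) {
        $score += 20; // mechanical regularity
    }
    return $score;
}
```

Requests spaced exactly 50 ms apart trip both checks for a total of 45, while irregular multi-second gaps score 0.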

The bot threshold is 60 for unauthenticated requests, 70 for authenticated users. Someone who authenticated through a CSRF-protected login form gets more latitude — authenticated users skip timing checks, JS verification checks, and GeoIP lookups entirely.

Auto-block fires when the bot score reaches 80, with a 1-hour temporary block.
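
Taken together, the thresholds amount to a small classification step. The function name and return shape below are assumptions; only the numbers come from the description above.

```php
<?php
declare(strict_types=1);

// Sketch of the decision applied to a finished bot score.
function classifyBotScore(int $score, bool $isAuthenticated): array
{
    $botThreshold = $isAuthenticated ? 70 : 60; // more latitude once logged in
    $autoBlock = $score >= 80;

    return [
        'is_bot'        => $score >= $botThreshold,
        'auto_block'    => $autoBlock,
        'block_seconds' => $autoBlock ? 3600 : null, // 1-hour temporary block
    ];
}
```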

For known crawlers the analysis returns the default result (score: 0, is_bot: false) before any signals are evaluated.

Progressive Blocking

Three escalating levels based on rate limit violation history in the past 60 minutes:

| Violations in 60 min | Action |
|---------------------|--------|
| 3–4 | Block for 1 hour |
| 5–9 | Block for 24 hours |
| 10+ | Permanent block (no expiry) |

The violation count comes from the batch checkIPStatus() query, so progressive blocking costs no additional database query. The level check cascades top-down — level 3 is checked first, so an IP with 12 violations gets a permanent block, not a 24h block.

// Level 3 first (most severe)
if ($violationCount >= 10) {
    $this->blockIP($ip, $reason, null, 'progressive-block-l3'); // null = permanent
    return ['blocked' => true, 'reason' => $reason, 'duration' => null];
}

A null expiry in security_borna_blocked_ips means the expires_at IS NULL OR expires_at > NOW() check in the block query always matches — the block doesn't expire.
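
Putting the three levels from the table together, the full cascade could look like the sketch below. The function name and the returned level field are assumptions; durations are in seconds, with null meaning permanent.

```php
<?php
declare(strict_types=1);

// Sketch of the top-down cascade; most severe level is checked first.
function progressiveBlockLevel(int $violationCount): array
{
    if ($violationCount >= 10) {
        return ['blocked' => true, 'level' => 3, 'duration' => null]; // permanent
    }
    if ($violationCount >= 5) {
        return ['blocked' => true, 'level' => 2, 'duration' => 24 * 3600];
    }
    if ($violationCount >= 3) {
        return ['blocked' => true, 'level' => 1, 'duration' => 3600];
    }
    return ['blocked' => false, 'level' => 0, 'duration' => null];
}
```

Checking in ascending order instead would return level 1 for every count of 3 or more, which is why the order matters.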

BornaAlertManager: Webhooks With Deduplication

Every significant security event can trigger a webhook alert to Slack, Discord, Teams, or any custom endpoint (via the framework's WebhookDispatcher).

Alert deduplication prevents alert storms. Each alert generates a signature from the type, IP, path, and attack type:

private function createAlertSignature(string $type, array $data): string
{
    $keyData = [
        $type,
        $data['ip'] ?? '',
        $data['path'] ?? '',
        $data['attack_type'] ?? ''
    ];
    return md5(implode('|', $keyData));
}

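If the same signature was sent within the 5-minute deduplication window, the alert is suppressed. An attacker generating hundreds of blocks per minute from one IP against one path produces one alert per 5 minutes, not hundreds. Because the signature includes the IP and path, a distributed attack is deduplicated per source address rather than collapsed into a single alert.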
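
The window check itself can be sketched as follows. The in-memory array stands in for whatever storage BornaAlertManager actually uses, and the class and method names are hypothetical.

```php
<?php
declare(strict_types=1);

// Sketch of the deduplication window; storage and names are assumptions.
final class AlertDeduplicatorSketch
{
    private const DEDUP_WINDOW = 300; // 5 minutes

    /** @var array<string,int> signature => timestamp of last sent alert */
    private array $lastSent = [];

    public function shouldSend(string $type, array $data, ?int $now = null): bool
    {
        $now ??= time();
        $signature = md5(implode('|', [
            $type,
            $data['ip'] ?? '',
            $data['path'] ?? '',
            $data['attack_type'] ?? '',
        ]));

        if (isset($this->lastSent[$signature])
            && ($now - $this->lastSent[$signature]) < self::DEDUP_WINDOW) {
            return false; // same alert already sent inside the window
        }

        $this->lastSent[$signature] = $now;
        return true;
    }
}
```

Note that a suppressed alert does not refresh the timestamp, so a sustained attack still produces one alert every 5 minutes instead of going silent.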

A configurable minimum severity threshold filters low-value alerts. Only high and critical events alert by default.
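
One way to express such a threshold is to rank the severities and compare. The function name and the level names low and medium are assumptions; only high and critical are named above.

```php
<?php
declare(strict_types=1);

// Sketch: unknown severities never alert, unknown minimums block everything.
function meetsMinimumSeverity(string $severity, string $minimum = 'high'): bool
{
    $levels = ['low' => 1, 'medium' => 2, 'high' => 3, 'critical' => 4];
    return ($levels[$severity] ?? 0) >= ($levels[$minimum] ?? PHP_INT_MAX);
}
```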

AI Integration: Dispatching Events for LIMAi

Every security action dispatches a typed event through the framework's EventDispatcher. The event payloads are consistent — every event includes ip, path, user_agent, severity, auto_blocked, and context-specific fields.

Events fired:

  • borna.honeypot_triggered — honeypot accessed
  • borna.progressive_block — IP blocked by escalating violations
  • borna.bot_blocked — aggressive bot auto-blocked (score ≥ 80)
  • borna.bot_detected — bot detected but not blocked (score 50–79)
  • borna.attack_blocked — threat score exceeded block threshold
  • borna.threat_detected — threat score 30–74 (logged, not blocked)
  • borna.rate_limit_exceeded — downstream 429 caught after response

The LIMAi addon's BornaAIWatcher subscribes to these events and generates rule-based detection records. It makes no AI API calls: deterministic logic surfaces findings in the LIMAi dashboard and can route them to the @bornagent AI agent for deeper analysis on demand.
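
The subscribe-and-record flow might look like the sketch below. The framework's EventDispatcher API is not shown in this post, so EventDispatcherSketch and its listen() and dispatch() methods are stand-ins, and the listener is a simplified placeholder for BornaAIWatcher.

```php
<?php
declare(strict_types=1);

// Minimal stand-in for the framework's EventDispatcher (assumed API).
final class EventDispatcherSketch
{
    /** @var array<string,list<callable>> */
    private array $listeners = [];

    public function listen(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    public function dispatch(string $event, array $payload): void
    {
        foreach ($this->listeners[$event] ?? [] as $listener) {
            $listener($payload);
        }
    }
}

$dispatcher = new EventDispatcherSketch();
$findings = [];

// A deterministic, rule-based watcher: no AI API call, just a recorded finding.
$dispatcher->listen('borna.honeypot_triggered', function (array $event) use (&$findings): void {
    $findings[] = sprintf(
        '%s probed %s (severity: %s)',
        $event['ip'],
        $event['path'],
        $event['severity']
    );
});

$dispatcher->dispatch('borna.honeypot_triggered', [
    'ip'           => '203.0.113.7',
    'path'         => '/wp-login.php',
    'user_agent'   => 'curl/8.0',
    'severity'     => 'high',
    'auto_blocked' => false,
]);
```

Keeping the payload fields consistent across event types is what lets a single listener like this handle any of them.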

The Full Picture

At request time, a typical unauthenticated page load runs through:

  • One batch SQL query (IP block status + violation count)
  • One honeypot path check (array lookup)
  • A behavior analysis update (session tracking, 2–3 queries)
  • A GeoIP lookup (cached per IP)
  • ThreatScorer pattern matching across 7 rule categories

None of this is visible to the controller. If all checks pass, $next($request) runs and the request proceeds normally. If any check fires, the request gets a 403 or 404 before reaching any business logic.
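
The control flow can be reduced to a short sketch. Each stage here is a callable that returns an HTTP status code to stop with, or null to continue; the function name, the request array shape, and the two example stages are hypothetical simplifications of the six real stages.

```php
<?php
declare(strict_types=1);

// Sketch of the short-circuiting pipeline; all names are assumptions.
function runPipeline(array $request, array $stages, callable $next): array
{
    foreach ($stages as $stage) {
        $status = $stage($request);
        if ($status !== null) {
            return ['status' => $status, 'body' => '']; // controller never runs
        }
    }
    return $next($request); // all checks passed
}

$stages = [
    // Blocked IP check (a fixed list stands in for the database lookup).
    fn (array $r): ?int => in_array($r['ip'], ['198.51.100.9'], true) ? 403 : null,
    // Honeypot check.
    fn (array $r): ?int => $r['path'] === '/wp-admin' ? 404 : null,
];
$controller = fn (array $r): array => ['status' => 200, 'body' => 'ok'];
```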

The entire addon — middleware, scoring engine, honeypots, behavior analysis, alert system — is about 3,000 lines of PHP with no Composer dependencies.

What's Next

The next post covers LUKAmonitoring: a performance observability addon that hooks into every database query, tracks N+1 patterns, measures per-endpoint response times, detects memory leaks, and surfaces anomalies in real time — all from inside the application itself.
