Last updated: 2026-06-01 16:41 UTC

How Stylobot Works

Stylobot sits in front of your web app, runs detection on every request, and applies an action policy. This page covers what's actually happening when a request lands.

Request lifecycle

client ──> stylobot ──> [verdict gate] ──> [wave pipeline] ──> [aggregator] ──> [policy] ──> upstream
                              │                   │                                │
           SKIP / BIAS / MISS │                   │  contributors run in parallel  │
                              │                   │  inside each wave; an early    │
                              │                   │  exit skips remaining waves    │
                              └──> evidence + signature ───────────────────────────┘
                                            │
                                            └──> dashboard + /api/v1

  1. Verdict gate (per request). Before any detector runs, the gate checks the fingerprint's recent verdict. Three outcomes:
    • Skip — high-confidence recent verdict, no pipeline. The cached classification is enforced; the response carries X-StyloBot-VerdictSource: cache (or identity-cache when the per-fingerprint cache won over the per-signature cache). The dashboard still records the observation. Most repeat traffic takes this path.
    • Bias — a usable verdict exists but does not meet the Skip thresholds. The pipeline runs, but the cached verdict seeds the prior so contributions move it incrementally instead of starting from zero.
    • Miss — no usable verdict. Full pipeline.
  2. Wave pipeline. When the pipeline runs, contributors are grouped into waves by manifest priority. All contributors in a wave run concurrently (not sequentially). Each contributor reads the current Blackboard, emits a contribution (signal values, confidence delta, reason, optional bot type/name), and may write derived signals that later waves consume.
  3. Early exit. A contributor can emit a verifying early-exit verdict (VerifiedGoodBot, Whitelisted, VerifiedBadBot, Blacklisted). The orchestrator short-circuits the remaining waves immediately. FastPathReputation, VerifiedBot, and Honeypot are the typical exit-emitters.
  4. Aggregation produces a verdict. Surviving contributions combine into:
    • botProbability (0.0–1.0): weighted sum, learning-tuned
    • confidence (0.0–1.0): quality / consistency of the evidence
    • riskBand: VeryLow, Low, Elevated, Medium, High, VeryHigh (per-request)
    • threatBand: None, Low, Medium, High, VeryHigh (per-fingerprint, cross-request)
    • riskProfile: narrative summary, with a friendly-pin marker when applicable
  5. Policy maps verdict to action. The configured action policy converts the verdict into a concrete behaviour: Allow, Throttle, Challenge, Block, MaskPii, RedirectHoneypot.
  6. Audit trail is retained. Every request keeps its full trace: raw signals, derived signals, which contributors ran, their contributions, the policy mapping. The dashboard and /api/v1 surfaces read from this same trace.
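
The gate's three outcomes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Stylobot's implementation. The threshold values and the `CachedVerdict` type are hypothetical. Carrying the prior as log-odds, so that "starting from zero" means a neutral p = 0.5, is also an assumption. The outcome names, the `cache` / `identity-cache` values, and the `X-StyloBot-VerdictSource` header come from the steps above.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical thresholds: the real Skip/Bias cut-offs are configurable and
# are not documented on this page.
SKIP_MIN_CONFIDENCE = 0.9
SKIP_MAX_AGE_S = 300.0
BIAS_MAX_AGE_S = 3600.0


@dataclass
class CachedVerdict:
    bot_probability: float
    confidence: float
    age_s: float           # seconds since the verdict was recorded
    source: str = "cache"  # "cache" or "identity-cache"


def logit(p: float) -> float:
    """Probability -> log-odds, clamped so 0.0 and 1.0 stay finite."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def gate(cached: Optional[CachedVerdict]) -> Tuple[str, float, Dict[str, str]]:
    """Return (outcome, prior_log_odds, response_headers) for one request."""
    if cached is None or cached.age_s > BIAS_MAX_AGE_S:
        # Miss: no usable verdict; the full pipeline starts at log-odds 0 (p = 0.5).
        return "Miss", 0.0, {}
    prior = logit(cached.bot_probability)
    if cached.confidence >= SKIP_MIN_CONFIDENCE and cached.age_s <= SKIP_MAX_AGE_S:
        # Skip: enforce the cached classification; no detector runs.
        return "Skip", prior, {"X-StyloBot-VerdictSource": cached.source}
    # Bias: the pipeline runs, and contributions move this seeded prior.
    return "Bias", prior, {}
```

Under these assumptions, a Bias request whose cached verdict was p = 0.8 enters aggregation at log-odds ln 4 (about 1.39) instead of 0.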

Wave structure

Wave assignment is declared in the contributor's YAML manifest (Priority and DependsOn); the orchestrator topologically sorts dependents into later waves so signals are always available before consumers run.
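
Turning Priority and DependsOn into waves is a layered topological sort. The sketch below uses Kahn's algorithm. It assumes, hypothetically, that Priority is the earliest wave a contributor may join and that each dependency pushes it at least one wave later. The dictionary keys `priority` and `depends_on` and the `assign_waves` name are illustrative, not the manifest schema.

```python
from __future__ import annotations

from collections import defaultdict, deque


def assign_waves(manifests: dict[str, dict]) -> dict[str, int]:
    """Map each contributor name to a wave index."""
    indegree = {name: 0 for name in manifests}
    dependents = defaultdict(list)
    for name, manifest in manifests.items():
        for dep in manifest.get("depends_on", []):
            if dep not in manifests:
                raise ValueError(f"{name} depends on unknown contributor {dep}")
            indegree[name] += 1
            dependents[dep].append(name)

    # Start every contributor at its (assumed) priority wave.
    wave = {name: m.get("priority", 0) for name, m in manifests.items()}
    queue = deque(name for name, degree in indegree.items() if degree == 0)
    visited = 0
    while queue:
        # Kahn's algorithm: a node is dequeued only after all its dependencies,
        # so its wave is final when we push its dependents past it.
        node = queue.popleft()
        visited += 1
        for child in dependents[node]:
            wave[child] = max(wave[child], wave[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if visited != len(manifests):
        raise ValueError("dependency cycle between contributors")
    return wave
```

A cycle in DependsOn has no valid wave order, so the sketch rejects it rather than guessing.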

Wave | Typical contributors | What's available
Wave 0 (sub-millisecond) | Signature, FastPathReputation, Whitelist, Honeypot tagger | Raw request only. Early-exit-eligible.
Wave 1 (cheap synchronous) | UserAgent, Header, IP/Geo, Transport (TLS/H2/H3), TCP-p0f, IdentityVector, SessionVector seed | Raw request + Wave 0 signals.
Wave 2 (correlation) | FingerprintMatch, Behavioural, ContentSequence, ResourceWaterfall, CookieBehavior, Inconsistency correlator | Wave 1 derived signals.
Wave 3 (model + intelligence) | HeuristicEarly model, Cluster lookup, IntentScoring, ReputationBias, VerifiedBot DNS verify | Full aggregated evidence so far.
Wave 4 (escalation) | HeuristicLate, ClientSide probe verify, FingerprintApproval, ChallengeVerification, optional LLM | Everything. Last chance to amend before aggregation.

Concretely: a TLS-fingerprint contributor and a UA contributor in the same wave both read the same Blackboard state and emit independently; neither sees the other's contribution. Their signals become visible only to later waves, for example to a Wave 2 HeaderCorrelation contributor that reads both transport.tls_ja4 and hdr.sec_ch_ua to flag a mismatch.
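
The same-wave isolation can be sketched by giving every contributor in a wave one frozen snapshot and merging their outputs only after the whole wave finishes. Here `ThreadPoolExecutor` stands in for the orchestrator's concurrency. The contributor signature, the example contributors, and their signal values are hypothetical; a real JA4 string looks different. The signal keys and the early-exit verdict names come from this page.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType
from typing import Callable, Dict, List, Mapping, Optional, Tuple

# Hypothetical contributor shape: read a Blackboard snapshot, return
# (signals to write, early-exit verdict or None).
Contributor = Callable[[Mapping[str, object]], Tuple[Dict[str, object], Optional[str]]]

EARLY_EXIT_VERDICTS = {"VerifiedGoodBot", "Whitelisted", "VerifiedBadBot", "Blacklisted"}


def run_waves(waves: List[List[Contributor]], blackboard: Dict[str, object]):
    """Run waves in order; contributors inside one wave run concurrently."""
    board = dict(blackboard)
    with ThreadPoolExecutor() as pool:
        for wave in waves:
            # One read-only snapshot per wave: siblings never see each other's output.
            snapshot = MappingProxyType(dict(board))
            results = list(pool.map(lambda contributor: contributor(snapshot), wave))
            for signals, _ in results:
                board.update(signals)
            exits = [verdict for _, verdict in results if verdict in EARLY_EXIT_VERDICTS]
            if exits:
                return board, exits[0]  # short-circuit: later waves never run
    return board, None


# Example contributors with hypothetical signal values.
def tls_contributor(bb):
    return {"transport.tls_ja4": "python-stack-ja4"}, None


def ua_contributor(bb):
    # Same wave as tls_contributor, so the TLS signal is not visible yet.
    return {"hdr.sec_ch_ua": '"Google Chrome";v="124"',
            "ua.saw_tls_signal": "transport.tls_ja4" in bb}, None


def header_correlation(bb):
    claims_chrome = "Chrome" in str(bb["hdr.sec_ch_ua"])
    python_tls = str(bb["transport.tls_ja4"]).startswith("python")
    return {"corr.ua_tls_mismatch": claims_chrome and python_tls}, None
```

Running `run_waves([[tls_contributor, ua_contributor], [header_correlation]], {})` records that the UA contributor did not see the TLS signal, while the second-wave correlator sees both and flags the mismatch.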

What contributors look for

Each contributor is independently configurable (enable/disable, weight, threshold) under BotDetection:Detectors:<Name> in appsettings.json or via the YAML manifest under Orchestration/Manifests/detectors/.
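
In .NET configuration, a colon-delimited key such as BotDetection:Detectors:<Name> addresses a nested section of appsettings.json, and section names match case-insensitively. The Python sketch below resolves such a key against parsed JSON to show that mapping. The detector name `UserAgent` and the setting names `Enabled`, `Weight`, and `Threshold` are hypothetical placeholders for the enable/disable, weight, and threshold knobs, not documented keys.

```python
import json
from typing import Any


def get_setting(config: dict, key: str) -> Any:
    """Resolve a colon-delimited key against nested JSON sections.

    Section names are matched case-insensitively, as .NET configuration does.
    Returns None when any segment of the path is missing.
    """
    node: Any = config
    for segment in key.split(":"):
        if not isinstance(node, dict):
            return None
        match = next((k for k in node if k.lower() == segment.lower()), None)
        if match is None:
            return None
        node = node[match]
    return node


# Hypothetical appsettings.json fragment; "UserAgent", "Enabled", "Weight" and
# "Threshold" are illustrative names, not documented keys.
APPSETTINGS = json.loads("""
{
  "BotDetection": {
    "Detectors": {
      "UserAgent": { "Enabled": true, "Weight": 0.8, "Threshold": 0.6 }
    }
  }
}
""")
```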

Contributor | What it looks for
User-Agent | Known bot patterns, malformed UAs, impossible browser identifiers. Easy to spoof, so a single hit is never decisive.
Header | Protocol/header patterns that diverge from a real browser: missing expected headers, abnormal Accept combinations, inconsistent sec-ch-*.
Adblocker / privacy | Sec-GPC and DNT. The Brave / Firefox-privacy / Chrome-privacy identity archetypes treat these opt-in privacy signals as positive human evidence. Bots don't volunteer GPC. See adblocker detection below.
IP / network | Datacenter and cloud ranges, known proxy/VPN traits, threat-feed reputation (Spamhaus DROP, Tor exits, CISA KEV).
Transport | TLS JA3/JA4 fingerprint, HTTP/2 settings hash, ALPN, TCP p0f. Catches "Chrome UA on a Python TLS stack" instantly.
Identity / fingerprint match | Encodes the request as a 129-dimensional vector, finds the nearest archetype (brave, googlebot, etc.), and pulls cross-request behavioural state from prior visits with the same fingerprint.
Behavioural | Request cadence and navigation shape over time: bursty rates, repetitive endpoint sweeps, sub-human timing.
Advanced behavioural | Higher-order shape: traversal strategies, sequence anomalies, sustained drift.
Cache behaviour | Cache-busting query churn, deliberate anti-cache patterns.
Security-tool fingerprints | Known offensive scanner signatures (nikto, sqlmap, masscan, etc.).
Client-side fingerprinting | Optional. Headless / webdriver traits, browser capability inconsistencies, signed-token validity.
Version age | Outdated or impossible browser/OS combinations common in automation stacks.
Reputation bias | Per-pattern history. ConfirmedBad and ManuallyBlocked propagate the MaliciousBot classification; Suspect only contributes a probability delta and does not overclaim a specific bot type.
Reputation fast-path | Wave 0 re-classification of known signatures. A low-latency check for repeat visitors whose requests were not already resolved by a Skip at the verdict gate.
AI / heuristic classifier | Feature-vector model (Wave 3 HeuristicEarly, Wave 4 HeuristicLate). Optional LLM escalation for ambiguous cases. Off by default; enable via BotDetection:AiDetection:Provider.
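
The Reputation bias rule (strong history classifies, weak history only nudges) can be sketched like this. The status names and the MaliciousBot type come from the table above. The `Contribution` type and both delta values are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical magnitudes; the real deltas are not published on this page.
CONFIRMED_DELTA = 0.4
SUSPECT_DELTA = 0.15


@dataclass
class Contribution:
    probability_delta: float
    bot_type: Optional[str]
    reason: str


def reputation_bias(status: str) -> Optional[Contribution]:
    if status in ("ConfirmedBad", "ManuallyBlocked"):
        # Strong history: raise probability and propagate a classification.
        return Contribution(CONFIRMED_DELTA, "MaliciousBot", f"pattern is {status}")
    if status == "Suspect":
        # Weak history: nudge probability only; claim no specific bot type.
        return Contribution(SUSPECT_DELTA, None, "pattern is Suspect")
    return None  # any other status contributes nothing
```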

Adblocker detection (positive human signal)

Bots don't install privacy extensions. They don't toggle Sec-GPC. They don't add DNT. They have no opinion about being tracked, because they aren't human.

Stylobot treats opt-in privacy headers as a positive human-affinity signal in two places:

  1. Identity archetypes. The brave-desktop, brave-mobile, chrome-privacy, and firefox-privacy archetypes assert Sec-GPC=true (and DNT=true for the Privacy Badger slice) at high confidence. A request carrying those headers pulls toward a human-browser archetype centroid in vector space instead of into the googlebot region (which also lacks Sec-Fetch headers, but never asserts GPC). Without these archetypes, privacy-stripped Chrome and Firefox users were absorbed into googlebot — they "looked like" googlebot on every dimension except the one nobody was measuring.
  2. Client-side adblocker probe (commercial TagHelper). For sites with an ad-revenue model, a <sb:adblock-probe> TagHelper drops a JS probe into the page that attempts to load an ad-network resource (or any URL on common filter lists). A blocked fetch is reported via beacon and suppresses the no-fingerprint penalty in ClientSideContributor. Catches Pi-hole and DNS-sinkhole users that browser-extension probes miss.

Both mechanisms are additive: a Brave user gets the header-based archetype boost and, if the page is ad-supported, the probe confirms the adblocker layer too.
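
Why the privacy archetypes matter can be shown with a toy nearest-centroid lookup by cosine similarity. The 4-slot vectors below are hypothetical stand-ins, not the 129-dimensional IdentityVectorLayout.DefaultV1() layout. Cosine similarity is an assumed distance measure. The point is only that adding GPC-asserting centroids moves a GPC-sending request out of the googlebot region.

```python
import math

# Toy 4-slot vectors, NOT the real 129-dim IdentityVectorLayout.DefaultV1().
# Slots: [sec_fetch_present, sec_gpc, dnt, accepts_html]
BASE_ARCHETYPES = {
    "googlebot": [0.0, 0.0, 0.0, 1.0],
    "chrome":    [1.0, 0.0, 0.0, 1.0],
}
PRIVACY_ARCHETYPES = {
    "brave-desktop":   [1.0, 1.0, 0.0, 1.0],
    "chrome-privacy":  [0.0, 1.0, 0.0, 1.0],
    "firefox-privacy": [0.0, 1.0, 1.0, 1.0],  # DNT slice (Privacy Badger)
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest(vector, archetypes):
    """Name of the archetype centroid with the highest cosine similarity."""
    return max(archetypes, key=lambda name: cosine(vector, archetypes[name]))


# A privacy-stripped Chrome user: no Sec-Fetch headers, but Sec-GPC: 1.
stripped_chrome_with_gpc = [0.0, 1.0, 0.0, 1.0]
```

With only the base centroids, `nearest(stripped_chrome_with_gpc, BASE_ARCHETYPES)` lands on googlebot. Adding the privacy centroids moves the same vector to chrome-privacy, because that centroid also asserts GPC.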

Privacy tool | What it sends | Detection path
Brave (default) | Sec-GPC: 1 + standard Chromium Sec-Fetch / UIR | brave-desktop / brave-mobile archetype
DuckDuckGo browser+extension | Sec-GPC: 1 | chrome-privacy archetype
Privacy Badger | Sec-GPC: 1 + DNT: 1 | firefox-privacy archetype (DNT slice)
Ghostery | Sec-GPC: 1 | chrome-privacy archetype
uBlock Origin | No request-side header changes (response-only) | Client-side probe (commercial)
Pi-hole / DNS sinkhole | Transparent to browser | Client-side probe (commercial)

The full encoder slot list (including hdr.dnt and hdr.sec_gpc) lives in IdentityVectorLayout.DefaultV1() in the FOSS package.

Cross-request layer

Single requests give weak signals. Stylobot also runs cross-request analysis:

  • Signature matching. A visitor's multi-vector signature is fuzzy-matched against earlier signatures, so behavioural state accumulates per visitor instead of per request.
  • Cluster detection. Confirmed bot signatures (probability >= 0.5) are clustered every 60 s or after every 20 new bot signatures, whichever comes first. Label propagation on a 12-dimensional feature vector discovers Bot Product clusters (the same software from different IPs) and Bot Network clusters (coordinated campaigns). FFT cross-correlation surfaces shared timing patterns (C2 heartbeats, cron-driven sweeps).
  • Country reputation. Per-country bot rates are tracked with exponential decay (default time constant: 168h). A country needs a minimum sample size before its reputation is reported, so low-traffic origins don't get flagged on noise. Reputation contributes a supporting signal, never a blocking one on its own.
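
Exponential decay with a time constant tau can be kept as two decayed counters: before each observation, both are multiplied by e^(-dt/tau). The sketch uses the documented 168h default. `MIN_SAMPLES = 50` and the class shape are hypothetical, since the page does not state the real minimum. Because decay scales both counters equally, it cancels in the rate and only affects whether enough recent evidence exists to report one.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional

TAU_HOURS = 168.0    # documented default time constant
MIN_SAMPLES = 50.0   # hypothetical minimum (decayed) sample size


@dataclass
class CountryRate:
    bots: float = 0.0
    total: float = 0.0
    last_hours: float = 0.0

    def observe(self, now_hours: float, is_bot: bool) -> None:
        # Age both counters by e^(-dt/tau), then add the new observation.
        decay = math.exp(-(now_hours - self.last_hours) / TAU_HOURS)
        self.bots = self.bots * decay + (1.0 if is_bot else 0.0)
        self.total = self.total * decay + 1.0
        self.last_hours = now_hours

    def bot_rate(self, now_hours: float) -> Optional[float]:
        # Decay cancels in bots/total, but not in the sample-size check.
        effective_total = self.total * math.exp(-(now_hours - self.last_hours) / TAU_HOURS)
        if effective_total < MIN_SAMPLES:
            return None  # too little recent traffic: report nothing
        return self.bots / self.total
```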

Confidence vs probability

Two numbers, not one, because they answer different questions:

  • High probability + high confidence: strong basis for Challenge or Block.
  • High probability + low confidence: suspicious, but Throttle or Challenge is safer until evidence firms up.
  • Low probability + high confidence: typically safe Allow.

High suspicion is not the same as a strong basis for blocking. Use confidence to control how aggressive the action is, not whether to act.
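
One way to encode "probability decides whether to act, confidence decides how hard" is sketched below. The 0.8 and 0.7 cut-offs are hypothetical. The page does not cover the low-probability, low-confidence quadrant, so treating it as Allow is an assumption of this sketch.

```python
HIGH_PROBABILITY = 0.8  # hypothetical cut-off
HIGH_CONFIDENCE = 0.7   # hypothetical cut-off


def choose_action(bot_probability: float, confidence: float) -> str:
    """Probability decides whether to act; confidence decides how hard."""
    if bot_probability >= HIGH_PROBABILITY:
        if confidence >= HIGH_CONFIDENCE:
            return "Block"      # strong basis: high probability, high confidence
        return "Challenge"      # suspicious but thin evidence: softer action
    # Low probability. With high confidence this is a safe Allow; the page does
    # not cover low/low, so Allow there is an assumption of this sketch.
    return "Allow"
```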

Fast path vs heavier analysis

Fast-path detectors are inline, low-latency, and on the request hot path. Heavier analysis (AI classifier, cluster rebuild, FFT correlation, reputation recompute) is scheduled, never on the hot path. The dashboard's "Processing Time" widget shows Stylobot's added latency separately so you can see exactly how much your detection pipeline costs.
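
The cluster rebuild schedule described under Cross-request layer runs every 60 s or every 20 new bot signatures, whichever comes first, counting signatures at probability >= 0.5. That schedule needs only O(1) bookkeeping on the hot path. A single-threaded sketch of the trigger follows. The class and method names are hypothetical, and the rebuild itself would run on a background scheduler.

```python
REBUILD_INTERVAL_S = 60.0   # from the docs
REBUILD_EVERY_N_BOTS = 20   # from the docs
BOT_THRESHOLD = 0.5         # signatures at or above this feed clustering


class ClusterRebuildTrigger:
    """Decides when an off-path cluster rebuild is due; never rebuilds itself."""

    def __init__(self, now_s: float) -> None:
        self.last_rebuild_s = now_s
        self.new_bots = 0

    def record(self, bot_probability: float) -> None:
        # Called per request: constant-time bookkeeping only.
        if bot_probability >= BOT_THRESHOLD:
            self.new_bots += 1

    def due(self, now_s: float) -> bool:
        # Whichever comes first: elapsed interval or enough new bot signatures.
        return (now_s - self.last_rebuild_s >= REBUILD_INTERVAL_S
                or self.new_bots >= REBUILD_EVERY_N_BOTS)

    def mark_rebuilt(self, now_s: float) -> None:
        self.last_rebuild_s = now_s
        self.new_bots = 0
```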

Practical tuning loop

  1. Run with --mode demo. Watch the dashboard. Read the top reasons for each verdict.
  2. Identify the false-positive shape: which detectors contribute, which signal families dominate.
  3. Tune detector weights, thresholds, or per-route policy in appsettings.json.
  4. Roll out stronger actions gradually by risk band: Throttle first, then Block only for the bands you trust.
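
Step 4 can be written down as a staged band-to-action plan, with a check that no stage relaxes a band. The stage contents are hypothetical. Ranking Challenge between Throttle and Block is an assumption. The risk-band names come from the verdict description.

```python
RISK_BANDS = ("VeryLow", "Low", "Elevated", "Medium", "High", "VeryHigh")
SEVERITY = {"Allow": 0, "Throttle": 1, "Challenge": 2, "Block": 3}  # assumed order

# Hypothetical rollout plan; bands not listed in a stage fall through to Allow.
ROLLOUT_STAGES = [
    {},                                                               # stage 0: observe only
    {"High": "Throttle", "VeryHigh": "Throttle"},                     # stage 1
    {"Medium": "Throttle", "High": "Throttle", "VeryHigh": "Block"},  # stage 2
]


def action_for(band: str, stage: int) -> str:
    if band not in RISK_BANDS:
        raise ValueError(f"unknown risk band: {band}")
    return ROLLOUT_STAGES[stage].get(band, "Allow")


def is_gradual(stages=ROLLOUT_STAGES) -> bool:
    """True when no band ever becomes less strict from one stage to the next."""
    for earlier, later in zip(stages, stages[1:]):
        for band in RISK_BANDS:
            if SEVERITY[later.get(band, "Allow")] < SEVERITY[earlier.get(band, "Allow")]:
                return False
    return True
```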

More