the short version
Every captcha today asks one question: human or bot? It returns one bit. That bit is useless for anything beyond pass/fail.
We built something different: a browser agent platform that produces structured behavioral traces designed to be hard to distinguish from human interaction, and a capture pipeline that collects those traces into a labeled dataset alongside real human solves.
This is the end-to-end flow, from agent launch to labeled training data, with real screenshots and raw data from every step.
architecture overview
┌─────────────────────────────────────────────────────────┐
│ AGENT PLATFORM │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ chromium │ → │ human shims │ → │ trace capture │ │
│ │ headless │ │ (mouse,type, │ │ (every action │ │
│ │ │ │ scroll,UA) │ │ logged) │ │
│ └──────────┘ └──────────────┘ └───────┬────────┘ │
└────────────────────────────────────────────┼───────────┘
│ label='agent'
▼
┌────────────────────────────────────────────┼───────────┐
│ CAPTURE STORE │ │
│ ┌─────────────────────────────────────────┴──────┐ │
│ │ SQLite: captures.db │ │
│ │ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │ │
│ │ │ human │ │ agent │ │ unknown │ │ │
│ │ │ solves │ │ traces │ │ (unlabeled) │ │ │
│ │ └──────────┘ └──────────┘ └─────────────┘ │ │
│ └────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ TRAINING PIPELINE (future) │
│ labeled features → tiny MLP → edge model (.wasm) │
│ → deployed to panel widget → better detection │
└─────────────────────────────────────────────────────────┘
The key insight: every captcha solve is a free labeled sample. Every time a human passes a challenge, we get a "human" feature vector. Every time our agent platform runs, we get an "agent" feature vector. The model trains on the difference.
step 1: launching the agent
The agent platform wraps Playwright's Chromium with a set of human-behavior shims that run before any automation:
| shim | implementation | why it matters |
|---|---|---|
| mouse curves | cubic Bezier with random control points, sub-pixel jitter, speed easing (slower near target) | Straight-line mouse movements are among the most obvious automation signals |
| typing variance | 30-120ms inter-key delay, ~2% typo rate with corrections, occasional "thinking" pauses | Perfect typing is obvious automation |
| scroll easing | ease-out curve, variable step counts | Bot scroll is uniform pixel jumps |
| viewport | randomized from 8 presets (1280×720 to 2560×1440) | Static viewport fingerprints you |
| user agent | rotated among Chrome/Edge, macOS/Windows/Linux | |
| locale/timezone | en-US, America/New_York | |
| navigator overrides | webdriver=false, chrome.runtime={}, fake plugins list | Kills the most common automation detection vectors |
Agent launch sequence:
agent.ts launches → random viewport + UA selected
→ chromium.launch() with --disable-blink-features=AutomationControlled
→ context.addInitScript() overrides navigator.webdriver
→ CDP session injects Page.addScriptToEvaluateOnNewDocument
→ initial mouse move to random (100-300, 100-300) position
→ human-like 300-800ms pause before first action
Screenshot: agent initial state on Cloudflare challenges page
MEDIA:/tmp/blog-cf-turnstile.png
The agent has just navigated to challenges.cloudflare.com. The shims are meant to make fingerprinting scripts see an ordinary Chrome session with randomized attributes.
step 2: moving like a human
Most automation tools move the mouse in straight lines. We use cubic Bezier curves with:
- Control point randomization — the curve is different every run
- Speed easing — faster in the middle, slower near the target (human deceleration)
- Micro-turbulence — sub-pixel jitter from hand tremor
Real movement trace from our agent:
mouse_move: from [0,0] to [847.3, 482.1], 26 curve points, 187ms duration
→ point 5: [142.1, 98.3] t=19.2%
→ point 10: [389.4, 247.2] t=42.3%
→ point 15: [598.7, 365.1] t=60.1%
→ point 20: [761.2, 441.8] t=78.4%
→ point 26: [847.3, 482.1] t=100.0%
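Notice the speed profile: the middle section (points 10→15) covers about 240px (209px horizontally) in 17.8% of the time. Near the end (points 20→26) it covers about 95px (86px horizontally) in 21.6%, slowing down on approach. Over the 187ms move, that is roughly 7.2 px/ms in the middle versus 2.4 px/ms near the target, measured as straight-line distance between the sampled points.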
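The deceleration can be checked directly from the sampled points in the trace. The sketch below is only an illustration: the `Sample` type and `segmentSpeed` helper are hypothetical names, not part of the agent platform, and distance is measured as the straight line between two sampled points, so it slightly underestimates the true curve length.

```typescript
// Sampled points copied from the trace above: position in px and elapsed fraction t (0..1).
type Sample = { x: number; y: number; t: number };

const DURATION_MS = 187; // total move duration reported in the trace

const p10: Sample = { x: 389.4, y: 247.2, t: 0.423 };
const p15: Sample = { x: 598.7, y: 365.1, t: 0.601 };
const p20: Sample = { x: 761.2, y: 441.8, t: 0.784 };
const p26: Sample = { x: 847.3, y: 482.1, t: 1.0 };

// Average speed in px/ms over the straight segment between two samples.
// The real curve is a little longer than this chord, so this is a lower bound.
function segmentSpeed(a: Sample, b: Sample, durationMs: number): number {
  const distancePx = Math.hypot(b.x - a.x, b.y - a.y);
  const elapsedMs = (b.t - a.t) * durationMs;
  return distancePx / elapsedMs;
}

const middleSpeed = segmentSpeed(p10, p15, DURATION_MS);
const approachSpeed = segmentSpeed(p20, p26, DURATION_MS);

console.log(`middle: ${middleSpeed.toFixed(2)} px/ms, approach: ${approachSpeed.toFixed(2)} px/ms`);
```

The middle segment comes out around three times faster than the final approach, which is the ease-out shape the shim is aiming for.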
Screenshot: agent after scrolling the turnstile page
MEDIA:/tmp/blog-cf-scrolled.png
The agent scrolled naturally with eased movement, not a single uniform jump.
step 3: typing with imperfection
Humans make mistakes while typing. Automated form-filling doesn't. Our typing shim:
for each character in text:
    wait 30-120ms (random)
    if random < 2% and not last 2 characters:
        type random wrong character
        wait 100-300ms
        press Backspace
        wait 50-150ms
    type correct character
    if random < 3%:
        wait 300-1200ms (thinking pause)
The goal is to pass inter-key timing analysis and look natural in the browser event log.
step 4: capturing the behavioral trace
Every action is logged as a structured event. From the event stream, we compute feature aggregates matching client_features_v1:
| feature | computed from | what it measures |
|---|---|---|
| pointer_speed_variance | spread of mouse speeds across moves | Humans vary speed. Bots are uniform. |
| pointer_jerk | rate of acceleration change | Humans have continuous change. Bots have step functions. |
| pointer_distance | total mouse travel | Path efficiency. |
| click_interval_std | stddev of time between clicks | Human timing varies. |
| keydown_interval_std | stddev of time between key presses | Natural rhythm vs mechanical. |
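To make the table concrete, here is a minimal sketch of how aggregates like these could be computed from raw pointer samples and event timestamps. The exact definitions behind client_features_v1 are not given in this post, so everything here is an assumption: the `PointerSample` shape, the px/ms units, population variance, and jerk taken as the mean absolute finite difference of acceleration.

```typescript
// Hypothetical sample shape: position in px, timestamp in ms.
type PointerSample = { x: number; y: number; tMs: number };

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((s, v) => s + v, 0) / values.length;
}

// Population variance; returns 0 for an empty series.
function variance(values: number[]): number {
  if (values.length === 0) return 0;
  const m = mean(values);
  return mean(values.map((v) => (v - m) ** 2));
}

// Finite difference of a series: (v[i] - v[i-1]) / dt, stamped at the interval midpoint.
function diff(values: number[], times: number[]): { values: number[]; times: number[] } {
  const outValues: number[] = [];
  const outTimes: number[] = [];
  for (let i = 1; i < values.length; i++) {
    const dt = times[i] - times[i - 1];
    if (dt <= 0) continue; // skip non-increasing timestamps
    outValues.push((values[i] - values[i - 1]) / dt);
    outTimes.push((times[i] + times[i - 1]) / 2);
  }
  return { values: outValues, times: outTimes };
}

function pointerFeatures(samples: PointerSample[]) {
  const speeds: number[] = [];
  const midTimes: number[] = [];
  let distance = 0;
  for (let i = 1; i < samples.length; i++) {
    const a = samples[i - 1];
    const b = samples[i];
    const d = Math.hypot(b.x - a.x, b.y - a.y);
    distance += d;
    const dt = b.tMs - a.tMs;
    if (dt <= 0) continue; // no usable speed for this segment
    speeds.push(d / dt);
    midTimes.push((a.tMs + b.tMs) / 2);
  }
  const accel = diff(speeds, midTimes); // speed -> acceleration
  const jerk = diff(accel.values, accel.times); // acceleration -> jerk
  return {
    pointer_distance: distance,
    pointer_speed_variance: variance(speeds),
    pointer_jerk: mean(jerk.values.map((v) => Math.abs(v))),
  };
}

// Standard deviation of the gaps between consecutive timestamps (clicks or keydowns).
function intervalStd(timestampsMs: number[]): number {
  const intervals = timestampsMs.slice(1).map((t, i) => t - timestampsMs[i]);
  return Math.sqrt(variance(intervals));
}
```

A perfectly uniform move produces zero speed variance and zero jerk under these definitions, which is the kind of signature the table describes for naive automation.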
Real capture data from our runs:
[
  {
    "id": 6,
    "label": "agent",
    "source": "blog:turnstile",
    "pointer_speed_variance": 24.89,
    "pointer_jerk": 12.55
  },
  {
    "id": 7,
    "label": "agent",
    "source": "blog:panel-demo",
    "pointer_speed_variance": 39.79,
    "pointer_jerk": 20.84
  },
  {
    "id": 3,
    "label": "human",
    "source": "panel:challenge-solve",
    "pointer_speed_variance": 28.5,
    "pointer_jerk": 12.3
  }
]
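One agent run (id 6) lands close to the human sample on both features, which is the goal: behavioral noise that falls inside the human range. The other (id 7) sits noticeably higher on both, and with a single human capture there is not yet a distribution to compare against.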
step 5: the capture store
All traces land in a single SQLite database at agent-platform/data/captures.db:
CREATE TABLE captures (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  captured_at TEXT NOT NULL,
  label TEXT NOT NULL CHECK(label IN ('human','agent','unknown')),
  source TEXT NOT NULL,
  trace_json TEXT,
  pointer_speed_variance REAL,
  pointer_jerk REAL,
  pointer_distance REAL,
  click_interval_std REAL,
  keydown_interval_std REAL,
  focus_count INTEGER,
  visibility_count INTEGER,
  automation_flags TEXT,
  webdriver_detected INTEGER,
  runtime TEXT,
  model_version TEXT,
  feature_version TEXT
);
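On the application side, the schema's CHECK constraint on label can be mirrored as a type so that invalid labels are caught before they reach the database. This is a sketch under assumptions, not the platform's actual code: `CaptureRow`, `isLabel`, and `countByLabel` are hypothetical names, and only a subset of the columns is modeled.

```typescript
// The three labels allowed by the CHECK constraint in the schema above.
const LABELS = ["human", "agent", "unknown"] as const;
type Label = (typeof LABELS)[number];

// Subset of the captures columns. Nullable REAL columns become `number | null`.
interface CaptureRow {
  id: number;
  captured_at: string; // TEXT NOT NULL; the timestamp format is not specified in the schema
  label: Label;
  source: string;
  pointer_speed_variance: number | null;
  pointer_jerk: number | null;
}

// Runtime guard matching the CHECK constraint.
function isLabel(value: unknown): value is Label {
  return typeof value === "string" && (LABELS as readonly string[]).includes(value);
}

// Tally rows per label, similar to a GROUP BY label count.
function countByLabel(rows: Pick<CaptureRow, "label">[]): Record<Label, number> {
  const counts: Record<Label, number> = { human: 0, agent: 0, unknown: 0 };
  for (const row of rows) counts[row.label] += 1;
  return counts;
}

// The three captures shown earlier: two agent runs and one human solve.
const sample = countByLabel([{ label: "agent" }, { label: "agent" }, { label: "human" }]);
console.log(sample);
```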
Current store state (7 captures):
| count | label | sample sources |
|---|---|---|
| 6 | agent | turnstile, panel demo, example.com |
| 1 | human | panel challenge solve |
The store is live and accepting data from both the agent CLI and the panel API.
Panel demo page (agent navigating our own captcha)
MEDIA:/tmp/blog-panel-demo.png
The agent navigating panel.goku.codes/demo/agent.
MEDIA:/tmp/blog-panel-scrolled.png
After scrolling with human-like easing.
step 6: how panel collects human-labeled samples
When a real user solves a challenge, the widget sends its feature payload to:
POST /api/capture
Content-Type: application/json
X-Agent-Trace: false → label='human'
{
  "pointer_speed_variance": 31.2,
  "pointer_jerk": 14.7,
  "pointer_distance": 412,
  "click_interval_std": 48.3,
  "runtime": "rules_only",
  "feature_version": "v1"
}
No PII. No raw data. Just behavioral feature vectors. The scrubber proxy strips identifying data before the payload hits the store.
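A rough sketch of what the server side of that request could look like: derive the label from the header, then keep only an allowlist of feature fields so that anything else in the payload is dropped. Only the `X-Agent-Trace: false` to 'human' mapping comes from the request above. The other two mappings, the function names, and the allowlist itself are assumptions, and this is not the scrubber proxy's actual implementation.

```typescript
type Label = "human" | "agent" | "unknown";

// Confirmed by the request above: "false" -> 'human'.
// Assumed for illustration: "true" -> 'agent', anything else -> 'unknown'.
function labelFromHeader(headerValue: string | undefined): Label {
  if (headerValue === undefined) return "unknown";
  const v = headerValue.trim().toLowerCase();
  if (v === "false") return "human";
  if (v === "true") return "agent";
  return "unknown";
}

// Hypothetical allowlist, taken from the field names used in this post.
const NUMERIC_FIELDS = [
  "pointer_speed_variance",
  "pointer_jerk",
  "pointer_distance",
  "click_interval_std",
  "keydown_interval_std",
] as const;
const STRING_FIELDS = ["runtime", "feature_version"] as const;

// Copy only allowlisted, well-typed fields; everything else is silently dropped.
function pickFeatures(payload: Record<string, unknown>): Record<string, number | string> {
  const out: Record<string, number | string> = {};
  for (const key of NUMERIC_FIELDS) {
    const v = payload[key];
    if (typeof v === "number" && Number.isFinite(v)) out[key] = v;
  }
  for (const key of STRING_FIELDS) {
    const v = payload[key];
    if (typeof v === "string") out[key] = v;
  }
  return out;
}

const cleaned = pickFeatures({
  pointer_jerk: 14.7,
  runtime: "rules_only",
  email: "someone@example.com", // not on the allowlist, so it is dropped
});
console.log(labelFromHeader("false"), cleaned);
```

An allowlist is the safer default here: a new field added to the widget has to be approved explicitly before it can reach the store.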
Every captcha solve = a free labeled training sample. More traffic = better model.
step 7: what happens at training time (next phase)
When the dataset is large enough, we plan to train a tiny MLP using TensorFlow:
input (10 features)
→ dense(16, ReLU)
→ dense(8, ReLU)
→ output (2 classes: human | agent)
→ INT8 quantization
→ TFJS format for browser WASM runtime
Target: ≤10MB compressed, <20ms inference desktop, <60ms mobile. Falls back to rules_only when model isn't loaded.
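For a sense of scale, the parameter count of the network above can be worked out from the layer widths alone. The sketch assumes plain dense layers with one bias per output unit; `denseParamCount` is a hypothetical helper, and the INT8 size ignores quantization scales and file metadata.

```typescript
// Layer widths from the architecture above: 10 inputs -> 16 -> 8 -> 2 outputs.
const LAYER_WIDTHS = [10, 16, 8, 2];

// A dense layer has (inputs * outputs) weights plus one bias per output.
function denseParamCount(widths: number[]): number {
  let total = 0;
  for (let i = 1; i < widths.length; i++) {
    total += widths[i - 1] * widths[i] + widths[i];
  }
  return total;
}

const params = denseParamCount(LAYER_WIDTHS);
const int8Bytes = params; // one byte per parameter after INT8 quantization

console.log(`${params} parameters, about ${int8Bytes} bytes at INT8`);
```

That is 330 parameters, so the weights themselves are a few hundred bytes. Under these assumptions, nearly all of the size budget would go to whatever runtime ships alongside the model, not to the model.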
Not deployed yet — we're collecting data first. The architecture is fully wired; the model is the last piece.
comparison: traditional captcha vs this approach
| dimension | traditional captcha | this approach |
|---|---|---|
| output | pass/fail (1 bit) | structured feature vector (10+ fields) |
| training data | none (rules hand-tuned) | automatically labeled from solves + agent runs |
| attack resistance | reactive (patch known patterns) | proactive (build the agents, then detect them) |
| compliance | processes raw user data | purpose-limited feature vectors, PII stripped |
| evasion difficulty | learnable (solve once, automate forever) | must match human behavioral distribution |
the moat
The defensible advantage isn't the captcha — it's the agent platform that trains it. By building the best browser agents first:
- We know exactly what state-of-the-art automation looks like
- We train our detection on our own agents (the hardest targets)
- Panel catches the agents we build — because we know what to look for
The flywheel:
build better agents → harder behavioral traces → better training data →
better detection model → more valuable captcha → more traffic →
more human-labeled data → build better agents → ...
As far as we know, no other captcha provider builds its own browser agents. They wait for attacks and react. We generate attacks from the inside.
run it yourself
cd agent-platform
npx tsx src/agent.ts 'https://challenges.cloudflare.com/' --headless
npx tsx src/solve-turnstile.ts
npx tsx src/store-stats.ts
npx tsx src/store-stats.ts export
Each run produces a labeled capture. Enough runs = a training set.
what's next
- More sophisticated agent behaviors (canvas fingerprint randomization, WebGL spoofing)
- Labeled dataset → tiny MLP training pipeline
- WASM inference integration in the panel widget
- Comparison against real captcha systems (reCAPTCHA v3, Turnstile, hCaptcha)
Built with panel agent-platform v0.1.