CaptchaOCR

CaptchaOCR is a lightweight FastAPI micro-service that does exactly one thing: receive a 4-digit captcha image as base64 and return an integer. The pretrained ddddocr model loads once at startup, so every subsequent request is pure inference, with latency in the tens of milliseconds: fast enough to slot into any automation pipeline without becoming a bottleneck. Deployed via PM2 with auto-restart and a hard RAM cap, it runs quietly in the background with no babysitting required.

Year: 2025

Role: Solo · Python

Timeline: 1 week

Status: ● Live · self-hosted

Repo: private


01. Overview

The big picture

CaptchaOCR came out of building automation projects for clients: almost every site that needed automated login or account registration had one unavoidable step, solving a captcha. Tesseract topped out at around 70% accuracy and depended heavily on having the right binary version installed on the server. Training a custom CNN meant first building a large enough dataset, a project in itself. I needed something simpler that could be dropped in immediately: one HTTP endpoint that takes base64 and returns an integer.

The final solution: FastAPI accepts a POST with base64, the pretrained ddddocr model running on ONNX Runtime handles the classification, a regex pulls the first 4 digits, and the service returns an int. The model loads once at startup, so every subsequent request is pure inference. The service is deployed via PM2 with max_memory_restart set to 1G. Accuracy measured across 1,000 real captchas: ~96%.
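For a sense of what calling the service looks like from the client side, here is a minimal sketch using only the standard library. The address (localhost:8000) and the helper names build_ocr_request and solve are assumptions for illustration. The /ocr path and the image_base64 and number fields follow the description in this write-up.

```python
import base64
import json
import urllib.request

# Assumed address of a locally running instance; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_ocr_request(image_bytes: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /ocr request carrying the image as base64."""
    payload = {"image_base64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url=f"{base_url}/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def solve(image_bytes: bytes, base_url: str = BASE_URL, timeout: float = 5.0) -> int:
    """Send the image and return the captcha value.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so a 422
    from the service surfaces as an exception rather than a return value.
    """
    request = build_ocr_request(image_bytes, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return int(json.load(response)["number"])
```

Splitting request construction from sending keeps the payload shape easy to inspect without a running server.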

02. Features

What it does

01
POST /ocr: base64 in · int out · 422 on fail
The service exposes a single endpoint: POST /ocr accepts a JSON body carrying the image under image_base64 (or the legacy text field), runs OCR, and returns { number: int } when 4 digits are successfully read. If the base64 is malformed or OCR finds fewer than 4 digits, the service returns 422 with a specific detail message: no silent failures, no guessed results. A health check at GET / lets monitoring confirm the service is alive.
02
Pretrained ddddocr: No training · model loads once
ddddocr 1.5.6 is a pretrained CNN model running on ONNX runtime, purpose-built for alphanumeric captchas. The engine is initialised at module level — loaded exactly once when the process starts, never reloaded per request. Every subsequent call incurs only pure inference cost, keeping latency consistent regardless of traffic volume.
03
Forgiving parsing: data-url or raw · regex extract digits
_decode_data_url handles both input formats — data:image/png;base64,… and raw base64 — validating before decoding to catch errors early rather than letting them surface deep inside the pipeline. _extract_4_digit_captcha runs a \d regex over the OCR output and slices out exactly the first 4 digits — a deliberate choice that tolerates the stray special characters the model occasionally returns alongside the result.
04
PM2 deploy: autorestart · max_memory_restart 1G
PM2 is configured in eco.config.js to run the service using Python from a virtualenv, single instance with autorestart: true to recover from unexpected crashes. max_memory_restart: "1G" acts as a hard ceiling against memory leaks from the ONNX runtime accumulating over time — rather than letting the service bloat RAM indefinitely, PM2 restarts it and reclaims memory automatically. The ORT_LOGGING_LEVEL=3 environment variable keeps logs at error-only level, preventing the ONNX runtime from flooding output with unnecessary verbose lines.
05
Typed schema: Pydantic · auto OpenAPI · Swagger UI
Requests and responses are typed with Pydantic v2 — FastAPI automatically generates OpenAPI docs at /docs and /redoc with no manual effort. The image_base64 field supports an additional text alias so older clients sending payloads under the legacy field name continue to work without any changes on their end.
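The legacy field name can be illustrated without any framework. The service itself relies on a Pydantic alias; the plain-Python stand-in below only mimics the observable behaviour and is not the project's code. The function name read_image_field, and the choice to prefer image_base64 when both keys are present, are assumptions.

```python
from typing import Any, Mapping


def read_image_field(payload: Mapping[str, Any]) -> str:
    """Return the base64 string from either the current or the legacy field name."""
    # Current name first, legacy alias second (precedence is an assumption).
    for key in ("image_base64", "text"):
        value = payload.get(key)
        if isinstance(value, str) and value:
            return value
    raise ValueError("expected a non-empty 'image_base64' (or legacy 'text') string")
```

An older client sending {"text": "..."} and a newer one sending {"image_base64": "..."} both resolve to the same string, which is the compatibility the alias is there to provide.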

03. Tech stack

Tools used

API	FastAPI 0.115 · Uvicorn 0.30 · Pydantic v2
OCR	ddddocr 1.5.6 (pretrained CNN · ONNX Runtime backend) · no training, no GPU
Runtime	Python 3.13 · venv · single process · model loaded once at module import
Deploy	PM2 (eco.config.js) · autorestart · max_memory_restart 1G · ORT_LOGGING_LEVEL=3 · port 8000
I/O	Base64 (data URL or raw) → bytes → OCR → regex \d → 4-digit int
Footprint	3 dependencies · 67 lines of Python · ~150 MB steady RAM · ~30 ms per request

04. How it works

Architecture

The service sits between two very different things: a pretrained ML library (ddddocr, wrapping ONNX Runtime) and a simple HTTP client. FastAPI's role here covers exactly three concerns: lifecycle (model loaded once at module level), validation (Pydantic + try/except for base64), and contract (a 4-digit integer). Everything else — batching, queuing, async — isn't needed at this scale.

Each request passes through exactly 4 steps: (1) _decode_data_url strips the data:image/png;base64, prefix if present and validates the payload with b64decode(..., validate=True); (2) the image bytes go into ocr_engine.classification(...), where ddddocr runs inference in roughly 25 ms on CPU; (3) a \d regex collects all digits from the output and raises if fewer than 4 are found; (4) the first 4 digits are sliced and cast with int(...). A failure at any step becomes an HTTP 422 with a clear message, so nothing fails silently.

Design decision

CaptchaOCR solves exactly one case: numeric-only captchas. For mixed alphanumeric captchas, ddddocr accuracy drops noticeably and I'm still looking for a better approach. For now, when the endpoint returns 422 or produces a wrong result, the automation pipeline falls back to calling an AI to read the image — slower and more expensive, but enough to keep the pipeline running while a proper solution is still being worked out.
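The fallback described above might be wired up roughly like this on the pipeline side. Everything here is illustrative: solve_with_fallback, PrimarySolverError, and the two solver callables are hypothetical names, and the sketch covers only the case where the primary solver fails outright (for example on a 422). Detecting a wrong but well-formed result has to happen elsewhere in the pipeline, typically when the target site rejects the answer.

```python
from typing import Callable


class PrimarySolverError(Exception):
    """Hypothetical error raised by the primary solver, e.g. on an HTTP 422."""


def solve_with_fallback(
    image_b64: str,
    primary: Callable[[str], int],
    fallback: Callable[[str], int],
) -> tuple[int, str]:
    """Try the fast OCR service first; use the slower solver only if it fails.

    Returns the captcha value together with the name of the solver that
    produced it, so callers can track how often the fallback is needed.
    """
    try:
        return primary(image_b64), "primary"
    except PrimarySolverError:
        return fallback(image_b64), "fallback"
```

Returning the solver name alongside the value makes it cheap to log the fallback rate, which is a rough indicator of how often the primary path falls short.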

