CaptchaOCR.
CaptchaOCR is a lightweight FastAPI micro-service that does exactly one thing: receive a 4-digit captcha image in base64, return an integer. The ddddocr pretrained model loads once at startup, leaving every subsequent request as pure inference — latency in the tens of milliseconds range, fast enough to slot into any automation pipeline without becoming a bottleneck. Deployed via PM2 with auto-restart and a hard RAM cap, it runs quietly in the background with no babysitting required.
01.Overview
The big picture
CaptchaOCR came out of building automation projects for clients: almost every site that needed automated login or account registration had one unavoidable step, solving a captcha. Tesseract topped out at around 70% accuracy and depended heavily on having the right binary version installed on the server, while training a custom CNN meant building a large enough dataset, which is a project in itself. I needed something simpler that could be dropped in immediately: one HTTP endpoint that takes base64 and returns an integer.
The final solution: FastAPI accepts a POST with base64, the pretrained ddddocr model (running on ONNX Runtime) handles the classification, a regex pulls the first 4 digits, and the service returns an int. The model loads once at startup, so every subsequent request is pure inference. The service is deployed via PM2 with max_memory_restart set to 1G. Accuracy measured across 1,000 real captchas: ~96%.
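From the caller's side, the contract described above could look roughly like the sketch below. It uses only the standard library and assumes the service is reachable at http://localhost:8000/ocr (port 8000 is the one listed in the tech stack; the host is an assumption). The names build_ocr_request and solve_captcha are hypothetical helpers, not part of the service.

```python
import base64
import json
import urllib.request

# Assumption: the service runs locally on port 8000 and exposes POST /ocr.
OCR_URL = "http://localhost:8000/ocr"


def build_ocr_request(image_bytes: bytes, url: str = OCR_URL) -> urllib.request.Request:
    """Wrap raw image bytes in the JSON body the endpoint is described as accepting."""
    payload = {"image_base64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def solve_captcha(image_bytes: bytes, timeout: float = 5.0) -> int:
    """Send the image and return the integer from the {"number": ...} response."""
    request = build_ocr_request(image_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return int(json.load(response)["number"])
```

With urllib, a 422 from the service surfaces as urllib.error.HTTPError, so a caller can catch that exception to detect an unreadable captcha.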
02.Features
What it does
- 01 POST /ocr: base64 in · int out · 422 on fail
  The service exposes a single endpoint: POST /ocr accepts a { image_base64 | text } body, runs OCR, and returns { number: int } when 4 digits are successfully read. If the base64 is malformed or OCR can't find enough characters, the service returns 422 with a specific detail message: no silent failures, no guessed results. A health check lives at GET / so monitoring can confirm the service is alive.
- 02 Pretrained ddddocr: no training · model loads once
  ddddocr 1.5.6 is a pretrained CNN model running on ONNX Runtime, purpose-built for alphanumeric captchas. The engine is initialised at module level: loaded exactly once when the process starts, never reloaded per request. Every subsequent call incurs only inference cost, keeping latency consistent regardless of traffic volume.
- 03 Forgiving parsing: data-url or raw · regex extract digits
  _decode_data_url handles both input formats, data:image/png;base64,… and raw base64, validating before decoding to catch errors early rather than letting them surface deep inside the pipeline. _extract_4_digit_captcha runs a \d regex over the OCR output and slices out exactly the first 4 digits, a deliberate choice that tolerates the stray special characters the model occasionally returns alongside the result.
- 04 PM2 deploy: autorestart · max_memory_restart 1G
  PM2 is configured in eco.config.js to run the service using Python from a virtualenv, as a single instance with autorestart: true to recover from unexpected crashes. max_memory_restart: "1G" acts as a hard ceiling against memory leaks from the ONNX runtime accumulating over time: rather than letting the service bloat RAM indefinitely, PM2 restarts it and reclaims memory automatically. The ORT_LOGGING_LEVEL=3 environment variable keeps logs at error-only level, preventing the ONNX runtime from flooding output with unnecessary verbose lines.
- 05 Typed schema: Pydantic · auto OpenAPI · Swagger UI
  Requests and responses are typed with Pydantic v2, and FastAPI automatically generates OpenAPI docs at /docs and /redoc with no manual effort. The image_base64 field supports an additional text alias so older clients sending payloads under the legacy field name continue to work without any changes on their end.
03.Tech stack
Tools used
| API | FastAPI 0.115 · Uvicorn 0.30 · Pydantic v2 |
| OCR | ddddocr 1.5.6 (pretrained CNN · ONNX Runtime backend) · no training, no GPU |
| Runtime | Python 3.13 · venv · single process · model loaded once at module import |
| Deploy | PM2 (eco.config.js) · autorestart · max_memory_restart 1G · ORT_LOGGING_LEVEL=3 · port 8000 |
| I/O | Base64 (data URL or raw) → bytes → OCR → regex \d → 4-digit int |
| Footprint | 3 dependencies · 67 lines of Python · ~150MB steady RAM · ~30ms per request |
04.How it works
Architecture
The service sits between two very different things: a pretrained ML library (ddddocr, wrapping ONNX Runtime) and a simple HTTP client. FastAPI's role here covers exactly three concerns: lifecycle (model loaded once at module level), validation (Pydantic + try/except for base64), and contract (a 4-digit integer). Everything else (batching, queuing, async) isn't needed at this scale.
Each request passes through exactly 4 steps: (1) _decode_data_url strips the data:image/png;base64, prefix if present and validates using b64decode(..., validate=True); (2) the image bytes go into ocr_engine.classification(...), where ddddocr runs inference in roughly 25ms on CPU; (3) a \d regex collects all digits from the output, and an error is raised if fewer than 4 are found; (4) the first 4 digits are sliced and cast with int(...). A failure at any step becomes an HTTP 422 with a clear message; nothing fails silently.
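The request flow and the load-once lifecycle can be illustrated without FastAPI or ddddocr installed. In the sketch below, StubEngine is a hypothetical stand-in that simply treats the bytes as text, where the real service is described as calling ocr_engine.classification(...) on a ddddocr engine. OcrFailure and handle_ocr are also illustrative names, standing in for whatever the service uses to produce its 422 response.

```python
import base64
import binascii
import re


class StubEngine:
    """Hypothetical stand-in for the OCR engine: 'reads' the bytes as text."""

    loads = 0  # counts how many times the "model" has been built

    def __init__(self) -> None:
        StubEngine.loads += 1

    def classification(self, image_bytes: bytes) -> str:
        return image_bytes.decode("ascii", errors="replace")


# Module level: constructed once at import and reused by every request.
ocr_engine = StubEngine()


class OcrFailure(Exception):
    """Hypothetical error carrying the HTTP status and detail message."""

    def __init__(self, detail: str, status_code: int = 422) -> None:
        super().__init__(detail)
        self.detail = detail
        self.status_code = status_code


def handle_ocr(image_base64: str) -> dict:
    # Step 1: drop an optional data-URL prefix, then decode strictly.
    _, _, payload = image_base64.rpartition(",")
    try:
        image_bytes = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise OcrFailure(f"invalid base64: {exc}") from exc

    # Step 2: inference on the already-loaded engine.
    text = ocr_engine.classification(image_bytes)

    # Step 3: collect digits; fewer than four is an error, not a guess.
    digits = re.findall(r"\d", text)
    if len(digits) < 4:
        raise OcrFailure(f"OCR read {text!r}, fewer than 4 digits")

    # Step 4: first four digits, cast to int.
    return {"number": int("".join(digits[:4]))}
```

In a FastAPI handler, the same failure would typically be surfaced by raising HTTPException with status_code=422 and the detail string.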
CaptchaOCR solves exactly one case: numeric-only captchas. For mixed alphanumeric captchas, ddddocr accuracy drops noticeably and I'm still looking for a better approach. For now, when the endpoint returns 422 or produces a wrong result, the automation pipeline falls back to calling an AI to read the image — slower and more expensive, but enough to keep the pipeline running while a proper solution is still being worked out.
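The fallback described above might be wired up along these lines. Everything here is an assumption made for illustration: the solver callables, the CaptchaUnreadable exception (standing in for a 422 from the endpoint), and the optional accepted hook, which represents whatever check the pipeline uses to detect a wrong answer.

```python
from typing import Callable, Optional

Solver = Callable[[bytes], int]


class CaptchaUnreadable(Exception):
    """Hypothetical signal that the OCR endpoint answered 422."""


def solve_with_fallback(
    image: bytes,
    primary: Solver,
    fallback: Solver,
    accepted: Optional[Callable[[int], bool]] = None,
) -> tuple[int, str]:
    """Try the fast OCR service first; use the slower solver when it fails.

    Returns the answer together with the name of the path that produced it.
    """
    try:
        answer = primary(image)
        # Without a check, any well-formed answer from the primary is used.
        if accepted is None or accepted(answer):
            return answer, "ocr"
    except CaptchaUnreadable:
        pass  # fall through to the slower solver
    return fallback(image), "fallback"
```

Returning the path name alongside the answer makes it easy to count how often the expensive fallback is actually used.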