<!-- llms-full.txt · concat of EN home + 6 cases · ~116k chars · ~117 KB -->
# Ilya Kazantsev

`AI Product & Systems Engineer · full-stack Python + TS`

I build AI systems where autonomy has boundaries — model output becomes reviewable action, not silent state mutation.

**Location:** Vietnam · UTC+7  
**Hours:** EU / US / CIS hours friendly  
**Availability:** Open to work

- 5+ yrs end-to-end product engineering
- operational systems · AI agents · media
- 6 public cases · since Oct 2025

---

## About

> Decisions documented, architecture first.

I'm most useful in ambiguous operational software — lead funnels, inventory workflows, media timelines, internal tools — where AI can draft, search, route or edit, while the backend owns permissions, audit trails, cost tracking and operator handoff.

Boundary first, tools second. The work is choosing what the system refuses to do automatically, then making the manual surface fast. Same loop across operational backends, AI agents, game engines and systems tools.

### Best fit

- AI-heavy internal tools and operational workflows
- Telegram/WebApp products with full-stack Python + TypeScript
- Human-reviewed AI actions over mutating state
- LLM tool APIs, callbacks and agent contracts
- Hash-verified diffs, audit logs and rollback surfaces
- Multimodal R&D — media timelines, workspace tools, code surfaces
- Prototype-to-production handoff with specs, runbooks and walkthroughs

### How I work

- Async-first: specs before implementation, PR-sized increments, written trade-offs
- Prefer small verifiable increments over heroic rewrites
- Keep AI actions reviewable: logs, diffs, human confirmation
- Handoff with specs, runbooks and codebase walkthroughs for teams that inherit the system

---

## Stack

Tools I've used in shipped systems, R&D and hands-on product work.

**AI & Agents.** MCP-style tool design · LLM orchestration · Agentic workflows · Audit trails · Hash-verified diffs · Multimodal I/O · RAG & embeddings · Prompt & context eng. · Eval & cost discipline

**Backend.** Python · FastAPI · Django / DRF · asyncio · Celery · aiogram · httpx · respx · SQLAlchemy · Pydantic

**Frontend.** React · TypeScript · Vite · Tailwind · shadcn · Radix UI · react-router · Zustand · Telegram WebApp · MapLibre GL · Remotion

**Infra.** PostgreSQL · SQLite · Redis · Docker / Compose · Nginx · Ubuntu · macOS · launchd · pf · sing-box · structlog · n8n

**Models · production.** Gemini · Claude · GPT

**Models · R&D.** DeepSeek · Qwen · Kimi · Llama · Grok

**Game / Media.** Luau / Roblox · Rojo · DataStore · Blender + MCP · Pillow · ffmpeg · Remotion rendering · Adobe ExtendScript

**Automation.** Playwright · Selenium · browser orchestration · proxy / VPN routing · anti-bot

**Gen media.** Flux · SDXL · Stable Diffusion · Midjourney · Nano Banana · Veo · Kling · Seedance · LTX-Video · Hunyuan · ElevenLabs · whisper.cpp · ComfyUI

**AI dev loop.** Claude Code · Codex · Cursor · MCP servers · MCP clients · multi-agent workflows

---

## Cases

### 01 — AI CRM · Real Estate

`01 · ai-crm · Delivered`

Production AI agent with MCP-style 19-endpoint tool API, audit trail, and operator handoff. Qualifies real-estate leads, runs project research, books viewings.

**Scope:** Solo · 7 weeks  
**Proof:** Code walkthrough on request  
**Surface:** Deployed to client server  
**Stack:** Django · Celery · aiogram · React · Gemini · n8n

Read full case study → https://ilyadev.xyz/cases/ai-crm.md

### 02 — Restaurant Stock AI Agent

`02 · ai-warehouse · MVP`

Restaurant stock AI agent with two-tier Gemini routing and draft→Confirm safety boundary. Receipts, prep and write-offs start in Telegram; the WebApp keeps review, WAC and dashboard impact visible.

**Scope:** Solo · 1 week  
**Proof:** Code walkthrough on request  
**Surface:** Telegram bot + WebApp  
**Stack:** FastAPI · SQLAlchemy · Postgres · React · Gemini 3

Read full case study → https://ilyadev.xyz/cases/ai-warehouse.md

### 03 — AI Video Editor

`03 · ai-video-editor · R&D`

Local AI video editor as an MCP-style substrate — file-level tool API with hash-verified diffs and audit trails. Timeline, scripts and async TTS as code, Gemini as orchestrator.

**Scope:** Solo · 3 weeks  
**Proof:** Code walkthrough on request  
**Surface:** Runs locally in Docker  
**Stack:** React · Zustand · FastAPI · Remotion · Gemini

Read full case study → https://ilyadev.xyz/cases/ai-video-editor.md

### 04 — Bullet Reign · Roblox

`04 · roblox-game · Published`

Bullet-heaven game for Roblox with a custom MegaMesh renderer (300 enemies at ≥55 FPS on mid-range mobile, 500 hard cap) and the full art pipeline running through Blender + MCP agents — solo, no artist.

**Scope:** Solo · 6 weeks  
**Proof:** Live on Roblox · performance walkthrough  
**Surface:** Published on Roblox  
**Stack:** Luau · Rojo · Blender + MCP · DataStore

Read full case study → https://ilyadev.xyz/cases/roblox-game.md

### 05 — macOS VPN · per-app routing

`05 · macos-vpn · Personal tool`

Per-app macOS routing tool with fail-shut pf gate, launchd boot persistence and live traffic monitoring.

**Scope:** Solo · ongoing  
**Proof:** Daily driver · code walkthrough  
**Surface:** macOS networking tool · self-hosted  
**Stack:** Python · sing-box · macOS pf · launchd · Clash API

Read full case study → https://ilyadev.xyz/cases/macos-vpn.md

### 06 — Portfolio Site

`06 · portfolio-site · Open source`

The site you're reading. React · Vite · TypeScript with token-first CSS modules, typed EN/RU/AR content, Markdown mirrors, privacy-first telemetry and route-aware navigation.

**Scope:** Solo · ongoing  
**Proof:** Live · public source · Markdown mirrors  
**Surface:** Public portfolio artifact  
**Stack:** React · Vite · TypeScript · CSS variables

Read full case study → https://ilyadev.xyz/cases/portfolio-site.md

---

## Timeline

### Oct 2025 — now — Independent AI Product & Systems Engineer

`Current`

Solo independent build cycle. Six selected cases — see Cases above. Other work stays off-site by NDA or scope: closed client engagements, internal tooling, ongoing prototypes, generative-media R&D, automation pipelines. It exists; it just isn't written up as standalone cases. Focus across all: LLM orchestration, MCP-style tool design, multimodal intake, audit trails, production handoff.

### 2024 — Sep 2025 — AI & Generative Media R&D · self-directed

`R&D bridge`

Hands-on map of the local and provider AI stack: LM Studio, Ollama, Automatic1111, ComfyUI, whisper.cpp, Stable Diffusion, Stable Video Diffusion, LTX-Video. In parallel: prototype architectures for AI wrappers, custom UIs around models, multiple Obsidian knowledge-system iterations, first agent prototypes, local inference and fine-tuning experiments, prompt/tool design, multimodal generation workflows. Foundation for the Oct 2025 product cycle.

### 2023 — Summer 2024 — Full-stack Developer · operational ERP for a multi-branch currency-exchange chain

`Sole engineer`

Six-month initial build for an operator-heavy currency-exchange chain: multi-level RBAC (owner → chain → branch → operator), automated rate engine, in-Telegram operator console (one bot, many parallel chats), full cashflow accounting and fee breakdown. Then 10+ months of production iteration through summer 2024 — rate-engine and messenger refactors driven by operator and end-user feedback.

### late 2022 — early 2023 — Developer · Telegram tooling + first ChatGPT integrations

`Salaried`

Multi-track role for a single private employer. Built: OPDS-library Telegram WebApp (community-channel file parsing, metadata extraction, cover lookup, search and filters), small landing pages, lightweight e-shops with Telegram-admin panels, plus the first ChatGPT integrations for the same employer — transcribe+summarize utilities, autoresponders, niche-analysis probes.

### 2021 — 2022 — Automation & Backend Engineer · distributed pipelines + financial-scale Telegram tooling

`Systems`

Two parallel tracks. First: a 1,900+ VPS auto-provisioning rig (Ubuntu+GUI, multi-threaded task distribution, monitoring, browser automation, anti-bot, proxy/VPN orchestration, reliability under rate limits). Second: commercial Telegram bot tooling at financial scale — admin panels for transaction monitoring, multi-channel signal pipelines, custom messenger primitives that later seeded the 2023 exchange ERP.

### 2020 — 2021 — Python Backend · Telegram bots

`Freelance`

Freelance Telegram bots on aiogram: niche calculators, marketplace trackers (Ozon/WB), monitoring dashboards, multilingual subscription bots.

---

## Contact

> Available to hire, consult, or just say hi.

Replies within 24 hours.

- **Email** — ilya@ilyaDev.xyz
- **Telegram** — @ilyaDev_xyz
- **GitHub** — ilyaDev-xyz/portfolio
- **LinkedIn** — ilyaDev-xyz

---

Source: https://ilyadev.xyz/ (HTML)
Index: https://ilyadev.xyz/llms.txt — case-study list for agents


# AI CRM · Real Estate

`01 · ai-crm · Delivered`

Production AI agent with MCP-style 19-endpoint tool API, audit trail, and operator handoff. Qualifies real-estate leads, runs project research, books viewings.

**Scope:** Solo · 7 weeks  
**Role:** Full-stack platform with autonomous Telegram agent

**Video:** [YouTube](https://www.youtube.com/watch?v=Jddfb75n5WA) · [RuTube](https://rutube.ru/video/private/dbdadd9823e8fb606e0561d1077de66a/?p=4yGO6gARyq70kU5tbEhZCg)

## Video walkthrough

Production AI agent runs real-estate client chat in Telegram while operators supervise through a capability-scoped admin panel — Kanban pipeline, calendar, lead qualification, deep project research, viewing booking, operator handoff. Every AI call is tracked: tokens and cost are visible per client, separately for the client-talking agent and the operator helper.

An AI-powered CRM for real-estate agencies — clients talk to an AI assistant inside Telegram while your team sees everything in one place.

The dashboard shows deals, leads and team workload. The Kanban board runs the pipeline — drag a card between stages, and the history takes care of itself.

Every card sits on the calendar too. Filter by operator or board, switch between month, week or day, and open any event to see its details and linked cards.

From a card, step straight into the chat. The AI qualifies the lead, finds matching projects, runs deep research if needed and sends an interactive property card right inside Telegram. The client browses photos, units and reports without leaving the app. When they're ready, the AI books a viewing and saves a context note for the team.

If a person takes over, the AI simply steps aside. Operators also have their own AI helper inside the system — ask it about a client, and it comes back with a quick summary and a ready reply, drawing on the full chat history.

Every AI call is tracked: tokens and cost are visible per client, separately for the one talking to the client and the one helping the operator.

The system also comes with light and dark themes, and a range of accent colors.

---

## Context

> Operators run calls and viewings. The AI runs everything else.

Real-estate agencies running their funnel through Telegram receive inbound around the clock. Volume outpaces what a small operator team can keep up with. Catalogs of thousands of projects and tens of thousands of units sit beyond what any human keeps in their head mid-conversation. Off-the-shelf chatbots reply once with FAQ and stall; nothing carries the buyer from first message to booked meeting.

The split is fixed: operators take calls and viewings; everything else — qualifying, deep project and district research, scheduling against the operator calendar — runs without them.

## Facts

| | |
|---|---|
| **Scope** | 7 weeks solo |
| **Surfaces** | CRM admin + Telegram WebApp (1 repo · 2 Vite entries) |
| **Catalog** | Thousands of projects · tens of thousands of units synced from GenieMap · raw_payload preserved |
| **Agent tools** | 19 in-house endpoints + 5 callbacks · Bearer auth · unified envelope |
| **Auth model** | 15 capability keys |
| **Status** | Delivered · 20 pytest modules · structlog · respx |

## Architecture

### Message lifecycle

```text
 1  Client                          Telegram message
        │
 2  tg_bot
        │  POST /api/v1/conversations/messages/        [BOT_SHARED_TOKEN]
        ▼
 3  Backend  (Django/DRF)
        │  upsert Conversation + Message[RECEIVED]
        │  Celery.dispatch_to_n8n_workflow.delay()
        ▼
 4  Celery worker
        │  POST {N8N_BASE_URL}/{webhook}               [N8N_API_KEY]
        ▼
 5  n8n  ──►  LLM
                │  tool_call  (projects-search, send-webapp, …)
                ▼
              Agent Tools API                          [Bearer]
                │  data  {data, meta, errors}
                ▼
              LLM  ──►  final reply text
        │
 6  Backend  ◄── callback /api/v1/integrations/n8n/    [N8N_CALLBACK_TOKEN]
        │  Message[AWAITING_CALLBACK → READY_TO_SEND]
        ▼
 7  Celery worker
        │  POST /bridge/send                           [TG_BRIDGE_TOKEN]
        ▼
 8  Telegram → Client            Message[SENT]
```

**Message states (7).** Linear path: RECEIVED → PROCESSING → AWAITING_CALLBACK → READY_TO_SEND → SENT. Branches: AWAITING_OPERATOR (AI toggled off mid-conversation), FAILED (any n8n error).
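
For illustration, the linear path and its branches can live as data instead of scattered conditionals. A minimal sketch, assuming hypothetical names like `MessageStatus` and `ALLOWED` (not the project's actual identifiers):

```python
from enum import Enum

class MessageStatus(str, Enum):
    RECEIVED = "received"
    PROCESSING = "processing"
    AWAITING_CALLBACK = "awaiting_callback"
    READY_TO_SEND = "ready_to_send"
    SENT = "sent"
    AWAITING_OPERATOR = "awaiting_operator"  # AI toggled off mid-conversation
    FAILED = "failed"                        # any n8n error

# Legal transitions: the linear happy path plus the two branches.
ALLOWED = {
    MessageStatus.RECEIVED:          {MessageStatus.PROCESSING, MessageStatus.AWAITING_OPERATOR, MessageStatus.FAILED},
    MessageStatus.PROCESSING:        {MessageStatus.AWAITING_CALLBACK, MessageStatus.AWAITING_OPERATOR, MessageStatus.FAILED},
    MessageStatus.AWAITING_CALLBACK: {MessageStatus.READY_TO_SEND, MessageStatus.FAILED},
    MessageStatus.READY_TO_SEND:     {MessageStatus.SENT, MessageStatus.FAILED},
}

def transition(current: MessageStatus, new: MessageStatus) -> MessageStatus:
    """Reject any status change the lifecycle does not define."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```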

**Why a separate bridge.** The Telegram bot session is owned by one process — tg_bot. Backend never opens its own session; outbound goes through an HTTP bridge inside the same container. Backend stays stateless toward Telegram.

**Where AI lives.** Inside step 5 — n8n owns the LLM tool-loop. Backend serves both the callbacks and the tool calls themselves during the loop. (Why n8n at all — see §04-Decisions.)

### Component layout

```text
   External            tg_bot                   frontend  (admin · webapp)
   ────────            ──────                   ───────────────────────────
   Telegram   ◄─────►  aiogram + bridge         React · Vite (×2 entries)
                            │                          │
               BOT_SHARED ──┤                          │  HTTPS · /api/v1
               TG_BRIDGE  ──┤                          │
                            ▼                          ▼
                     ┌───────────────────────────────────────────┐
                     │   Backend  (Django · DRF · structlog)     │
                     │   Single envelope:  {data, meta, errors}  │
                     └────┬─────────┬──────────┬─────────────────┘
                          │         │          │
                          ▼         ▼          ▼
                    Postgres 16  Redis 7   Celery worker + beat
                                               │  N8N_API_KEY
                                               ▼
                                      n8n + n8n-worker
                                      (LLM tool-loops)
                                               │  N8N_CALLBACK_TOKEN
                                               ▼
                                         Backend.callback
```

**Nine Docker services.** postgres · redis · backend · celery · celery-beat · tg_bot · frontend · n8n · n8n-worker. (n8n-worker is a separate executor process; offloads long-running flows from the orchestrator.)

**Trust model.** Every cross-service call carries an explicit shared secret — bot bridge, n8n outbound, n8n callback, agent-tool Bearer. No service trusts another by container locality alone.

**Frontend split.** One Vite repo, two entry HTMLs — /admin (CRM) and /webapp/:projectId (Telegram WebApp). Components and API client are shared. The WebApp runs inside Telegram via the WebApp SDK; access control sits on the same backend endpoint rules as admin.

### Domain core

```text
CATALOG                          CONVERSATIONS
───────                          ─────────────
Project                          Conversation
  ├── Unit (×N)                    · status
  ├── ResearchBlock × 4            └── Message (×N, 7-state lifecycle)
  │     · core                   Note (manual + 3 system cards)
  │     · market & demand        OperatorAssistantSession
  │     · legal & ops
  │     · dynamics & news
  ├── Amenity (×N)
  └── District (FK)              KANBAN
                                 ──────
USERS                            Board
─────                              └── Column
User · capability strings               └── Card
     · telegram_link                         └── Activity
UserInvite
                                 CALENDAR
GENIEMAP                         ────────
────────                         Calendar
ProviderSyncLog                    ├── CalendarEvent
  · raw_payload preserved          └── ScheduleRule
  · scheduled syncs
```

**37 models in 7 apps.** Domain decomposed by bounded context: catalog · conversations · kanban · calendar · users · geniemap · auth. 5 apps carry no models (core, agent_tools, dashboard, webapp, echo). No god-table; every entity owns one job.

**Status enums everywhere, not bool flags.** Each domain carries its own typed status enum — Project (launch · available · sold_out · cancelled), Unit (available · reserved · sold · on_hold), Message (7-state), Kanban Card (open · closed_won · closed_lost). One projectable surface per domain instead of a parade of `archived: bool`, `is_active`, `is_draft` flags.

**raw_payload preserved.** Every entity synced from GenieMap stores the original JSON alongside normalized columns. Survives any mapping drift, makes re-ingest cheap.

### Agent topologies

![Four research-specialist agents (core, market & demand, legal & ops, dynamics & news), each on its own LLM, all sharing one web-search tool.](https://ilyadev.xyz/private/airea-n8n-specialists.png)

*Specialists cluster, shared tool*

![A single chat agent wired to twelve tools — projects/districts search and research, calendar availability and create, send-webapp, admin-notify, context-save, user-note CRUD, web-search.](https://ilyadev.xyz/private/airea-n8n-omnimodel.png)

*Omnimodel, twelve tools*

**Why this works.** The 19 endpoints are model-agnostic and orchestrator-agnostic. Tested with both GPT and Gemini against the same tool API.

## Key engineering decisions

### 01 · Capability overrides on top of role presets

**Decision.** Three role presets (OWNER · ADMIN · OPERATOR) define the default capability set; per-user extra_capabilities and revoked_capabilities give point overrides. Checks resolve through hasCapability() rather than role == "admin".

**Why.** Real operator combos don't fit a 3-role enum — 'dashboard yes, user-management no, AI-toggle yes' is routine. A pure role table would explode; pure capabilities lose the convenience of a sane default. The override model keeps both.

**Cost.** Two layers to reason about — preset + overrides — when debugging access. Default role presets had to be hard-coded so UX kept its onboarding shortcuts.
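
A minimal sketch of the resolution order this decision implies: preset first, per-user extras added, revocations subtracted last. `ROLE_PRESETS`, `resolve_capabilities` and the capability keys are illustrative, not the project's actual code:

```python
ROLE_PRESETS = {
    "OWNER":    {"dashboard", "users.manage", "ai.toggle", "kanban", "calendar"},
    "ADMIN":    {"dashboard", "users.manage", "ai.toggle", "kanban"},
    "OPERATOR": {"dashboard", "kanban"},
}

def resolve_capabilities(role: str, extra: set[str], revoked: set[str]) -> set[str]:
    """Preset gives the sane default; point overrides cover the long tail; revocations win."""
    return (ROLE_PRESETS.get(role, set()) | extra) - revoked

def has_capability(user, key: str) -> bool:
    # Checks resolve through the capability set, never through role == "admin".
    return key in resolve_capabilities(
        user.role, user.extra_capabilities, user.revoked_capabilities
    )
```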

### 02 · 19 in-house agent tools on Bearer + unified envelope

**Decision.** Every AI tool call goes through /api/v1/agent/tools/... with its own Bearer token; every response shares the same {data, meta, errors} envelope.

**Why.** n8n runs without user context — session auth does not fit. A Bearer-authenticated edge gives the LLM a minimum-privilege surface; a single envelope shape keeps the tool loop stable.

**Cost.** Several endpoints duplicate per audience (operator vs AI) — two parallel surfaces instead of one with dual auth.
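
One way to pin the `{data, meta, errors}` shape is a single helper every tool endpoint returns through. A sketch with hypothetical names, not the project's code:

```python
from typing import Any

def envelope(data: Any = None, *, meta: dict | None = None,
             errors: list[dict] | None = None) -> dict:
    """Single response shape for every agent-tool endpoint.
    Success carries data + meta; failures carry errors with data left empty."""
    return {"data": data, "meta": meta or {}, "errors": errors or []}

# e.g. inside a DRF view handling a projects-search tool call:
#   return Response(envelope(data=results, meta={"count": len(results)}))
#   return Response(envelope(errors=[{"code": "not_found", "detail": "..."}]), status=404)
```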

### 03 · n8n for the no-code surface

**Decision.** Agent logic lives in 4 n8n workflows; LLM tool-calls fan out to the in-house tool API, callbacks return through a webhook.

**Why.** Constraint: a no-code surface for agent-flow edits. n8n was the obvious candidate at the time: popular, mature, with native tool-calling and webhook primitives. The alternative — a custom admin UI over an in-process tool-loop — would have doubled MVP scope.

**Cost.** Prompt and orchestration logic split between n8n flows and the in-house tool API; n8n-worker runs as a separate service. Reflective take in §06-Lessons.

### 04 · Research as a 4-block state machine

**Decision.** Per-project research splits into 4 blocks (core, market & demand, legal & ops, dynamics & news), each cycling EMPTY → PENDING → READY | FAILED. n8n callbacks deliver each block back to the backend as it completes, and the project's research panel renders them block-by-block as they arrive.

**Why.** A deep research run takes several minutes — blocking the LLM tool-loop would freeze the Telegram chat. Per-block UX shows partial results instead.

**Cost.** Each block runs its own prompt and search sources — research-pipeline maintenance grows linearly with block count.

## Stack

| | |
|---|---|
| **Backend** | Python · Django · DRF · Celery · structlog · aiogram |
| **Frontend** | React · Vite · Tailwind 4 · Radix UI · MapLibre GL · react-router 7 |
| **Infra** | PostgreSQL · Redis · Docker Compose |
| **AI · orch.** | Gemini · OpenAI · n8n (model-agnostic) |
| **Agent surface** | 19 in-house tool endpoints + 5 callbacks · Bearer auth · {data, meta, errors} envelope |
| **Scale** | 12 Django apps · 224 frontend modules · 20 pytest modules · ~56K LOC |

## Lessons & status

### Carry forward

- Capability overrides on top of role presets — base role gives a sane default, per-user extra/revoked keys cover the long tail without inventing new roles.
- Single `{data, meta, errors}` envelope on every API response — a boring tool-loop is a feature, not a bug.
- Operator console UX — kanban + chat + calendar in one shell, with on-request handoff to a human inside the same thread.
- Audit-trail discipline — X-Request-ID propagated through structlog at every cross-service hop (a sketch follows this list). Without it, debugging an LLM tool-loop falls apart.
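
A sketch of that propagation pattern using structlog's contextvars API; the `bind_request_id` helper and the middleware wiring around it are illustrative, not the project's code:

```python
import uuid
import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,   # pulls bound request_id into every event
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer(),
    ]
)

def bind_request_id(incoming_header: str | None) -> str:
    """Reuse the caller's X-Request-ID or mint one; every log line in this request,
    and every outbound call that forwards the header, carries the same id."""
    request_id = incoming_header or uuid.uuid4().hex
    structlog.contextvars.bind_contextvars(request_id=request_id)
    return request_id
```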

### Would change

- Django was picked by reflex. The actual used surface — ORM, routing, app structure — stayed thin, while DRF added boilerplate around every endpoint and class-based permissions fought the capability-string model. FastAPI + Pydantic + async SQLAlchemy would have been a closer fit to the API-only shape this project ended up with.
- n8n didn't earn its keep — most flow edits landed in prompts or the tool API anyway. Today I'd run an in-process tool-loop and ship a thin custom UI only where edits actually happen.
- 9 Docker services were too many for MVP. Today: n8n + n8n-worker drop out (in-process tool-loop), celery-beat folds into the celery worker as a periodic schedule, and the production frontend ships as static files behind the backend rather than a dev container. Lands at 4 services: postgres, redis, backend, celery.

Built, deployed, handed off at engagement close; code walkthrough on request.

---

Source: https://ilyadev.xyz/cases/ai-crm (HTML) · /cases/ai-crm.md (this file)
Up next: 02 — Restaurant Stock AI Agent → https://ilyadev.xyz/cases/ai-warehouse.md
Index: https://ilyadev.xyz/llms.txt — full case-study list
Author: Ilya Kazantsev — https://ilyadev.xyz/index.md


# Restaurant Stock AI Agent

`02 · ai-warehouse · MVP`

Restaurant stock AI agent with two-tier Gemini routing and draft→Confirm safety boundary. Receipts, prep and write-offs start in Telegram; the WebApp keeps review, WAC and dashboard impact visible.

**Scope:** Solo · 1 week  
**Role:** AI-assisted restaurant stock operations

**Video:** [YouTube](https://www.youtube.com/watch?v=TwxmVN6JNvA) · [RuTube](https://rutube.ru/video/private/5031a5bb3f1f25bfe0961a75b1b9791a/?p=co_lT-Rej-FrcN6r7dpcyw)

## Video walkthrough

Restaurant stock operations split between Telegram (messy receipt photos, prep batches, write-offs as text or kitchen slips) and a WebApp that keeps every draft reviewable before stock moves. The agent reads receipts, expands dishes into ingredients, updates weighted average cost on confirm, and answers free-form warehouse questions with order recommendations.

AI stock operations for restaurants — receipts, prep work and write-offs start in Telegram, while the WebApp keeps every number reviewable.

The dashboard shows today, week and month: revenue, food cost, purchases, deductions and stock value, with compact charts for the working day.

Stock highlights what needs attention. Two frozen items are out, so the team restocks and sends the store receipt to the bot.

The AI reads the photo, matches food, drinks and operating supplies to stock categories, and creates a draft. Review it in the WebApp, then confirm — stock and weighted average cost update immediately.

Preparation batches are tracked too. When the kitchen makes pizza dough, tomato sauce, burger patties or marinated chicken, the team can tell the agent what was prepared, or add the batch manually in the WebApp. Ingredients go out, the prepared item comes back in, and costs stay connected.

For sales or write-offs, send the kitchen order slip, or just describe it in text. The AI recognizes the dishes, expands them into ingredients and prepares the deduction. Confirm it, and the dashboard moves again.

Then ask the agent anything about your warehouse — for example, how the month went. It analyzes historical supplies, deductions, revenue, food cost, stock movement and out-of-stock items, then returns recommendations for the next order.

Telegram handles messy input. The WebApp handles review. The dashboard shows the business impact.

---

## Context

> The model drafts. The human confirms. The dashboard shows impact.

Restaurant inventory drifts from records faster than anyone in the kitchen has time to log it. Receipts arrive on paper or by voice; supplies, dish deductions and pre-prepared bases each follow different math; weighted-average cost only stays correct if every operation hits the books on time. The cost of skipping a few entries is invisible until inventory swings — and by then the trail is gone.

The MVP closes the loop around that bottleneck: Telegram captures messy stock operations, the WebApp keeps every draft reviewable, and the dashboard turns confirmations into revenue, gross profit, food cost, stock value, low-stock risk and next-order recommendations.

## Facts

| | |
|---|---|
| **Scope** | 1 week solo |
| **Surfaces** | Telegram bot (Aiogram) + WebApp SPA (React 18) + browser fallback |
| **Domain** | 100 inventory · 40 menu · 12 prep recipes · 18 tables · deterministic 30-day history |
| **AI** | Gemini 3 Flash workflows · Pro-style operating analysis · per-call $ tracked |
| **Business** | Revenue · gross profit · food cost · stock value · low-stock risk · next-order list |
| **Status** | MVP · Telegram bot + WebApp · multimodal intake live · stock, WAC and dashboard flows end-to-end |

## Architecture

### Operator loop · dashboard → Telegram → review → impact

```text
 1  WebApp Dashboard
        │  Today / Week / Month
        │  stock value · revenue · gross profit · purchases
        │  deductions · preparations · movement charts
        ▼
 2  Stock view
        │  category groups · qty · stock value · unit cost
        │  Frozen flags: Green Peas Frozen + Potato Wedges Frozen = out
        ▼
 3  Telegram: Supply photo
        │  📷 Receipt received
        │  👁 Vision pass: reading receipt lines and totals
        │  🧠 Matching products to stock categories
        │  📦 Calculating quantities and weighted costs
        │  ✅ Draft supply ready for review
        ▼
 4  Draft supply result
        │  19 stock items matched · 15 650.24 RSD
        │  actions: Review · Confirm · Cancel
        ▼
 5  WebApp review + Confirm
        │  food · beverages · operating supplies
        │  confirm commits stock_levels + WAC + stock_history
        ▼
 6  Preparations
        │  pizza dough batch: ingredients out, prepared item in
        │  preparation cost cascades through WAC
        ▼
 7  Telegram: Deduction photo
        │  📷 Kitchen slip received
        │  👁 Vision pass: reading dishes and quantities
        │  🧠 Matching dishes to menu recipes
        │  🥘 Expanding dishes into ingredient deductions
        │  ✅ Draft deduction ready for review
        ▼
 8  Draft deduction result
        │  Chicken Shawarma ×3 · Shawarma Plate ×2
        │  confirm writes off ingredients at current WAC
        ▼
 9  Agent monthly analysis
        │  last 30 days · revenue · food cost · supplies
        │  preparations · top deductions · out-of-stock · next order
        ▼
10  Dashboard impact
        │  Telegram handles messy input
        │  WebApp handles review / confirm
        │  Dashboard shows the business effect
```

**Progress is part of the product.** Every AI action edits one Telegram message through visible stages: received, vision pass, matching, cost calculation or recipe expansion, then draft ready. The operator sees the model working instead of staring at a spinner.

**AI drafts, Confirm commits.** Supply, deduction and preparation workflows return structured drafts. Backend stores them with status = draft; stock, WAC, history and dashboard totals move only after the operator taps Confirm.

**Review spans Telegram and WebApp.** The Telegram result carries Review, Confirm and Cancel. Review opens the WebApp surface for line-level inspection; Confirm commits the same draft through the backend confirm endpoint.

### Multimodal intake → draft → confirm

```text
 1  User                          text  /  photo  /  voice
        │
 2  Bot  (Aiogram)
        │  base64 encode photo/voice  +  selected workflow
        │  POST /api/agent/message              [X-Telegram-User-Id]
        ▼
 3  Backend.router  (route_message)
        │  dispatch by workflow:
        │     supply | deduction | preparation  →  Flash workflow
        │     agent analysis                    →  monthly operating analysis
        ▼
 4  Flash workflow  (gemini-3-flash, ≤5 iterations)
        │  inject inventory snapshot into system_prompt
        │  call client.generate_content(tools=[...])
        ▼
 5  Gemini  ──►  text  /  function_call
                │
                ▼
              tool_create_supply_draft  (or _deduction_, _preparation_)
                │  parse args, normalize price_type
                ▼
              insert row in supplies / deductions / preparations
              with status = 'draft'
        │
 6  Backend  ──►  AgentMessageResponse(text, actions=[
        │           AgentAction(type=callback, label='Confirm',
        │                       data='confirm_supply:<uuid>'),
        │           AgentAction(type=callback, label='Delete',  ...),
        │         ])
        ▼
 7  Bot  ──►  inline keyboard rendered from actions
        │
 8  User taps Confirm
        │  POST /api/supplies/{id}/confirm
        ▼
 9  Backend
        │  recompute stock_levels  +  WAC
        │  insert stock_history row
        │  snapshot cost_per_unit on lines
        ▼
10  Done  ·  status = 'confirmed'  ·  audit_log written
```

**Two-phase by design.** The AI workflow creates a draft (no side effects). Confirm is a separate POST that updates stock_levels and writes stock_history. AI never touches live stock; the human button does.

**Backend owns action meaning.** The bot is a thin Telegram adapter: it uploads media, shows progress, renders backend-provided actions and forwards callbacks. Business meaning stays in the backend.

**Explicit lanes keep the MVP predictable.** The operator picks Supply, Deduction, Preparation or Agent before sending messy input. That removes false intent classification from the critical stock path; an auto-router can sit above the same dispatcher later.

### Component layout

```text
   Inputs              docker-compose  (4 services)              External
   ──────              ─────────────────────────────              ────────
   Telegram chat ────► bot  (Aiogram 3.4)                        Gemini API
                            │  POST /api/agent/message            google-genai
                            │  X-Telegram-User-Id                      ▲
                            ▼                                          │
   Telegram WebApp     ┌──────────────────────────────────┐            │
   ─initData──────────►│  backend  (FastAPI · async)      │────────────┘
   browser  ─JWT──────►│  9 routers · 18 tables · 9 mods  │
                       │  GeminiClient + token_usage      │
                       └────┬───────────┬─────────────────┘
                            │           │
                            ▼           ▼
                      postgres 16    receipts_data
                      (asyncpg WAL)  shared volume:
                                     bot writes · backend reads

                            ▲ HTTP /api/*
                            │
                       frontend  (Vite · React 18 SPA · :5173)
                       react-i18next 5 langs × 10 ns
                       @tanstack/react-query
```

**Bot is HTTP only.** Bot opens a long-poll loop to Telegram and POSTs every event into backend. No direct DB access from bot — keeps the bot stateless across restarts.

**Photos shared via FS, not API.** Bot writes uploaded photos into the receipts_data volume; backend reads from the same path. Avoids re-uploading multi-MB files between containers.

**Frontend is dual-mode.** Same Vite SPA mounts inside the Telegram WebApp (initData header) and as a plain browser app (deep-link JWT). One bundle, two auth paths in lib/auth.ts; the rest of the app cannot tell them apart.

### Domain core · 18 tables in 9 modules

```text
INVENTORY                            SUPPLIES
─────────                            ────────
inventory_items                      suppliers
  · status: active|archived          supplies  (status: draft|confirmed)
       │                               └── supply_lines
       ▼                                     · qty · price_per_unit
  stock_levels                              · expiry_date
    · qty · avg_cost  ◄────── WAC update on confirm
       │
       ▼
  stock_history  (audit log of every movement)


PREPARATIONS                         DEDUCTIONS
────────────                         ──────────
prep_recipes                         deductions  (status: draft|confirmed)
  · default_multiplier                 └── deduction_items
  · portion_size · portion_unit              · type: dish | item
  └── prep_recipe_ingredients               └── deduction_lines
                                                  · cost_per_unit
preparations                                       (snapshot at confirm)
  · multiplier  (e.g. 3× base broth)
  · cost = sum(ingredient × avg_cost) / output_qty   ← cascade WAC


MENU                                 AI · CHAT
────                                 ─────────
menu_items  (with archived flag)     chat_sessions
  └── menu_item_ingredients          chat_messages  (last 40 in context)
                                     user_settings
                                     token_usage
                                       · workflow_type · model
                                       · input/output/thinking_tokens
                                       · cost_usd


AUDIT
─────
audit_logs  ·  receipt_images  (shared FS path with bot)
```

**WAC lives in stock_levels.avg_cost.** Each supply confirm recomputes (old_qty*old_avg + new_qty*new_price) / total_qty into stock.avg_cost. Cascades through preparation: prep cost = sum(ingredient_qty * avg_cost) / output_qty, which itself enters output_stock.avg_cost via the same WAC formula.

**Cost snapshot on deduction confirm.** When a deduction confirms, deduction_lines.cost_per_unit = stock.avg_cost at that moment. Later supplies at different prices do not rewrite historical deductions.

**Dual-mode deductions.** deduction_items.type = dish (expand recipe; ingredients become lines) or item (direct stock line). One deduction can mix both — covers the case where the dish was served and the cook ate two extras off the side.

## Key engineering decisions

### 01 · Two-tier Gemini-3 with explicit workflow lanes

**Decision.** Two Gemini-3 tiers handle two distinct job shapes. Cheap Flash (gemini-3-flash-preview, thinking_budget=0, ≤5 iterations) runs stateless workflow tool-loops — one of {create_supply_draft, create_deduction_draft, create_preparation}. Pro-style analysis handles broader operating questions over the 30-day history, low-stock watchlist, dashboard metrics and next-order list. The lane is explicit in the MVP — Supply / Deduction / Preparation / Agent buttons in the bot. Per-call usage is parsed from response.usage_metadata and persisted in token_usage with workflow_type so Settings can break down spend by tier and workflow.

**Why.** A single-tier setup on Pro costs roughly 4× per token and pulls full chat-history overhead onto every receipt-photo dispatch. A single-tier setup on Flash fits narrow tool loops but not broad operating analysis. Splitting by job shape — narrow operation draft vs. broader business analysis — matches the real cost/benefit envelope of Gemini-3 today. Explicit lanes also reduce false intent classification while the core MVP proves the stock-operation path.

**Cost.** Two prompt families to maintain. Tool declarations partially duplicate across narrow workflows and broad analysis. Explicit lane choice is a UX cost on the bot side: one extra button before intake. An auto-intent router can remove that later, but it should sit on top of the same cost and workflow telemetry rather than replacing the tier split.
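
A condensed sketch of the dispatch-plus-cost-tracking shape this describes. The google-genai calls and `usage_metadata` fields follow recent SDK releases and should be verified against the installed version; `save_token_usage`, the price table and the model strings are illustrative assumptions:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

PRICE_PER_1M = {"gemini-3-flash-preview": (0.50, 1.50)}  # illustrative $ per 1M tokens (in, out)

def run_workflow(lane: str, contents, tools):
    # Explicit lane chosen by the operator; cheap Flash for narrow tool-loops,
    # the Pro-style model for broad operating analysis.
    narrow = lane in {"supply", "deduction", "preparation"}
    model = "gemini-3-flash-preview" if narrow else "gemini-3-pro-preview"
    resp = client.models.generate_content(
        model=model,
        contents=contents,
        config=types.GenerateContentConfig(
            tools=tools,
            thinking_config=types.ThinkingConfig(thinking_budget=0) if narrow else None,
        ),
    )
    usage = resp.usage_metadata
    in_tok = usage.prompt_token_count or 0
    out_tok = usage.candidates_token_count or 0
    p_in, p_out = PRICE_PER_1M.get(model, (0.0, 0.0))
    cost_usd = (in_tok * p_in + out_tok * p_out) / 1_000_000
    save_token_usage(workflow_type=lane, model=model,       # hypothetical persistence helper
                     input_tokens=in_tok, output_tokens=out_tok, cost_usd=cost_usd)
    return resp
```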

### 02 · Draft → Confirm on every mutating operation

**Decision.** Supplies, deductions and preparations are two-phase. The first POST inserts a row with status = draft — no side effects on stock. A second POST to /confirm pessimistically locks the relevant stock_levels rows, recomputes WAC, writes a row to stock_history per moved item, snapshots cost_per_unit on the deduction lines, and flips status to confirmed. AI never touches live stock; the human confirms via an inline button.

**Why.** An AI agent that mutates inventory directly is one bad transcription away from corrupting weeks of stock data. Two-phase makes every model proposal reversible (delete the draft) and inspectable (open the WebApp, edit the line, then confirm). The same pattern gives the human operator the same affordance — start a draft on the bot, finish it on the WebApp.

**Cost.** Every mutating surface is two endpoints (`/...` and `/.../confirm`). Stock has to accept draft rows that it ignores in totals. UX has to communicate that a freshly-created supply is not yet live. Concurrent confirms on the same stock_level need a lock — added a pessimistic `SELECT FOR UPDATE` on every confirm-path stock read (supplies + deductions + preparation cascade); read-only stock queries stay lock-free.
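
A trimmed sketch of the confirm endpoint's locking shape under async SQLAlchemy. `Supply`, `SupplyLine`, `StockLevel`, `StockHistory` and `get_session` are assumed names, not the real schema:

```python
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

router = APIRouter()

@router.post("/api/supplies/{supply_id}/confirm")
async def confirm_supply(supply_id: str, session: AsyncSession = Depends(get_session)):
    supply = await session.get(Supply, supply_id)
    if supply is None or supply.status != "draft":
        raise HTTPException(409, "not a confirmable draft")

    lines = (await session.scalars(
        select(SupplyLine).where(SupplyLine.supply_id == supply.id)
    )).all()
    for line in lines:
        # Pessimistic lock: concurrent confirms on the same item serialize here.
        stock = await session.scalar(
            select(StockLevel).where(StockLevel.item_id == line.item_id).with_for_update()
        )
        total = stock.qty + line.qty
        stock.avg_cost = (stock.qty * stock.avg_cost + line.qty * line.price_per_unit) / total
        stock.qty = total
        session.add(StockHistory(item_id=line.item_id, delta=line.qty, source_id=supply.id))

    supply.status = "confirmed"
    await session.commit()
    return {"status": "confirmed"}
```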

### 03 · Server-driven review actions, not bot-side business logic

**Decision.** Backend emits AgentAction inside AgentMessageResponse — each action carries a type (callback or webapp), a label, and a data string. Telegram renders Review, Confirm and Cancel from that list; the backend owns what confirm_supply:<uuid> means and which WebApp screen Review opens. The same contract works on Telegram, in the chat pane of the WebApp fallback, and in any text channel that supports tap-a-button.

**Why.** A bot that owns business actions becomes a parallel UI codebase: every new draft type, every review path, every workflow needs a bot-side handler change. Keeping the bot as a transport adapter means new stock flows ship in backend only. The operator still gets a rich Telegram surface — progress edits, result summary, Review/Confirm/Cancel — while business meaning remains centralized.

**Cost.** Backend has to know about Telegram-specific limits (callback_data ≤ 64 bytes; some types are inline-only). Buttons are dispatch-only — they do not carry inline forms. Anything richer than confirm/delete (editing line quantities, for example) has to fall through to the WebApp.
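
A sketch of the contract: backend declares actions, the bot only renders them. The Pydantic and aiogram 3 types are real; the field names mirror the description above but are assumptions about the actual code:

```python
from typing import Literal
from pydantic import BaseModel
from aiogram.types import InlineKeyboardButton, InlineKeyboardMarkup, WebAppInfo

class AgentAction(BaseModel):
    type: Literal["callback", "webapp"]
    label: str
    data: str                      # e.g. "confirm_supply:<uuid>" or a WebApp URL

class AgentMessageResponse(BaseModel):
    text: str
    actions: list[AgentAction] = []

def render_keyboard(resp: AgentMessageResponse) -> InlineKeyboardMarkup | None:
    """Bot-side: pure rendering, no business meaning."""
    if not resp.actions:
        return None
    row = [
        InlineKeyboardButton(text=a.label, callback_data=a.data)        # callback_data ≤ 64 bytes
        if a.type == "callback"
        else InlineKeyboardButton(text=a.label, web_app=WebAppInfo(url=a.data))
        for a in resp.actions
    ]
    return InlineKeyboardMarkup(inline_keyboard=[row])
```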

### 04 · Weighted Average Cost with cascade through preparations

**Decision.** Each supply confirm runs WAC over the affected stock_level: new_avg = (old_qty*old_avg + new_qty*new_price) / total_qty. Preparations apply WAC twice — first to compute the unit cost of the prep (sum(ingredient_qty * ingredient.avg_cost) / output_qty), then to merge that into the output stock_level via the same WAC formula. Deduction lines snapshot cost_per_unit = stock.avg_cost at confirm time; later price changes do not rewrite the past.

**Why.** FIFO/LIFO would require lot-level tracking — every supply line as a discrete batch with its own remaining qty — and a queue/stack pop on every deduction. For a single-restaurant kitchen with mixed ingredients (one bag of rice, not five distinguishable batches) WAC matches how the cook actually thinks about cost. Snapshots on confirm protect historical reports from later price drift.

**Cost.** WAC math hides batch-level variance — you cannot answer which exact supply this dish came from. Preparations turn one deduction into a chain of WAC computations; debugging a wrong cost means walking the chain by hand. With zero tests in the repo, every WAC change is verified manually.
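
The cascade as pure arithmetic, stripped of persistence. A sketch of the formulas quoted above; the function names and the example quantities are illustrative:

```python
def wac(old_qty: float, old_avg: float, new_qty: float, new_price: float) -> float:
    """Weighted average cost after a supply (or a prep output) lands in stock."""
    return (old_qty * old_avg + new_qty * new_price) / (old_qty + new_qty)

def preparation_unit_cost(ingredients: list[tuple[float, float]], output_qty: float) -> float:
    """First WAC application: ingredients leave at their current avg_cost."""
    return sum(qty * avg_cost for qty, avg_cost in ingredients) / output_qty

# Pizza-dough batch: (qty, avg_cost) for flour, water, yeast producing 4.5 kg of dough.
unit_cost = preparation_unit_cost([(3.0, 0.9), (1.8, 0.0), (0.05, 12.0)], output_qty=4.5)

# Second WAC application: merge the batch into the dough stock level.
new_dough_avg = wac(old_qty=1.0, old_avg=0.7, new_qty=4.5, new_price=unit_cost)
```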

## Stack

| | |
|---|---|
| **Backend** | Python · FastAPI · SQLAlchemy 2.0 Mapped (async) · asyncpg · Alembic · Pydantic v2 |
| **Frontend** | React 18 · Vite · TypeScript · Tailwind · @tanstack/react-query · react-i18next |
| **Bot** | Aiogram 3.4 (long polling) · httpx |
| **AI** | google-genai 1.61 · Gemini 3 Flash + Pro · multimodal (text/photo/voice) · cost matrix |
| **Infra** | PostgreSQL 16 · Docker Compose (4 services · 1 shared volume for receipts) |
| **Scale** | 9 backend modules · 18 tables · 13 pages · 5 langs × 10 i18n namespaces · 6 currencies · ~50 routes |

## Lessons & status

### Carry forward

- Per-call cost parsed from usage_metadata, persisted with workflow_type — Settings page slices spend by model, by workflow and by day. Wiring it before the first run is cheaper than bolting it on after the first vendor invoice; the same signal is what the smart-router should consume once it lands.
- Draft → Confirm with cost snapshot — every mutating AI output is reviewable before commit. The product safety boundary lives in the workflow shape, not in model accuracy: bad OCR can create a bad draft, but it cannot mutate live stock without human confirmation.
- Server-driven review actions — Telegram shows progress, result summaries and Review/Confirm/Cancel, while backend owns the action contract. Same contract works in Telegram, any text channel, and the WebApp chat pane. Adding a new workflow ships backend-side.
- Compact design system + hot-reload across all three containers — UI-STANDARDS.md (~600 lines) pins receipt-like cards, 28×28 square actions, 11–14 px type, Telegram CSS vars mapped to Tailwind tokens. Bind-mounts plus uvicorn --reload and vite HMR keep edit-to-visible cycles in seconds for backend, frontend, and bot.

### Would change

- Started with zero tests — acceptable for MVP speed, but workflow routing and WAC confirm paths now deserve pytest fixtures before more operators rely on them. The risky surface is small: draft creation, confirm mutation, WAC cascade and dashboard totals.
- HMAC validation for Telegram WebApp initData — hmac is imported in auth.py and never called. Trust today rests on Telegram WebApp client-side guarantees rather than server-side verification; closing it is a ~50-line addition (see the sketch after this list).
- init_db() running alongside Alembic — Base.metadata.create_all() fires at startup and migrations exist beside it. Fine for solo iteration on a fresh database, becomes a foot-gun the first time someone clones the repo with neither path leading them to a clean schema. Either alembic stamp head after create_all, or drop create_all entirely.
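
The missing server-side check is small. A sketch following Telegram's documented initData validation scheme; the FastAPI glue around it is omitted:

```python
import hashlib
import hmac
from urllib.parse import parse_qsl

def verify_init_data(init_data: str, bot_token: str) -> bool:
    """Validate Telegram WebApp initData per the documented HMAC scheme."""
    pairs = dict(parse_qsl(init_data, keep_blank_values=True))
    received_hash = pairs.pop("hash", "")
    data_check_string = "\n".join(f"{k}={v}" for k, v in sorted(pairs.items()))
    secret_key = hmac.new(b"WebAppData", bot_token.encode(), hashlib.sha256).digest()
    computed = hmac.new(secret_key, data_check_string.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(computed, received_hash)
```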

MVP · Telegram bot + WebApp · multimodal intake live · supply, preparation, deduction, WAC and dashboard flows working end-to-end.

---

Source: https://ilyadev.xyz/cases/ai-warehouse (HTML) · /cases/ai-warehouse.md (this file)
Previous: 01 — AI CRM · Real Estate → https://ilyadev.xyz/cases/ai-crm.md
Up next: 03 — AI Video Editor → https://ilyadev.xyz/cases/ai-video-editor.md
Index: https://ilyadev.xyz/llms.txt — full case-study list
Author: Ilya Kazantsev — https://ilyadev.xyz/index.md


# AI Video Editor

`03 · ai-video-editor · R&D`

Local AI video editor as an MCP-style substrate — file-level tool API with hash-verified diffs and audit trails. Timeline, scripts and async TTS as code, Gemini as orchestrator.

**Scope:** Solo · 3 weeks  
**Role:** Local Remotion IDE

**Video:** [YouTube](https://www.youtube.com/watch?v=0FoXG0a_KBw) · [RuTube](https://rutube.ru/video/private/53d47519e665a3a008181653123bfa56/?p=HQuscnTnLSXaLZbs-RyJ8Q)

## Video walkthrough

Local AI video editor built as a substrate for agent-driven workflows — timeline, previews, scripts and async jobs all live in code, so an agent reads, edits and reverts them line by line through hash-verified changesets. Custom timeline with frame-to-hours zoom, four-pane CodeMirror script editor, async TTS pipeline; everything runs locally.

Local AI video editor — built as a substrate for agent-driven workflows. Timeline, previews, scripts and async jobs all live in code, so an agent can read, edit and revert them line by line.

The timeline is custom — drag, blade, magnet snap, and zoom levels from frame to hours.

The script editor opens any markdown or JSON file from the project tree up to four CodeMirror panes side by side.

Ask the agent to rewrite a paragraph and the editor updates live. Every edit is a hash-verified changeset — apply, revert, or open the diff before touching the disk. Ask it to cut and re-arrange the timeline and the moves play back step by step.

Audio generation is the first async pipeline wired up — the clip shows its progress as the job runs.

Everything runs locally. The substrate is in place — the rest stacks on top.

---

## Context

> Timeline in code — otherwise the agent stays a chat window.

The surface for programmatically driving a video timeline from outside a commercial editor is thin. Premiere Pro’s ExtendScript exposes import, export and basic timeline operations, but not the editing depth an agent orchestrating generation needs — documentation runs out fast, and the parts a real workflow has to lean on sit outside the scriptable area. DaVinci’s scripting is clean and well-documented, but only in the paid edition; the rest of the market is closed or prototype. For an agent that orchestrates generation across a timeline, "use the existing editor’s plugin layer" stops being an option early.

Scripted video, increasingly, is code. Remotion compositions are TSX. Scenario lines and timings live in markdown and JSON. Where editing means moving lines and re-cutting timing, an AI coauthor can use the same tools a developer would — read a slice, replace a line range, diff, revert — if the editor itself treats the script as a file rather than a chat transcript. And the timeline itself has to live in code too, or the agent has nothing to write into.

## Facts

| | |
|---|---|
| **Scope** | 21 days solo |
| **Surfaces** | Browser editor + local FastAPI media-service · monorepo + docker-compose · runs locally |
| **Timeline** | Zoom 0.01×–50× · magnet snap · blade · undo/redo |
| **Composition** | Remotion 4.0 · Babel-standalone on-demand compile · per-clip error overlay |
| **AI agent** | Gemini 3 + 2.5 · audit-first changesets · sha256 hash-verified apply/revert · SSE |
| **Audio** | Async TTS pipeline · queue + state machine on the clip · two-speaker dialogue · scenario versioning |
| **Status** | Substrate built end-to-end · script edits and async TTS live · agent-orchestrator and export — next slices |

## Architecture

### AI edit lifecycle

```text
 1  User                          Type message + Enter
        │
 2  Frontend
        │  POST /ai/chat                       [threadId, model, system_instr]
        ▼
 3  media_service  (FastAPI)
        │  insert thread + message + run[pending]
        │  ThreadPoolExecutor.submit(run_chat_job)
        ▼
 4  run_chat_job  (worker thread)
        │  client.generate_stream(model, contents, tools, config)
        ▼
 5  Gemini  ──►  text  /  thought  /  functionCall
                │
                ▼
              tool_call  (file_read_slices, text_replace_lines, …)
                │
                ▼
              text_tools  ──►  audit  {before, after, meta}
        │
 6  agent_changes_v2
        │  append_file_change · sha256 base/after · unified diff
        ▼
 7  Frontend  ◄── SSE /ai/chat/{run_id}/stream      [streaming|tools|done]
        │
 8  apply_changeset(forward)
        │  verify sha256(file) == base_hash  →  write OR 409 hash_mismatch
        ▼
 9  File written  ·  changeset.status = applied
```

**Audit-payload always.** Every tool call returns an audit payload (before / after / meta) even when apply=false. Backend persists before_text, after_text, sha256 base/after, unified diff and byte size to SQLite before the file is touched. This gives a manual-review preview mode for free.

**Hash-verification.** Forward-apply compares current sha256(file) with base_hash; reverse-apply with after_hash. On mismatch — 409 with {path, expected, actual, direction}. force=true skips the check explicitly when the user accepts the clobber.

**SSE, not WebSocket.** Single SSE endpoint streams events {running, streaming, tools, retrying, complete, error}. sessionStorage holds the active run_id so F5 reattaches without losing the stream. Cancel is a separate POST.
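
One way the single SSE endpoint might be shaped in FastAPI: poll run state and emit `event:` frames until the run completes. A sketch only; `load_run` is a hypothetical SQLite read and the real event loop is more involved:

```python
import asyncio
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/ai/chat/{run_id}/stream")
async def stream_run(run_id: str):
    async def events():
        while True:
            run = load_run(run_id)                     # hypothetical: read runs row from SQLite
            payload = json.dumps({"status": run["status"], "delta": run.get("delta", "")})
            yield f"event: {run['status']}\ndata: {payload}\n\n"
            if run["status"] in ("complete", "error"):
                break
            await asyncio.sleep(0.25)
    return StreamingResponse(events(), media_type="text/event-stream")
```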

### Component layout

```text
   Browser                       Local services            Storage
   ───────                       ──────────────            ───────
   Frontend  (Vite · React)      media_service             /projects/<id>/
        │                        (FastAPI :8000)              │
        Zustand · PlaybackStore        │                      ├ project.json
        Remotion Player                ▼                      ├ assets/<sha>.<ext>
        CodeMirror             ┌──────────────────┐           ├ previews/  proxies/
        │                      │  routes (~50)    │           ├ remotion/<aid>/
        │  HTTPS  /api/*       │  ThreadPool(5)   │           │     manifest.json
        ├─────────────────────►│  asyncio.Sem.    │           └ scripts/
        ◄──── SSE stream ──────┤  ffmpeg subproc  │             ├ *.md  /  *.json
                               │  google-genai    │             ├ workflows/*.json
                               └────┬─────────────┘             └ Scenario_TTS.json
                                    │
                                    ▼
                          SQLite (WAL · 11 tables)         External
                          ────────────────────────         ────────
                          threads · messages · runs        Gemini API
                          usage_records · tool_events      google-cloud-speech
                          changesets_v2 + file_changes_v2
                          tts_jobs · stt_jobs
```

**Two services, no broker.** Frontend + media_service in one docker-compose, plus a /projects volume. No Redis, no Celery — SQLite WAL + ThreadPool plays the queue role. See §04 / D4 for the fork.

**Project lives on the FS.** project.json is the single source of truth. Assets are content-addressable (sha256 as filename). On restart media_service rebuilds runtime state from FS + SQLite, no warm-up cache to invalidate.

**External — provider-agnostic at the call site.** Current substrate uses google-genai (chat + thinking + tools + TTS) and google-cloud-speech (STT) — swapping a model is adding a worker, not rewriting the substrate. Pricing per modality is computed locally from usage_metadata before persisting.

### Project + state model

```text
PROJECT (project.json)             RUNTIME STATE (in-memory)
──────────────────────             ─────────────────────────
Project                            Zustand store
  ├── Asset (×N)                     ├── project        (committed)
  │     · type:                      ├── editorState    (committed playhead)
  │       video|audio|image|         ├── history.past   (≤20 snapshots)
  │       remotion|text|callout      └── openTextContents
  │     · hash · originalPath
  │     · audio.{generationStage,    PlaybackStore  (mini-store, not Zustand)
  │              progress, error}      ├── status   playing|paused|buffering
  │                                    ├── timeSeconds  (live, every frame)
  ├── Folder (×N)                      ├── frame        (live)
  │                                    └── lastUpdateTs  (poller fallback)
  ├── Track (×M)
  │     ├── kind  Video | Audio
  │     └── Clip (×K)
  │           · type  Media | Text | Remotion
  │           · trackId · start · duration
  │           · linkedClipId   (V↔A pair)
  │           · transform      (overlay state)
  │
  └── EditorState
         · playhead    (committed, paused position)
         · zoom · tool · selectedClipIds


SQLITE  (usage.sqlite · WAL · 11 tables)
─────────────────────────────────────────
threads          messages          runs               usage_records
   └── usage           └── status       └── thinking_     └── prompt/output/
       totals              run_id           preset            cached/thought

agent_changesets_v2 ──┬── agent_file_changes_v2
                      │      · seq · path
                      │      · base_hash · after_hash
                      │      · before_text · after_text · patch_text
                      │      · status  pending|ready|applied|reverted
                      │
tool_events           tts_jobs           stt_jobs
   · run_id · seq        · status           · status
   · payload_json        · result_json      · result_json
```

**Two stores, two cadences.** Zustand holds committed state (paused playhead, undo stack, selectedClipIds). PlaybackStore holds live time. Live updates never reach Zustand — the parent tree around the Remotion Player does not re-render at frame rate.

**linkedClipId — one model for V↔A.** When a video with audio is dropped, the importer demuxes a separate audio asset and pairs the clips through mutual linkedClipId. Cut, move and delete cascade across the pair; a blade cut splits both halves cleanly.

**Audit-trail in SQLite.** Every AI tool call writes to changesets_v2 + file_changes_v2 before the file is touched. The status column (pending|ready|applied|reverted) is the edit timeline of the project — replayable forward and reverse.

## Key engineering decisions

### 01 · Audit-first changesets with hash-verification

**Decision.** Every AI tool call returns an audit payload (before / after / meta) even when apply=false. Backend persists before_text, after_text, sha256 base/after, unified diff and byte size in SQLite. apply_changeset compares the current sha256(file) against base_hash and rejects with 409 hash_mismatch when the file has drifted.

**Why.** An AI agent writing to files is a race condition by default. A user editing the same file in CodeMirror between generation and apply silently clobbers one side or the other. Hash verification makes the race visible — UI shows expected vs actual and a force-override button — and the persisted before/after/diff doubles as a manual-preview mode for prompts you do not yet trust.

**Cost.** Every tool call does extra work (sha256 + diff + persist) even when the patch is dropped. Schema grew by two tables. apply is a transaction over N files with rollback on the first mismatch — more code than a straight write, more edge cases (force, partial apply, reverse direction).
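
A reduced sketch of the forward-apply check. The real implementation runs as a transaction over N files with rollback; names here are illustrative:

```python
import hashlib
from pathlib import Path

def apply_file_change(path: Path, base_hash: str, after_text: str,
                      *, force: bool = False) -> dict:
    """Forward-apply one file change; refuse if the file drifted since generation."""
    current = hashlib.sha256(path.read_bytes()).hexdigest()
    if current != base_hash and not force:
        # Surfaces to the UI as 409 hash_mismatch with expected vs actual.
        return {"status": 409, "error": "hash_mismatch",
                "expected": base_hash, "actual": current, "path": str(path)}
    path.write_text(after_text)
    return {"status": 200,
            "after_hash": hashlib.sha256(after_text.encode()).hexdigest()}
```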

### 02 · Committed playhead in the store, live playback in a side ticker

**Decision.** editorState.playhead in Zustand holds only the paused/scrub position. Live playback time lives in a separate PlaybackStore (a mini external store, not Zustand), updated every frame via onFrameUpdate plus a 200 ms poller fallback. PlaybackController is the single integration point with Remotion PlayerRef.

**Why.** A naive design keeps a single playhead in Zustand and updates it every frame. That re-renders the parent tree around the Remotion Player 60 times a second; on long timelines under load the Player picks up periodic micro-resyncs and stutters. Splitting committed and live state is the only way to keep the timecode UI live without re-rendering the Player.

**Cost.** Two sources of truth for time. PlaybackController has to mediate every play / pause / seek, buffer pendingSeek and pendingPlay until the player attaches, and run a fallback poller for missed onFrameUpdate events. UI subscribers ride a custom subscribe pattern, not Zustand selectors — extra glue on top of an already custom store.

### 03 · Babel-standalone on-demand for Remotion clips

**Decision.** A user-authored Remotion clip is stored as code in a manifest.json next to the asset and compiled at runtime through @babel/standalone (presets: react + transform-modules-commonjs). The output is wrapped via new Function with React and a minimal Remotion API injected. Cache key is assetId + hash(code) + length. Compile and runtime errors are caught and rendered as a per-clip error overlay — they do not break the rest of the composition.

**Why.** The alternatives — pre-compile via Vite, or run a TS runtime in a Web Worker — both push compilation off the editing path. With Babel-standalone the user edits code in CodeMirror, hits save, and the next preview frame reflects it. Babel ships as a separate lazy chunk loaded only when the first Remotion clip mounts; for an R&D tool where author and user are the same person, new Function isolation is enough.

**Cost.** Babel-standalone is a chunky dependency, lazy-loaded with a visible "Loading Remotion compiler…" indicator on first use. The PRELUDE injects only a minimal Remotion subset (AbsoluteFill, Sequence, Audio, Video, Img, useCurrentFrame, useVideoConfig, interpolate, Easing, spring) — anything else has to be added explicitly. new Function is enough for an R&D tool; multi-tenant production would move compile into a Worker.

### 04 · System prompts as live project files

**Decision.** System prompts live in scripts/text-editor/workflows/*.json inside the project tree, edited through the same CodeMirror surface as everything else. The frontend re-reads the selected workflow file before each request and threads its system_instruction into generation_config; runs.generation_config_json and usage_records.meta_json.system_prompt persist {id, path, name} so every run remembers which prompt was active at the time.

**Why.** Prompts are AI-side artifacts, but the rest of the project already treats AI artifacts as files (audit-first changesets work the same way). Putting prompts on the same plane lets them version with the code, be edited through DnD-import, change without a restart, and show up in usage analytics next to model and token cost. Most production AI tools never close that loop.

**Cost.** Prompt file is re-read from disk on every request — fine while the file stays small. No schema validation on the workflow JSON; a malformed file fails at runtime. Prompt history lives in git plus per-run snapshots; there is no dedicated prompt-history UI.
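A sketch of the per-request resolution, written in Python for brevity (in the project the re-read happens on the frontend); the workflow JSON field names here are assumptions:

```python
import json
from pathlib import Path

WORKFLOWS_DIR = Path("scripts/text-editor/workflows")

def build_generation_config(workflow_id: str, base_config: dict) -> tuple[dict, dict]:
    """Re-read the selected workflow file and thread its prompt into the request
    config; return (generation_config, prompt_meta) so the run can persist
    which prompt was active."""
    path = WORKFLOWS_DIR / f"{workflow_id}.json"
    # No schema validation: a malformed file fails here, at request time.
    workflow = json.loads(path.read_text(encoding="utf-8"))
    config = {**base_config, "system_instruction": workflow["system_instruction"]}
    prompt_meta = {"id": workflow_id, "name": workflow.get("name", workflow_id), "path": str(path)}
    return config, prompt_meta
```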

### 05 · SQLite + ThreadPool as the queue, not Celery + Redis

**Decision.** AI runs, TTS jobs and STT jobs all live as rows in SQLite tables (runs, tts_jobs, stt_jobs) with an atomic claim_next_*_job() — a SELECT followed by an UPDATE setting status to running. Workers are a ThreadPoolExecutor(max_workers=5) for chat runs and asyncio.Semaphore(10/6) for TTS/STT inside the FastAPI event loop. SQLite runs in WAL mode. Pending jobs survive a media_service restart and get re-claimed on boot.

**Why.** The default ML-stack reflex is Celery + Redis + Flower. This product is single-machine — media_service runs alongside the frontend in docker-compose, peak load is ~5 concurrent AI runs and ~10 TTS jobs. A broker would add two services to compose, an extra failure mode, and operational weight for load that does not exist.

**Cost.** Does not scale horizontally — multiple worker machines cannot share one SQLite. Visibility is hand-rolled — no Flower dashboard, you read usage_records and tool_events directly. The frontend long-polls /tts/jobs/{id} until success/error instead of getting a push event. Schema migrations are absent (rebuilt at startup) — adding a column means deleting usage.sqlite.
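The claim shape described above, sketched against a hypothetical runs table; the worker_id column is an assumption, the SELECT-then-conditional-UPDATE pattern is the one the text names:

```python
import sqlite3

def claim_next_run(conn: sqlite3.Connection, worker_id: str) -> int | None:
    """Atomically claim the oldest pending run: SELECT a candidate, then UPDATE it
    to 'running' only if it is still pending. Returns the run id or None."""
    with conn:  # one transaction; WAL mode keeps readers unblocked meanwhile
        row = conn.execute(
            "SELECT id FROM runs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE runs SET status = 'running', worker_id = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker_id, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None  # lost the race, caller retries
```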

## Stack

| | |
|---|---|
| **Frontend** | React 18 · Vite · TypeScript · Zustand+Immer · Remotion 4 · CodeMirror · Tailwind · shadcn/ui |
| **Backend** | Python · FastAPI · pydantic · sqlite3 (raw SQL · WAL) · ffmpeg subprocess |
| **AI** | Gemini 3 (pro/flash preview) + 2.5 · google-genai 1.55 · streaming · thinking · tools · TTS · STT (google-cloud-speech) |
| **Composition** | Remotion 4.0 · @babel/standalone (lazy) · per-clip error overlay |
| **Concurrency** | ThreadPoolExecutor(5) · asyncio.Semaphore(10/6) · ffmpeg sem (global=6, per-asset=3) · LRU caches |
| **Scale** | ~30K LOC TS — 3K timeline + 2.4K AI panel + 2K media + 2.3K properties (each a domain-rich UI surface) · ~8K LOC Py · 16 docs · ~50 routes · 11 SQLite tables |

## Lessons & status

### Carry forward

- Audit-first changesets — every broken AI prompt rolled back in one click without losing earlier work. The one thing I would carry 1:1 into any next AI IDE.
- Committed vs live playhead split — the pattern scales to any heavy-parent player. Without it, Remotion Player picks up periodic micro-resyncs under load that no amount of memoization fixes.
- Geometry in a shared utils module — normalizeTransform / resolveFittedBox live in src/utils/geometry.ts and feed both EditorComposition and the transform-overlay. Zero drift between preview and the on-canvas bbox.
- SQLite + ThreadPool as the queue — three weeks of daily use on a single-machine product, never once regretted not having Celery. The brokerless shape kept compose small and the failure modes legible.
- Bias toward open code surfaces — Remotion compositions are TSX, scenarios are markdown + JSON, workflow prompts are JSON files. The inverse of plugin-layer editors: the agent can author every layer because every layer was put on the keyboard from day one.

### Would change

- Vitest was dropped early — would not do that again. Babel-on-the-fly clip compilation and hash-verified changesets both need a unit harness; lint and build do not catch regressions in compiled-clip output or in apply/revert hash math. I would carry it back from day one.
- The schema is rebuilt at startup, no migrations. Fine for solo R&D (change the schema, delete usage.sqlite); a real handoff blocker, and friction when picking the project back up a month later. Alembic is a day of work; it saves time every day after.
- Frontend long-polls /tts/jobs/{id} to success or error; chat runs ride SSE. Two transports where one would do — re-using the chat SSE channel for any background job from day one would have made the second async pipeline (TTS) free, and the third (image / video) free again.

R&D · substrate built end-to-end · runs locally in Docker. Script edits and async TTS live; agent-as-orchestrator and export integration are the next slices on the same audit-tracked substrate.

---

Source: https://ilyadev.xyz/cases/ai-video-editor (HTML) · /cases/ai-video-editor.md (this file)
Previous: 02 — Restaurant Stock AI Agent → https://ilyadev.xyz/cases/ai-warehouse.md
Up next: 04 — Bullet Reign · Roblox → https://ilyadev.xyz/cases/roblox-game.md
Index: https://ilyadev.xyz/llms.txt — full case-study list
Author: Ilya Kazantsev — https://ilyadev.xyz/index.md


# Bullet Reign · Roblox

`04 · roblox-game · Published`

Bullet-heaven game for Roblox with a custom MegaMesh renderer (300 enemies at ≥55 FPS on mid-range mobile, 500 hard cap) and the full art pipeline running through Blender + MCP agents — solo, no artist.

**Scope:** Solo · 6 weeks  
**Role:** Performance engineering on Roblox

**Video:** [YouTube](https://www.youtube.com/watch?v=eflEYbm6dHM) · [RuTube](https://rutube.ru/video/d11492c853016477e8c77bacf2ad41c3/)

## Video walkthrough

Bullet-heaven on Roblox running 300 enemies at 55 FPS on mid-range mobile through a custom MegaMesh renderer with 15 draw calls regardless of population. Six points of interest, 26 weapons with 20 evolutions and 44 passives, two art styles, 17 enemy types, 90 icons — built solo across render, network, AI and content pipeline.

Bullet-heaven on Roblox — three hundred enemies on screen, fifty-five FPS on mid-range mobile.

Every enemy is a Lua table on the server and a bone in a shared MegaMesh on the client. Fifteen draw calls regardless of population on low quality.

Six points of interest — shrines, chests, obelisk trials, power crystals, fountains, magnets. Twenty-six weapons, twenty evolutions, forty-four passives.

Click any item in the catalog — full stats, every evolution and fusion path it can fold into.

Two art styles, Classic and Brainrot — seventeen enemy types, ninety icons.

Bullet Reign — live on Roblox.

One engineer · custom render, network, AI, content pipeline.

---

## Context

> The genre needs hundreds of enemies on screen. Roblox defaults give you tens.

Bullet-heaven as a genre demands 300+ active enemies on screen at sustained framerate. On PC the genre lives on Steam (Vampire Survivors, Megabonk). On Roblox the audience is enormous but mobile-first — and the engine's defaults (one Model + Humanoid + AnimationTrack per enemy, one RemoteEvent per state change) hold out for tens to maybe a hundred enemies before mid-range mobile FPS collapses into a slideshow. The default rendering path is the gate, not the simulation.

Solo means no artist for 15 enemy slots × 2 styles × 3 LODs (≈90 mesh-variations), no animator for the ~10 distinct skeleton types underneath, no engine specialist separate from the gameplay programmer. For the genre to work on this platform every layer — render, network, AI, content pipeline — has to be custom-built and operated by one person.

## Facts

| | |
|---|---|
| **Scope** | 6 weeks solo |
| **Genre** | Bullet-heaven · 30-min run · octagonal arena R=400 studs |
| **Render** | 1 draw call per enemy type · 500-enemy hard cap · ≥55 FPS @ 300 on mid-range mobile |
| **Network** | 7 bytes/enemy · 20 Hz tick · ~70 KB/s @ 500 |
| **Content** | 17 enemies · 6 bosses · 26 weapons · 44 passive items · 13 locales |
| **Status** | Published on Roblox |

## Architecture

### Render pipeline

```text
 1  Server tick (20 Hz · authoritative)
        │  500 enemies = 500 Lua tables (no Instances)
        │  EnemyManager.update():  pos · hp · ai-state in place
        │  ~120 B / table · zero engine alloc per mob
        ▼
 2  NetworkProtocol.packEnemyBatch()
        │  reusable _syncBatch upvalue (zero alloc)
        │  per enemy:  u16 id · i16 x×10 · i16 z×10 · u8 hp%   = 7 B
        ▼
 3  UnreliableRemoteEvent  ──►  client      ~3.5 KB @ 500
        │                                   ~70 KB/s @ 20 Hz
 4  NetworkProtocol.unpackEnemyBatch(buf, fn)
        │  callback-style reader  (zero alloc, hot path)
        ▼
 5  BoneRenderer v5
        │  one MeshPart per enemy type   →   ≤ 15 draw calls
        │  multi-size sub-pools via  Ex<size>_<slot>_BoneName
        │  lazy · auto-expand · schedule-preload T-10 s
        ▼
 6  bone.Transform = restCFInv * worldCF      (~200–435 active bones / frame)
```

**20K-tri budget = capacity planning.** `MAX_TRIS_BUDGET = 20000` in `export_megamesh.py` sets the per-pool tri ceiling; `count = budget // tris` derives how many copies fit in one MeshPart. A 400-tri mob → 50 copies per pool; a 1000-tri boss-preview → 20. The Python build-time parameter directly determines the runtime cap on simultaneous enemies.

**Bone-naming protocol.** `export_megamesh.py` writes per-copy bone names as `E{NNN}_<bone>` (E000_Torso, E001_Arm_R, …). `BoneRenderer.luau` parses the prefix with `^E%d+_(.+)$` to recover the canonical bone name for animation. One protocol, one pipeline — every art direction is served identically; runtime has zero conditional logic on style.
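The budget math and the naming protocol together, as a pure-Python sketch (the real export_megamesh.py does this inside Blender; the example bone names are illustrative):

```python
MAX_TRIS_BUDGET = 20_000  # per-pool triangle ceiling, fixed at bake time

def pool_copy_count(tris_per_mesh: int, budget: int = MAX_TRIS_BUDGET) -> int:
    """How many copies of one enemy mesh fit in a single MeshPart pool."""
    return budget // tris_per_mesh  # 400-tri mob -> 50 copies, 1000-tri boss preview -> 20

def copy_bone_names(canonical_bones: list[str], copy_index: int) -> list[str]:
    """Per-copy bone names E{NNN}_<bone>, parsed back in Luau with ^E%d+_(.+)$."""
    return [f"E{copy_index:03d}_{bone}" for bone in canonical_bones]

# copy_bone_names(["Torso", "Arm_R"], 1) -> ["E001_Torso", "E001_Arm_R"]
```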

**Multi-size sub-pools.** One MeshPart hosts x1 / x1.5 / x2 sub-pools side by side; an XP-gem merge re-attaches to a larger slot in the same pool — no Instance churn, no extra draw call.

**Schedule-preload.** Surges in `SpawnDefinitions` carry a known T-10 s warning. The matching pool materializes ten seconds before spawn — the first wave does not stall on lazy-init, FPS stays flat through transitions.

### Wire format

```text
Wire format · UnreliableRemoteEvent · 20 Hz · server-authoritative

  HEADER     u16  enemyCount                              2 B
  ENTRY × N  u16  id              0 – 65 535              2
             i16  x × 10          ±3 276.8 studs          2
             i16  z × 10          ±3 276.8 studs          2
             u8   hp %            0 – 100                 1
                                                         = 7 B / enemy

  packet @ 500 enemies   = 2 + 500 × 7    = 3 502 B
  bandwidth @ 20 Hz      = 3 502 × 20    ≈ 70 KB / s

  hit batching:  26 weapons → 1 HitRequest / 100 ms
                 server: damage clamp [0, 2000] · 15 req/s/player
```

**Why ×10 fixed-point.** Positions fit ±3 276.8 studs at 0.1-stud precision. Arena radius is 400 — the dynamic range covers it 8×, and 0.1 stud is below visual snap on mobile.

**Why u8 hp%.** Server keeps real HP for damage math; client renders a bar. 1% precision is below the bar's visual threshold and saves 3 bytes per enemy compared to a u32 absolute value.

**×4–5 vs Lua-table sync.** The same payload as a `{id, x, y, z, hp}` table is ~16 KB / tick — mobile cellular drops the connection mid-run. Binary at 3.5 KB rides comfortably even on a degraded link.
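The same 7-byte entry, sketched with Python's struct for clarity; the runtime packs into a Luau buffer, but the layout is the one in the table above (u16 id, i16 x×10, i16 z×10, u8 hp%):

```python
import struct

ENTRY = struct.Struct("<HhhB")   # u16 id · i16 x*10 · i16 z*10 · u8 hp%  = 7 bytes
HEADER = struct.Struct("<H")     # u16 enemyCount = 2 bytes

def pack_enemy(enemy_id: int, x: float, z: float, hp_pct: int) -> bytes:
    return ENTRY.pack(enemy_id, round(x * 10), round(z * 10), hp_pct)

def unpack_batch(buf: bytes):
    """Generator form of the callback-style unpackEnemyBatch: yields (id, x, z, hp%)."""
    (count,) = HEADER.unpack_from(buf, 0)
    for i in range(count):
        enemy_id, x10, z10, hp = ENTRY.unpack_from(buf, HEADER.size + i * ENTRY.size)
        yield enemy_id, x10 / 10, z10 / 10, hp

# 500 enemies: 2 + 500 * 7 = 3 502 bytes per tick, ~70 KB/s at 20 Hz
```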

### Server-tick topology

```text
SERVER (20 Hz · authoritative · enemies as Lua tables)

  GameManager (Lobby → Countdown → Run → End)
       │
       ├── SpawnManager    base 70% · surge 20% · event 10%
       │                   8 surge types · 7 separate spawners
       │                   soft-cap 350→500 · adaptive ±20%
       │
       ├── EnemyManager    5 modules · 17 types · 11 AI behaviors
       ├── BossController  6 bosses · Harvester m28 arena shrink
       ├── POIManager      shrine · chest · crystal · obelisk · …
       ├── DataManager     DataStore + schema migration · save 60 s
       └── HitValidator    damage clamp · 15 req/s / player
                                       │
                                       ▼   7 B / enemy @ 20 Hz
CLIENT (every frame)

  EnemyRenderer  →  BoneRenderer v5     →  bone.Transform   (15 draw calls)
  WeaponManager  →  SpatialGrid cell=20 →  26 weapons       (O(K) per query)
  ~25 UI modules · CardRenderer (LevelUp / Chest / POI rewards)
```

**Services.luau, not _G.** Every cross-module reference goes through one ModuleScript registry. R9 wave migrated 35 keys / 402 references across 65 files — zero `_G` left. Cross-module wiring through `_G` is the standard Roblox antipattern; the chore-refactor that ships no features is exactly the one most projects skip — and 100K-LOC Luau without it starts to rot.

**Single-slot vs multi-slot pool.** Bosses get a dedicated single-slot pool over a preview mesh (×2.5–3 visualScale) so `Highlight` for shield / invuln VFX attaches cleanly — multi-slot pools would attach the highlight to every enemy sharing the MeshPart.

**3-channel spawner.** Base rate scales linearly with minute (`2 + minute × 0.9`); the 6 bosses sit on fixed offsets (5 / 10 / 15 / 20 / 25 / 28 min), while 8 surge types and 7 separate spawners keep the rest of the schedule predictable. Predictable enough to preload pools, varied enough that runs differ.

### AI art pipeline output

![The icon library — weapon and perk silhouettes processed through `cut_icons.py` (batch-sheet slicing) and `whiten_icons.py` (RGB(0,0,0) → RGB(255,255,255) for Roblox `ImageColor3` tinting).](https://ilyadev.xyz/private/roblox-icons.png)

*Icon library*

![Roblox Studio scene with enemy meshes, collectible props, and three LOD variants (low / medium / high) for select assets — all baked through `export_megamesh.py` with the `E{NNN}_<bone>` bone-naming protocol.](https://ilyadev.xyz/private/roblox-bestiary.png)

*Bestiary + collectibles · 3 LODs*

**Volume in the frame.** 60 .blend sources · 220 .fbx exports · 16 Python scripts in the bake pipeline (cut_icons, whiten_icons, fix_normals_and_colors, gen_currency_icons, export_megamesh + per-weapon variants). The bestiary covers 15 enemy types in classic style with 3 LODs on select assets; the icon library covers 26 weapons + perks — sliced from batch-sheets and recolored to white so Roblox `ImageColor3` can tint them at runtime (black absorbs the tint colour, white passes it through).
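A minimal sketch of the whiten_icons.py pass, assuming black-on-transparent RGBA sources: push every pixel to white, keep the alpha, so ImageColor3 tinting works at runtime:

```python
from PIL import Image

def whiten_icon(src_path: str, dst_path: str) -> None:
    """Recolor an RGBA icon to pure white while preserving its alpha channel,
    so Roblox ImageColor3 can tint it at runtime (white passes the tint through)."""
    img = Image.open(src_path).convert("RGBA")
    alpha = img.getchannel("A")
    white = Image.new("L", img.size, 255)
    Image.merge("RGBA", (white, white, white, alpha)).save(dst_path)
```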

**Two styles, one pipeline.** Classic and the alt-style set feed the same bake pipeline — same `export_megamesh.py`, same `cut_icons.py`, same `E{NNN}_<bone>` naming, same FBX preset. Each enemy slot brings its own skeleton (3–7 bones depending on the creature); per-slot AnimData split is the price — see Decision 5.

## Key engineering decisions

### 01 · Bone-Transform MegaMesh, not Roblox Models

**Decision.** One MeshPart per enemy type with N skinned bones; per frame, set `bone.Transform = restCFInv * worldCF` for each active enemy. 15 enemy types = 15 draw calls regardless of population (~200–435 active bones).

**Why.** The genre demands hundreds of enemies on screen at sustained framerate on mobile. Roblox's default `Model + Humanoid + AnimationTrack` per enemy peaks at dozens of instances before mobile FPS collapses — the renderer is the gate, not the simulation.

**Cost.** No `AnimationTrack`, no `Humanoid`, no `Touched`, no auto-replication — all rebuilt: `BoneRenderer` v5 (788 LOC), own keyframe sampler `BoneAnimator`, 26 hand-baked AnimData files across ~10 distinct skeleton types.

### 02 · Server enemies as Lua tables, not Instances

**Decision.** Server-side enemies live entirely as Lua dict entries — pos · hp · ai-state · effects. Zero `Instance.new()` in the enemy hot path; only the renderer's pool-Parts exist as Instances on the client.

**Why.** 500 `Instance.new()` per wave plus the same on death is fatal — engine GC pause + cross-process replication overhead per transient mob. Tables sit in memory at zero engine cost and let the 20 Hz tick stay tight.

**Cost.** Every Roblox feature that takes an Instance is gone for non-boss enemies — `Touched`, `ProximityPrompt`, `Highlight`, `Tag`. Hit detection is rebuilt client-side over the SpatialGrid; bosses run a dedicated single-slot pool to bring `Highlight` back for shield / invuln VFX.

### 03 · Buffer-packed binary on UnreliableRemoteEvent, not table RemoteEvent

**Decision.** 7-byte fixed-form per enemy (header + entries) over `UnreliableRemoteEvent`. Sender packs through a reusable `_syncBatch` upvalue array; receiver reads via callback (`unpackEnemyBatch(buf, fn)`) — zero allocations in either direction.

**Why.** At 20 Hz × 500 enemies, table-form sync is ~16 KB / tick → mobile cellular drops the connection. Binary is ~3.5 KB. UnreliableRemoteEvent is the right primitive for state-snapshots — losing a tick and getting the next one is fine; retransmission would actively hurt.

**Cost.** No schema, no Studio inspector, no auto-versioning. Every wire-format change requires coordinating sender and reader by hand. Debugging a single enemy state takes bone-level instrumentation rather than a property watch.

### 04 · AI-driven art pipeline through Blender + MCP agents

**Decision.** Concept → img-to-3D → Blender (via MCP agent) → bake as MegaMesh (`export_megamesh.py`, 20K-tri budget) → 3 LODs → FBX → upload. 16 Python scripts cover icon cuts, recolors, vertex-color repair, procedural skybox / pyramid / damage textures.

**Why.** 15 enemy slots × 2 styles × 3 LODs ≈ 90 mesh-variations plus a 26-weapon icon library in 6 weeks of solo work — hand-authoring on this volume does not fit the schedule. Tooling has to absorb the volume.

**Cost.** 16 scripts to maintain. AI-generated topology often needs `fix_normals_and_colors.py` post-pass; each new enemy carries a small calibration tax.

### 05 · Art style lives in source, not in code

**Decision.** Two `.blend` source sets (classic + alt-style) feed the same bake pipeline (`export_megamesh.py`, `E{NNN}_<bone>` naming, common FBX preset) — runtime reads from `EnemyPools` or `BrainrotPools` via one `Services.BrainrotMode` toggle. Each enemy slot brings its own skeleton; the bake pipeline is the only thing strictly shared.

**Why.** 15 enemy slots × 2 styles = 30 mesh-sets that must behave identically at runtime — animation, hit-detection, VFX. Branching by style at runtime would mean two of every code path; branching at bake-time means zero code paths know about style. Adding a third style is a fresh source set, zero Luau changes, zero new tooling.

**Cost.** Alt-style meshes occasionally diverge from classic bone names — 11 of 15 enemies need their own AnimData files; 4 (skeleton_spearman, priest_of_anubis, tomb_assassin, obelisk) reuse the classic ones because bone names match. The boundary sits per-slot at bone-name parity — paid at content-add time, not in code.

## Stack

| | |
|---|---|
| **Language** | Luau (typed) · Rojo 7.6.1 · Rokit toolchain |
| **Render** | Custom Bone-Transform MegaMesh · BoneRenderer v5 · BoneAnimator (own keyframe format) |
| **Network** | UnreliableRemoteEvent · 7-byte packed protocol · 20 Hz tick · custom SpatialGrid (cell 20 studs) |
| **Persist** | Roblox DataStore + schema migration · auto-save 60 s · onLeave · onShutdown |
| **Art pipe** | Blender + MCP agents · 16 Python scripts · 60 .blend · 220 .fbx · weapon + perk icon library |
| **Scale** | ~100 K Luau LOC across ~370 modules — ~100 locale (13 × 8 namespaces), ~90 definition (26 weapons + 17 enemies + 44 passives), 5-module EnemyManager core (≈3 K LOC of hot-path) · 2 Roblox places · ~58 RemoteEvents · 10 R-wave refactors |

## Lessons & status

### Carry forward

- Bone-Transform MegaMesh paid back 100× over — day-one pattern for any future Roblox project at scale.
- Server-as-data (enemies are Lua tables) kept memory and replication budgets predictable through every optimization phase.
- Buffer-packed binary with a reusable `_syncBatch` upvalue — zero-allocation discipline survived from prototype to publish unchanged.
- AI-driven art pipeline — by week 4, prompt → 3D → bake → upload was muscle memory; without it ≈90 mesh-variations plus a full icon library in 6 weeks is not solo-feasible.
- Bootstrap handshake (`ClientReady`) — Bootstrap.client.luau eager-requires player ModuleScripts, waits for CharacterAdded, then signals the server, which blocks `startCountdown()` until every alive player has reported in. Closes a class of listener-registration races that lazy ModuleScripts make easy to ship by accident.

### Would change

- Modular decomposition (R1–R10) should have started earlier. By stages 24+ EnemyManager was creeping toward a 2K-line monolith; one of the R-waves could have unloaded it sooner instead of carrying the weight all the way to the rendering overhaul on stages 24–27.
- AI-generated meshes need a stricter pre-bake gate. `fix_normals_and_colors.py` was reactive — today I would run it inside `export_megamesh.py` unconditionally and fail the export on bad vertex colors.
- AnimData split was discovered per-enemy as classic↔alt-style bone-name mismatches surfaced. A single bone-name parity spec at the first alt-style enemy would have set the contract once — instead, 11 of 15 ended up with their own AnimData files reactively.

---

Source: https://ilyadev.xyz/cases/roblox-game (HTML) · /cases/roblox-game.md (this file)
Previous: 03 — AI Video Editor → https://ilyadev.xyz/cases/ai-video-editor.md
Up next: 05 — macOS VPN · per-app routing → https://ilyadev.xyz/cases/macos-vpn.md
Index: https://ilyadev.xyz/llms.txt — full case-study list
Author: Ilya Kazantsev — https://ilyadev.xyz/index.md


# macOS VPN · per-app routing

`05 · macos-vpn · Personal tool`

Per-app macOS routing tool with fail-shut pf gate, launchd boot persistence and live traffic monitoring.

**Scope:** Solo · ongoing  
**Role:** macOS systems engineering · CLI tooling

---

## Context

> The kill-switch isn't a watchdog. It's the absence of a route.

Developer tooling that depends on a stable per-app egress IP — the protected app must always exit through the same proxy node, while every other process on the machine stays direct. Off-the-shelf VPN clients tunnel everything; CIDR or DNS-based split-tunneling cannot help when the same destination needs the proxy for one process and direct for another. Per-process granularity in this combination is not something any popular consumer GUI offers.

For the protected lane, fail-open after a tunnel drop is louder than the drop itself — a single second of direct egress flags the receiving side. A watchdog kill-switch leaves a race window between sing-box exit and the watchdog reacting; fail-open in that window is the failure mode the protected lane cannot tolerate.

## Facts

| | |
|---|---|
| **Scope** | Solo · daily driver |
| **Surfaces** | macOS CLI · self-installing pf anchor · launchd boot persistence |
| **Routing** | sing-box TUN + process_name dispatch · 3 protected app groups |
| **Protocols** | VLESS+Reality (TCP stealth) · Shadowsocks (TCP+UDP) — chosen per session |
| **Monitoring** | Clash API @ 127.0.0.1:9090 · per-app speed · 3-tier sparklines (4 min · 1 hour · session) |
| **Status** | Personal tool · daily driver |

## Architecture

### Per-app routing through TUN

```text
   Protected processes                    every other process
   (3 app groups · process_name match)        │
        │                                     │
        ▼                                     ▼
   sing-box  TUN-mode  (utun99 · 172.19.0.1/30)
        │  intercepts ALL system traffic
        │
        │  route.rules:
        │    if process_name in VPN_PROCESSES  →  outbound: proxy
        │    else                              →  outbound: direct
        ▼
   ┌─── proxy outbound ──── VPN provider ──── internet
   │
   └─── direct outbound ──── en0 ──── internet
```

**Dispatch by process, not destination.** Routing rules in sing-box (route.rules) match on process_name, not destination IP or domain. Two "network spaces" share one host: protected apps always exit through proxy, everything else exits direct — even when both reach the same destination.

**TUN intercepts everything.** sing-box runs in TUN-mode (auto_route: true) — every packet from every process passes through utun99. Without this, dispatch by process_name is impossible: kernel route tables work on destinations, not process owners.

**DNS split inside sing-box.** Protected processes get DoH via proxy-dns (1.1.1.1 over HTTPS through the proxy); everything else hits direct-dns (1.1.1.1 plain UDP). A DNS-hijack route rule intercepts system DNS queries before they leave the box. Without per-process DNS, VPN-host resolution would leak via the local resolver.
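A sketch of the config fragments this dispatch needs, in the shape singbox.py would generate into config.json; the process names are placeholders and the exact sing-box key spellings are assumptions taken from the description above:

```python
def route_and_dns_fragment(protected: list[str]) -> dict:
    """route.rules dispatch by process_name plus the per-process DNS split."""
    return {
        "dns": {
            "servers": [
                {"tag": "proxy-dns", "address": "https://1.1.1.1/dns-query", "detour": "proxy"},
                {"tag": "direct-dns", "address": "1.1.1.1", "detour": "direct"},
            ],
            "rules": [{"process_name": protected, "server": "proxy-dns"}],
            "final": "direct-dns",
        },
        "route": {
            "rules": [
                {"protocol": "dns", "action": "hijack-dns"},        # system DNS -> sing-box resolver
                {"process_name": protected, "outbound": "proxy"},   # protected apps -> proxy
            ],
            "final": "direct",                                      # everything else -> direct
        },
    }
```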

### Negative-space pf gate · state matrix

```text
  state             en0 traffic owner    fires                outcome
  ─────             ─────────────────    ─────                ───────
  VPN running       sing-box (root)      pass quick user 0    OK · proxy + direct work
  VPN crashed       user-app (uid≠0)     block drop default   no internet for user
  boot, no mvpn     user-app (uid≠0)     block drop default   no internet for user
  mvpn  (root)      mvpn (root)          pass quick user 0    subscription / pings work

  rules in /etc/pf.conf via anchor "singbox-killswitch":

      pass quick on lo0 all
      pass quick on utun99 all
      pass quick on { en0 en4 en5 en6 } from any to any user 0
      pass quick on { en0 en4 en5 en6 } proto { tcp udp } to any port 53     # DNS
      pass quick on { en0 en4 en5 en6 } proto udp from any port 68 to 67     # DHCP
      pass quick on { en0 en4 en5 en6 } proto udp to 224.0.0.251 port 5353   # mDNS
      block drop on { en0 en4 en5 en6 } proto { tcp udp } all                # default-deny
```

**Kill-switch is an absence, not an action.** No watchdog process. No "monitor sing-box, then call pfctl block" loop. The block rule is loaded once; it fires whenever a non-root socket tries to write to a physical interface. When sing-box dies, user-apps fail-shut by default — there is nothing that needs to react.

**Discovery-level explicit allows.** lo0 + TUN + DNS + DHCP + mDNS pass-through is required for first connect. Without DNS the very first sudo mvpn could not resolve the VPN host; without DHCP a fresh boot could not lease an IP. The block applies only to non-root TCP/UDP on Ethernet — discovery primitives stay open.

**Multi-interface match.** Rules apply to en0 + en4 + en5 + en6 — Wi-Fi plus three USB-Ethernet adapter slots. Plugging in a tethered phone or a USB-C hub does not bypass the gate.

### Protocol picker · kill-switch active

![Protocol selection screen — VLESS+Reality (TCP-only stealth) vs Outline / Shadowsocks (TCP+UDP). Default: Outline.](https://ilyadev.xyz/private/macos-vpn-start.webp)

*Every start: stealth or full transport*

![Banner shown when sing-box stops — "INTERNET BLOCKED — kill switch active" with recovery commands listed below.](https://ilyadev.xyz/private/macos-vpn-killswitch.webp)

*Fail-shut state surfaced explicitly*

**Default: Shadowsocks.** Outline (Shadowsocks) handles full TCP+UDP — gaming, voice, video calls all work. VLESS+Reality is the stealth pick when full transport is not required (see §03 Decisions for the trade-off). Choice is per session, never automatic.

**Recovery is three commands.** sudo mvpn re-connects (re-fetches subscription, re-selects server). sudo mvpn kill-apps force-closes the protected processes (SIGTERM → SIGKILL) before opening the gate. sudo mvpn disable removes the pf anchor — internet returns direct, but if a protected app is still running its traffic now exits direct.

**Disable confirms before opening.** sudo mvpn disable checks for live protected processes before tearing down the anchor. If any are running it asks "Kill? [y/N]" — refusing means the user must close them manually before the gate opens. Without the prompt, disable would silently put protected apps onto the direct path.

### Boot lifecycle

```text
   macOS boot
        │
        ▼
   launchd  (RunAtLoad: true)
        │  com.mvpn.killswitch.plist  →  /Library/LaunchDaemons/
        │  ProgramArguments:
        │    pfctl -a singbox-killswitch -f killswitch.pf.conf
        ▼
   pf anchor "singbox-killswitch" loaded
        │  rules active · no internet for user-apps
        ▼
   user runs  sudo mvpn
        │  fetch subscription  (root → pass quick)
        │  parallel TCP ping → pick best server
        │  generate config.json  →  sing-box check  →  start
        ▼
   sing-box up  (PIDFILE written, TUN online)
        │  process_name dispatch live
        │  Clash API on :9090 ready
        ▼
   live-status loop  (urllib → /connections → render)
```

**enable vs mvpn — orthogonal commands.** sudo mvpn enable runs once per machine: copies com.mvpn.killswitch.plist into /Library/LaunchDaemons/, registers the launchd daemon, installs the pf anchor in /etc/pf.conf. sudo mvpn runs every session: subscription → ping → start. After enable, every reboot blocks internet until sudo mvpn is run.

**Self-installing pf anchor.** pf.py:_ensure_anchor_in_main() reads /etc/pf.conf, appends anchor "singbox-killswitch" if absent, runs pfctl -f /etc/pf.conf to reload. Idempotent — re-running enable is a no-op if the line is already there. Removed by disable via _remove_anchor_from_main().
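The same idempotent install, as a minimal sketch of the shape _ensure_anchor_in_main() is described as having (requires root, like the real command):

```python
import subprocess
from pathlib import Path

PF_CONF = Path("/etc/pf.conf")
ANCHOR_LINE = 'anchor "singbox-killswitch"'

def ensure_anchor_in_main() -> None:
    """Append the kill-switch anchor to /etc/pf.conf once, then reload pf.
    A second run is a no-op because the line is already present."""
    text = PF_CONF.read_text()
    if ANCHOR_LINE in text:
        return
    PF_CONF.write_text(text.rstrip("\n") + "\n" + ANCHOR_LINE + "\n")
    subprocess.run(["pfctl", "-f", str(PF_CONF)], check=True)  # reload the main ruleset
```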

**Boot-time pf log.** launchd writes pfctl stdout/stderr to pf-launch.log — first thing to check if the anchor failed to load on boot (corrupted config, syntax error after a manual edit).

## Key engineering decisions

### 01 · Negative-space pf gate as the kill-switch

**Decision.** pf rules: pass quick on en0/en4-6 user 0 (root traffic, including sing-box) + block drop on en0/en4-6 default for everything else, with explicit pass-throughs for lo0, utun99, DNS, DHCP, and mDNS. Kill-switch behaviour is the side-effect of the default-deny rule — there is no watchdog, no liveness check, no reactive block step.

**Why.** A reactive watchdog kill-switch ("monitor sing-box; if dead, call pfctl block") leaves a race window — between sing-box exit and the watchdog reacting, user-apps drop straight onto en0. For a protected lane where any leak is louder than the drop itself, the gate has to work as a property of the system, not as an action triggered by an event. Default-deny + explicit allows compresses safety into the routing table itself: nothing has to "work right" for the block to apply — the absence of a route is the block.

**Cost.** pf rules require root and live in /etc/pf.conf via a self-install path (_ensure_anchor_in_main()); debugging "why no internet" goes through pfctl -s rules, not through application logs. No per-app fail-open — the gate is binary across the protected set. Discovery primitives (DNS, DHCP, mDNS) need explicit pass-throughs or first-connect breaks; the pf rule list is no longer trivially short.

### 02 · process_name dispatch in TUN mode (not CIDR / DNS split-tunnel)

**Decision.** sing-box runs in TUN mode (auto_route: true) — every packet from every process flows through utun99. Two route rules dispatch: process_name in VPN_PROCESSES → outbound: proxy; otherwise final: direct. Both rules see the same destination set; the only differentiator is the process owner.

**Why.** The use case is per-app static IP, not "traffic to host X via VPN". CIDR or DNS-based split-tunneling breaks when the same domain needs the proxy from process A and direct from process B — both rules match, only one wins, the other process either leaks or fails. process_name dispatch is the only mechanism that keeps two distinct egress paths for two distinct local processes hitting the same destination.

**Cost.** TUN intercepts everything — if sing-box hangs, system networking goes with it. Dispatch is per-packet (overhead is small but measurable on heavy traffic). VPN_PROCESSES is a manual list in config.py — adding an app means running ps -eo comm | grep <name> and editing the dict. No GUI, no auto-discovery yet.

### 03 · Two protocols at session start (VLESS+Reality vs Shadowsocks)

**Decision.** Every sudo mvpn prompts: VLESS+Reality (TCP-only, looks like HTTPS to DPI) or Shadowsocks (TCP+UDP, simpler obfuscation). Default is Shadowsocks. Choice persists for the session; switching protocols means stop + restart.

**Why.** Stealth and UDP are mutually exclusive in this stack. VLESS+Reality with flow=xtls-rprx-vision is TCP-only — Reality cannot proxy UDP at all. In VLESS mode the route rule falls back to sending UDP from protected processes to outbound: direct (visible at singbox.py:69-78). For a use case that needs UDP through the proxy (gaming, voice), that is a real-IP leak; for a TCP-only use case, VLESS-mode stealth is worth the UDP-direct fallback. Shadowsocks proxies both. One protocol cannot serve both shapes; auto-switching would hide a security-relevant decision behind heuristics.

**Cost.** Two outbound branches in singbox.py — _make_vless_outbound and _make_shadowsocks_outbound. UX cost: an extra prompt on every session start. The user must know what they picked — mvpn status surfaces the active protocol but the picker itself is the only checkpoint where the choice happens.
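The two branches, sketched; field values are placeholders read from the fetched subscription in the real code, and exact sing-box key names are assumptions here:

```python
def make_vless_outbound(server: str, port: int, uuid: str, public_key: str, sni: str) -> dict:
    """VLESS+Reality: TCP-only stealth; UDP from protected apps falls back to direct."""
    return {
        "type": "vless", "tag": "proxy",
        "server": server, "server_port": port, "uuid": uuid,
        "flow": "xtls-rprx-vision",
        "tls": {"enabled": True, "server_name": sni,
                "reality": {"enabled": True, "public_key": public_key}},
    }

def make_shadowsocks_outbound(server: str, port: int, method: str, password: str) -> dict:
    """Shadowsocks (Outline): TCP+UDP, the default pick when full transport matters."""
    return {
        "type": "shadowsocks", "tag": "proxy",
        "server": server, "server_port": port,
        "method": method, "password": password,
    }
```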

### 04 · launchd-mounted pf anchor with self-install

**Decision.** sudo mvpn enable (once per machine) installs com.mvpn.killswitch.plist into /Library/LaunchDaemons/ and the pf anchor into /etc/pf.conf. After enable, every reboot reloads the pf rules before any user-space app comes up. sudo mvpn is a separate per-session command — fetch subscription, pick server, start sing-box.

**Why.** Post-reboot is the danger window — auto-launch apps can reach networking before the user types sudo mvpn. A run-on-demand kill-switch (load rules at start, drop them at stop) leaves protected apps fail-open until the user remembers to start the VPN. Mounting the anchor at boot inverts the default — internet is blocked by default; turning it on is the conscious step. Splitting enable from mvpn keeps the per-session command short and the per-machine setup explicit.

**Cost.** enable writes to /etc/pf.conf and /Library/LaunchDaemons/ — both require sudo, and the install has to be documented as a known surprise ("no internet after reboot until sudo mvpn"). The plist hardcodes an absolute path to killswitch.pf.conf — not portable to other accounts without templating.

### 05 · Per-process DNS split inside sing-box

**Decision.** sing-box config defines two DNS resolvers — proxy-dns (DoH, 1.1.1.1 over HTTPS, routed through the proxy) and direct-dns (1.1.1.1 over plain UDP, direct). A DNS rule routes queries from VPN_PROCESSES through proxy-dns; everything else falls to direct-dns as the final. A separate route rule (protocol: dns, action: hijack-dns) intercepts every DNS request the system tries to make and redirects it to sing-box internal resolution.

**Why.** A single resolver — even fast and reliable — leaks the resolution path. If protected processes resolve VPN-host names through the local system DNS (router, ISP, public 1.1.1.1 plain UDP), the receiving side sees the lookup come from the host's real IP before any proxy connection happens. DoH-through-the-proxy keeps both the resolution and the connection inside the same egress path — the proxy is the only network surface that sees protected-process activity.

**Cost.** sing-box DNS hijacking can fight with apps that pin their own DoH endpoints (browsers with ECH enabled, for example) — the hijack action redirects the query but the in-app DoH client may not respect the redirect. DoH through the proxy adds round-trip latency on every cold lookup compared to plain UDP. Two resolvers in the config double the surface that has to stay healthy.

## Stack

| | |
|---|---|
| **Runtime** | Python 3.12+ · macOS pf · launchd · sing-box (TUN mode) |
| **Protocols** | VLESS+Reality (TCP stealth) · Shadowsocks (TCP+UDP, Outline format) |
| **Dispatch** | sing-box route.rules · process_name per-app match · DNS hijack |
| **Monitoring** | Clash API @ 127.0.0.1:9090 · /connections poll · custom rendering |
| **Persistence** | launchd plist · self-installing pf anchor in /etc/pf.conf |
| **Scale** | ~2k LOC Python · 9 modules · 32-line pf ruleset · 22-line launchd plist |

## Lessons & status

### Carry forward

- Negative-space gate as a design pattern — security expressed as the absence of a route, not the presence of a watchdog. Default-deny + explicit allows beats reactive watch-and-block on every metric that matters: race window, complexity, surfaces that have to "work right".
- Three independent ring-buffers for sparklines (4 min · 1 hour · session, at 1pt per 3s / 45s / 3600s) — each tier answers a distinct question. One buffer with downsampling either loses granularity at short ranges or coverage at long ones; three buffers, each tuned to its own time scale, give the best read at every zoom — see the sketch after this list.
- enable separated from session-start command — boot persistence is an orthogonal property, not a side-effect of the connect command. One install, every session stays trivial.
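
A sketch of the three-tier idea under the cadences above; the tier names and point capacities are assumptions:

```python
import time
from collections import deque

class SparkTier:
    """One ring buffer with its own sampling period and point capacity."""
    def __init__(self, period_s: float, capacity: int):
        self.period_s = period_s
        self.points: deque[float] = deque(maxlen=capacity)
        self._last = 0.0

    def push(self, value: float, now: float | None = None) -> None:
        now = time.monotonic() if now is None else now
        if now - self._last >= self.period_s:
            self.points.append(value)
            self._last = now

# 4-minute, 1-hour and whole-session tiers, each answering its own question
TIERS = {
    "4min":    SparkTier(period_s=3,    capacity=80),
    "1hour":   SparkTier(period_s=45,   capacity=80),
    "session": SparkTier(period_s=3600, capacity=24),  # capacity assumed
}

def record_speed(kbps: float) -> None:
    for tier in TIERS.values():
        tier.push(kbps)
```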

### Would change

- Fetch-once-at-start was the wrong assumption about subscription stability — the server URL is itself part of what the provider rotates, not a constant for the session. Health-check on stalled downloads with re-fetch is the shape I would build first now: subscription-watch, not subscription-fetch.
- APP_GROUPS is a manual dict in config.py. Adding an app means running ps -eo comm | grep <name> and hand-editing the file. Cheap upgrade: an interactive mvpn add-app that lists running processes, lets the user mark which to route through VPN, and rewrites the dict.
- launchd plist hardcodes an absolute path to killswitch.pf.conf. Works for one user. The right shape is enable substituting $HOME into a plist template before copying — costs a few lines, makes the tool portable to any account.

Personal tool · daily driver. Scope is intentionally limited; auto-recovery for subscription rotation and multi-host failover remains a possible next slice. Code walkthrough on request.

---

Source: https://ilyadev.xyz/cases/macos-vpn (HTML) · /cases/macos-vpn.md (this file)
Previous: 04 — Bullet Reign · Roblox → https://ilyadev.xyz/cases/roblox-game.md
Up next: 06 — Portfolio Site → https://ilyadev.xyz/cases/portfolio-site.md
Index: https://ilyadev.xyz/llms.txt — full case-study list
Author: Ilya Kazantsev — https://ilyadev.xyz/index.md


# Portfolio Site

`06 · portfolio-site · Open source`

The site you're reading. React · Vite · TypeScript with token-first CSS modules, typed EN/RU/AR content, Markdown mirrors, privacy-first telemetry and route-aware navigation.

**Scope:** Solo · ongoing  
**Role:** Engineering portfolio · 2025–2026

---

## Context

> Six cases the recruiter cannot click into. One site they can.

Of the six showcased projects, five sit behind closed/NDA scope — client work, internal tooling, R&D. Recruiters cannot open them, cannot pull the repo, cannot diff the commit history. The site itself is the only public artifact a senior reviewer can audit on the spot: open DevTools, watch the network, read the CSS, view-source the dot-grid. Every detail visible to the user is also a decision visible to the engineer.

The constraint is dual. The page must scan for a recruiter in five seconds — fixed-shape cards, mono spec-sheets, ASCII diagrams that read like terminal output. It must also hold under a senior eye who scrolls slow — token-first design, hand-rolled i18n with compile-time parity, view transitions for the home → case morph, and a Hero ember field that does not cook iPhones. The site has to be its own case study, because nothing else here can be opened.

## Facts

| | |
|---|---|
| **Scope** | Solo · ongoing |
| **Surfaces** | Home + 6 cases · EN/RU/AR · dark/light · 3 palettes |
| **Source** | Public GitHub repository · CV PDF · copy-as-Markdown actions |
| **Stack** | React 18 · Vite · TypeScript · CSS variables |
| **Content** | Typed EN/RU/AR dictionaries · public/private content trees · generated Markdown mirrors |
| **Telemetry** | Cookieless same-origin events · admin-only analytics |
| **Motion** | View transitions (card → case hero FLIP-morph) · prefers-reduced-motion respected |
| **Status** | Open source · live |

## Architecture

### Provider tree + routing

```text
<ThemeProvider>           // src/theme/ThemeContext.tsx
  <LangProvider>          // src/i18n/LangContext.tsx
    <BrowserRouter>
      <Loader />
      <ScrollToHash />
      <AnalyticsRouteTracker />
      <Routes>
        <Route "/"            → <HomePage />
        <Route "/cases/:slug" → <CaseRoute /> → <ProjectDetailPage />
        <Route "*"            → <HomePage />
      </Routes>
    </BrowserRouter>
  </LangProvider>
</ThemeProvider>
```

**Provider order matters.** ThemeProvider outermost so the palette is set on `<html data-palette>` before any child reads color variables. LangProvider next so `useT()` is available everywhere. Loader, scroll restoration and analytics all sit inside the router because they need routed content or `useLocation()`.

**Two functional routes + 404 fallback.** `/` renders the home page; `/cases/:slug` renders the case-study page; `*` falls back to home for unknown URLs. Slug whitelist (`CASE_SLUGS = [ai-crm, roblox-game, ai-video-editor, ai-warehouse, macos-vpn, portfolio-site]`) lives in `src/config/cases.ts` — `Nav.tsx` and `ProjectDetailPage.tsx` consume it; server analytics keeps its own allowlists.

**No code-splitting yet.** Both routes ship in one bundle. Lighthouse stays green; revisit when a third route lands.

### Token resolution chain

```text
 1  ThemeContext        theme = 'dark'  →  palette = 'ochre'
        │
        ▼  writes data-attrs on <html>
 2  <html data-theme="dark" data-palette="ochre">
        │
        ▼  tokens.css matches via attribute selectors
 3  [data-theme="dark"]                       → --bg --fg --line
    [data-palette="ochre"][data-theme="dark"] → --accent --accent-soft
        │
        ▼  CSS modules imported by styles.css consume via var()
 4  .hero h1 { color: var(--fg) }
    .chip    { background: color-mix(in oklab, var(--bg) 10%, …) }
```

**Theme determines palette.** Derived deterministically — `dark → ochre` (warm gold accent), `light → electric` (cool blue accent). Both `data-theme` and `data-palette` written to `<html>`. CSS handles the swap via attribute selectors; zero JS branching per element.

**Token file + CSS modules, no Tailwind.** `tokens.css` (153 LOC) is the single source of truth for colors / spacing / type / radii. `styles.css` is now a tiny entry file importing `base`, `home`, `media`, `case-study` and `runtime` modules. The token system does what Tailwind config would; component selectors do what utility classes would.

**oklch + color-mix for backdrops.** Heavy use of `color-mix(in oklab, var(--bg) X%, transparent)` for tinted overlays — the frosted Hero chips, the case-study glows, the diagram backdrops. Avoids opacity tricks on the wrong color.

### Loader + interface entrance phases

```text
T=0          T=400 ms                       T=900 ms
│            │                              │
▼            ▼                              ▼
pulse    →   rush                       →   gone
hairline     two whiteish blurred           loader unmounted
opacity      pulses run from 15% / 85%      hero/about reveal
0.22-0.7     toward center (600 ms)         (staggered)

at T+400 ms of rush:
  html.is-loading removed (was set synchronously by inline <script> in index.html)
    → hero-fx + about-body cascade through enter-fade / enter-up keyframes
      (stagger T=0..1550 ms across hero / row-top / about / contact)
    → loader opacity fades 1→0 between T=400-900 ms
    → pulses arrive at center while loader is already mid-fade

prefers-reduced-motion: rush skipped, embers off, staggered entrances off
```

**Synchronous is-loading class.** `index.html` contains an inline `<script>` that sets `html.is-loading` *before* the React script loads. Hero/about elements have `opacity: 0` while that class is on. Zero FOUC — no `useEffect`-frame flash of unstyled content.

**Hairline aligned to real divider.** Loader hairline measures `#home.getBoundingClientRect().bottom` and sets `--loader-line-y` so its on-screen position matches the real Hero↔About divider. When the loader fades, the white pulsing hairline visually transitions into the existing accent-tinted divider in the same Y. Continuity gimmick — one frame of optical magic.

**Light comes on during rush, no flash.** Earlier version had a flash phase (radial accent burst at center). Replaced — `is-loading` removes at T+400 ms, hero entrance starts, loader fades 1→0 between T=400-900 ms. Pulses meet the lit interface, no harsh flash. `prefers-reduced-motion` skips the rush entirely.

## Key engineering decisions

### 01 · Tokens, not Tailwind — token file + split CSS modules

**Decision.** `tokens.css` defines colors / spacing / type / radii via CSS custom properties. `styles.css` is only the cascade entry point; actual selectors live in `base.css`, `home.css`, `media.css`, `case-study.css` and `runtime.css`. No utility framework, no CSS-in-JS, no UI library.

**Why.** A site with a strict design language gets diluted by Tailwind's utility class noise — the type scale, palette, and spacing have to be re-derived in config either way. The token system does what Tailwind would; component selectors do what utility classes would, but the code stays readable and grep-friendly. Splitting after the design stabilized keeps the same cascade while making bugs easier to bisect.

**Cost.** Adding a new heading size still means adding a token, never an inline `clamp()`. The split creates more files and import-order discipline; Vite bundles the imports back into one production stylesheet, so runtime cost stays unchanged.

### 02 · Two parallel content trees, not a key-based store

**Decision.** `useT()` returns a complete `Content` object per language — no flat `t('hero.title')` calls; consumers read `useT().hero.name`. Under the hood, two trees live side by side: `src/content/public/` (committed sanitized demo) and `src/content/.private/` (gitignored real). Explicit public/private scripts set `CONTENT_SOURCE`, run `select-content.mjs`, and write a generated `active.ts` barrel pointing at the selected tree. The `Content` type enforces EN/RU/AR parity at compile time — TypeScript rejects any tree if a key is missing.

**Why.** For three languages × ~250 strings, the naïve `{ "hero.title": { en, ru, ar } }` shape loses readability — paragraphs split across keys, no scan of full copy in one place. The custom `LangContext` (~70 LOC) + `useT()` gives autocomplete (`t.hero.name`), TS parity guarantee across all three trees, and direct paragraph-in-context editing. react-i18next ships ~25 KB minified for plurals / namespaces / interpolation the site does not use.

**Cost.** Adding a field forces all three languages at once or a transient TS error. Plurals or interpolation would need to be hand-rolled. For a translation-team workflow with a TMS, the trade flips back toward a key-based store.

### 03 · Markdown twin per page, not HTML scraping

**Decision.** Every public page has a generated Markdown sibling: `/index.md`, `/cases/<slug>.md`, plus RU `.ru.md` and AR `.ar.md` variants of each, plus `/llms.txt`, `/llms-ru.txt`, `/llms-ar.txt` and `/llms-full.txt`. The same serializers power the in-page copy-as-Markdown button.

**Why.** Five portfolio projects are private, and AI/search agents do a cleaner job citing a small Markdown twin than scraping rendered React HTML. The React UI stays optimized for humans; the Markdown layer gives agents and reviewers a stable text artifact generated from the same typed content, in every language.

**Cost.** Generated files are build artifacts, not source of truth. The Vite middleware has to serve `.md` / `.txt` directly with explicit UTF-8, and any content-shape change must keep the serializers in sync.

### 04 · Hero embers — CSS particles, not SVG filter + SMIL

**Decision.** 26 absolutely-positioned `<span>`s, each animated via CSS `@keyframes` on `transform: translate3d(...)` + `opacity` only. Glow via `box-shadow: 0 0 4px var(--accent)` (per-element GPU-cached). Per-particle CSS custom properties (`--x`, `--start-y`, `--drift`, `--size`, `--dur`, `--delay`) drive variety from a deterministic seed pattern.

**Why.** First version used an SVG `<filter>` with `feTurbulence + feDisplacementMap` on a group of 26 `<circle>`s + SMIL. iOS Safari was unusable — measurable phone heat within ~2 minutes on iPhone 16 Pro Max; cheap Android did not reproduce. Root cause: WebKit rasterizes SVG filters on CPU, the filter cache invalidates every frame on animated children, SMIL runs through a slow non-GPU path. Combined: 100% CPU pegged for what reads as a static ambient.

**Cost.** Lost the sub-pixel warble — visually subtle, not worth the cost. The static `<radialGradient>` stays in SVG (viewBox-stable, doesn't animate, effectively free). The general rule — SVG filters only on static elements; CSS transform + opacity for anything that moves — now lives across every ambient animation in the codebase.

### 05 · ASCII diagrams in <pre>, not Mermaid

**Decision.** Architecture diagrams live as plain template literals in `en.ts` / `ru.ts`, rendered into `<pre>`. Box-drawing chars (`─│┌┐└┘├┤┬┴┼╔╗╚╝═║▼▲►◄→←↑↓`) — JetBrains Mono renders them cleanly across themes. Image diagrams (airea n8n + roblox bestiary) are first-class through `images?: { src, alt, caption? }[]` alongside `ascii?` in the diagram type.

**Why.** ~200 KB of JS for three diagrams on a portfolio targeting Lighthouse 95+ is not justified. ASCII in mono on a `--bg-sunk` block reads as a deliberate engineer-style artifact — same energy as the `.hatch` placeholders. Source is right there in `en.ts` as a template literal — no round-trip through external tooling, no compile step, no runtime parse.

**Cost.** Diagrams must be hand-drafted; mistakes are visible (one off-by-one hairline shifted a column for two days). Cannot interactively zoom or collapse. Image variant ratio (`fr = W/H` for column widths to align heights) only stays correct with no inner padding on the `<img>` — caught one alignment bug after the fact. A guard tool / lint rule would have surfaced it earlier.

### 06 · Privacy-first telemetry, not public analytics theater

**Decision.** Client events go to a same-origin `/api/track` endpoint via `sendBeacon` with a fetch keepalive fallback. The event set is deliberately small: pageview, dwell, outbound, interaction and video. Visitor identity is daily-salted and the dashboard is admin-only.

**Why.** Public counters are a reverse signal on a low-traffic portfolio. The useful proof is the system design: cookieless events, route-aware dwell tracking, retention boundaries and an inspectable same-origin pipeline. Recruiters do not need volatile traffic numbers; senior reviewers can see the implementation path.

**Cost.** The system adds a tiny server surface and operational chores: SQLite backups, log rotation, salt rotation and admin auth. It should stay private unless the numbers become a meaningful signal.

### 07 · Sticky video provider + single-player rule + LiteYouTube facade

**Decision.** Each video card carries a `Mirror on RuTube ↔ Mirror on YouTube` toggle. The choice is global, sticky in `localStorage` — one click anywhere swaps every mounted player site-wide. A separate single-slot store holds the active player; opening a second card pauses the first via `src` recompute. Both ride a `LiteYouTube` facade — placeholder image until first click, no embed cost on initial paint.

**Why.** Per-card provider state would mean clicking RuTube on five cards in a row — the exact opposite of the intent. Session-sticky resets on reload. localStorage-sticky + global respects "I prefer this provider" once. Single-player rule prevents two simultaneous audio streams when the user clicks play on a new card without pausing the old one — including the autoplay-burst when the provider toggle fires and every mounted player re-renders.

**Cost.** Cross-tab sync via `storage` event is not implemented (other tabs pick up on reload). YouTube preserves position via `start=lastTime` param (driven by `recordVideoTime` postMessage); RuTube reloads from start (postMessage API less reliable across versions). Brief reload flicker on the paused card. Provider-swap autoplay-burst handled but adds gating logic to `LiteYouTube`.

### 08 · Public/private content trees, swapped via a generated barrel

**Decision.** Two parallel content trees: `src/content/public/` (committed sanitized demo) and `src/content/.private/` (gitignored real). Both export the same `{ en, ru, ar }` shape against the same `Content` type. `scripts/select-content.mjs` requires explicit `CONTENT_SOURCE=public|private`; `build:public`, `build:private`, `dev:public`, and `dev:private` write `src/content/active.ts` before running Vite/TypeScript. Plain `build` and `dev` fail with guidance. A GitHub Actions leakage job blocks any commit of `.private/`, real media filenames, or Cyrillic outside `src/content/`.

**Why.** Recruiters get a complete inspectable site; private case copy stays out of the public repo. Without a swap mechanism the alternatives were a runtime feature flag (runtime branching, dead private code in the bundle) or two repos (drift between engine and content). The barrel is generated, not committed — so the two trees never confuse each other and the production bundle only contains the chosen one.

**Cost.** A fresh clone has no `active.ts` until a public/private script runs — IDE may show a transient TS error until then. The `.private/` tree must mirror the `public/` shape manually; a structural change in `Content` forces both updates. CI leakage rules need maintenance when new media patterns appear.

## Stack

| | |
|---|---|
| **Frontend** | React 18 · Vite · TypeScript · react-router-dom |
| **Styles** | CSS custom properties · tokens.css + 5 domain CSS modules · oklch palette · no Tailwind / UI library |
| **Content** | Custom i18n context · 3 typed dictionaries · generated Markdown mirrors · no react-i18next |
| **Motion/video** | View transitions API · CSS @keyframes · LiteYouTube facade · sticky provider toggle · single-player rule |
| **Telemetry** | Same-origin `/api/track` · route-aware dwell · daily-salted visitor hash · admin-only dashboard |
| **Scale** | ~8.5K LOC TS/TSX incl. content · ~2.5K LOC CSS · ~100 TS modules · 7 CSS files · 6 cases · 3 languages |

## Lessons & status

### Carry forward

- Token-first discipline — adding a new heading size means a new token, never an inline `clamp()`. Got bitten once when Stack's h2 went off-scale; the fix was introducing `--fs-display-l` shared across every section heading. Tokens are a compile-time guarantee against visual drift.
- Two parallel content trees (`public/` + `.private/`) swapped via a generated `active.ts` barrel — paragraphs read naturally in context, RU and AR mirror EN. TS-enforced parity across all three trees catches missing keys before they ship. Editing one paragraph doesn't fight a flat key registry.
- Synchronous `is-loading` class via inline `<script>` in `index.html` — zero FOUC on first paint. The React Loader removes the class during the reveal phase. No `useEffect`-frame flash of unstyled content.
- Default to CSS transform + opacity for any ambient animation — the iOS Safari ember rewrite is the war story. SVG `<filter>` on animated content is CPU-bound on WebKit; same look from CSS particles + box-shadow runs cool. General rule: SVG filters only on static elements unless there's explicit budget.
- Split `styles.css` into per-domain modules (`base / home / media / case-study / runtime`) once the page stabilized — the single-file phase was right for rapid iteration; per-domain split is right post-stabilization. Cross-section cascade bugs now bisect to a 200–650 line file; Vite bundles the @imports back at build time, so the production bundle is unchanged.
- Markdown mirrors as a first-class agent layer — the same typed content now serves humans in React and agents through `.md` / `llms.txt` without duplicating copy by hand.

### Would change

- `react-router-dom` for two routes is honest overkill. Bundle cost ~12 KB gzipped — trivial — but a thirty-line custom router would have removed a dependency. `ScrollToHash` already needs `useLocation`, which drags the router in anyway; live with it until a third route lands.
- Image-diagram variant (`images?: []` alongside `ascii?` in `CaseStudyDiagram`) was added after the ASCII convention was already cemented. The `imageCols` ratio (`fr = W/H`) only stays correct with no inner padding on the `<img>` — caught one alignment bug after the fact. A guard tool or lint rule would have surfaced it earlier.
- Frontend slug registry now lives in `src/config/cases.ts` — `Nav.tsx` and `ProjectDetailPage.tsx` consume it. The server side still keeps its own copies (`KNOWN_PATHS` in `ingest.js`, `SLUGS` in `admin.js`); next cleanup is a shared JSON read by both, so adding or renaming a case is one edit, not a checklist.

Open source · live · ongoing. Public GitHub source, generated Markdown mirrors, three typed dictionaries (EN/RU/AR), privacy-first telemetry and two themes × three palettes. The site is its own case study.

---

Source: https://ilyadev.xyz/cases/portfolio-site (HTML) · /cases/portfolio-site.md (this file)
Previous: 05 — macOS VPN · per-app routing → https://ilyadev.xyz/cases/macos-vpn.md
Index: https://ilyadev.xyz/llms.txt — full case-study list
Author: Ilya Kazantsev — https://ilyadev.xyz/index.md
