# Restaurant Stock AI Agent

`02 · ai-warehouse · MVP`

Restaurant stock AI agent with two-tier Gemini routing and a draft → Confirm safety boundary. Receipts, prep and write-offs start in Telegram; the WebApp keeps review, WAC and dashboard impact visible.

**Scope:** Solo · 1 week  
**Role:** AI-assisted restaurant stock operations

**Video:** [YouTube](https://www.youtube.com/watch?v=TwxmVN6JNvA) · [RuTube](https://rutube.ru/video/private/5031a5bb3f1f25bfe0961a75b1b9791a/?p=co_lT-Rej-FrcN6r7dpcyw)

## Video walkthrough

Restaurant stock operations split between Telegram (messy receipt photos, prep batches, write-offs as text or kitchen slips) and a WebApp that keeps every draft reviewable before stock moves. The agent reads receipts, expands dishes into ingredients, updates weighted average cost on confirm, and answers free-form warehouse questions with order recommendations.

AI stock operations for restaurants — receipts, prep work and write-offs start in Telegram, while the WebApp keeps every number reviewable.

The dashboard shows today, week and month: revenue, food cost, purchases, deductions and stock value, with compact charts for the working day.

Stock highlights what needs attention. Two frozen items are out, so the team restocks and sends the store receipt to the bot.

The AI reads the photo, matches food, drinks and operating supplies to stock categories, and creates a draft. Review it in the WebApp, then confirm — stock and weighted average cost update immediately.

Preparation batches are tracked too. When the kitchen makes pizza dough, tomato sauce, burger patties or marinated chicken, the team can tell the agent what was prepared, or add the batch manually in the WebApp. Ingredients go out, the prepared item comes back in, and costs stay connected.

For sales or write-offs, send the kitchen order slip, or just describe it in text. The AI recognizes the dishes, expands them into ingredients and prepares the deduction. Confirm it, and the dashboard moves again.

Then ask the agent anything about your warehouse — for example, how the month went. It analyzes historical supplies, deductions, revenue, food cost, stock movement and out-of-stock items, then returns recommendations for the next order.

Telegram handles messy input. The WebApp handles review. The dashboard shows the business impact.

---

## Context

> The model drafts. The human confirms. The dashboard shows impact.

Restaurant inventory drifts from records faster than anyone in the kitchen has time to log it. Receipts arrive on paper or by voice; supplies, dish deductions and pre-prepared bases each follow different math; weighted-average cost only stays correct if every operation hits the books on time. The cost of skipping a few entries is invisible until inventory swings — and by then the trail is gone.

The MVP closes the loop around that bottleneck: Telegram captures messy stock operations, the WebApp keeps every draft reviewable, and the dashboard turns confirmations into revenue, gross profit, food cost, stock value, low-stock risk and next-order recommendations.

## Facts

| | |
|---|---|
| **Scope** | 1 week solo |
| **Surfaces** | Telegram bot (Aiogram) + WebApp SPA (React 18) + browser fallback |
| **Domain** | 100 inventory · 40 menu · 12 prep recipes · 18 tables · deterministic 30-day history |
| **AI** | Gemini 3 Flash workflows · Pro-style operating analysis · per-call $ tracked |
| **Business** | Revenue · gross profit · food cost · stock value · low-stock risk · next-order list |
| **Status** | MVP · Telegram bot + WebApp · multimodal intake live · stock, WAC and dashboard flows end-to-end |

## Architecture

### Operator loop · dashboard → Telegram → review → impact

```text
 1  WebApp Dashboard
        │  Today / Week / Month
        │  stock value · revenue · gross profit · purchases
        │  deductions · preparations · movement charts
        ▼
 2  Stock view
        │  category groups · qty · stock value · unit cost
        │  Frozen flags: Green Peas Frozen + Potato Wedges Frozen = out
        ▼
 3  Telegram: Supply photo
        │  📷 Receipt received
        │  👁 Vision pass: reading receipt lines and totals
        │  🧠 Matching products to stock categories
        │  📦 Calculating quantities and weighted costs
        │  ✅ Draft supply ready for review
        ▼
 4  Draft supply result
        │  19 stock items matched · 15 650.24 RSD
        │  actions: Review · Confirm · Cancel
        ▼
 5  WebApp review + Confirm
        │  food · beverages · operating supplies
        │  confirm commits stock_levels + WAC + stock_history
        ▼
 6  Preparations
        │  pizza dough batch: ingredients out, prepared item in
        │  preparation cost cascades through WAC
        ▼
 7  Telegram: Deduction photo
        │  📷 Kitchen slip received
        │  👁 Vision pass: reading dishes and quantities
        │  🧠 Matching dishes to menu recipes
        │  🥘 Expanding dishes into ingredient deductions
        │  ✅ Draft deduction ready for review
        ▼
 8  Draft deduction result
        │  Chicken Shawarma ×3 · Shawarma Plate ×2
        │  confirm writes off ingredients at current WAC
        ▼
 9  Agent monthly analysis
        │  last 30 days · revenue · food cost · supplies
        │  preparations · top deductions · out-of-stock · next order
        ▼
10  Dashboard impact
        │  Telegram handles messy input
        │  WebApp handles review / confirm
        │  Dashboard shows the business effect
```

**Progress is part of the product.** Every AI action edits one Telegram message through visible stages: received, vision pass, matching, cost calculation or recipe expansion, then draft ready. The operator sees the model working instead of staring at a spinner.

**AI drafts, Confirm commits.** Supply, deduction and preparation workflows return structured drafts. Backend stores them with status = draft; stock, WAC, history and dashboard totals move only after the operator taps Confirm.

**Review spans Telegram and WebApp.** The Telegram result carries Review, Confirm and Cancel. Review opens the WebApp surface for line-level inspection; Confirm commits the same draft through the backend confirm endpoint.

### Multimodal intake → draft → confirm

```text
 1  User                          text  /  photo  /  voice
        │
 2  Bot  (Aiogram)
        │  base64 encode photo/voice  +  selected workflow
        │  POST /api/agent/message              [X-Telegram-User-Id]
        ▼
 3  Backend.router  (route_message)
        │  dispatch by workflow:
        │     supply | deduction | preparation  →  Flash workflow
        │     agent analysis                    →  monthly operating analysis
        ▼
 4  Flash workflow  (gemini-3-flash, ≤5 iterations)
        │  inject inventory snapshot into system_prompt
        │  call client.generate_content(tools=[...])
        ▼
 5  Gemini  ──►  text  /  function_call
                │
                ▼
              tool_create_supply_draft  (or _deduction_, _preparation_)
                │  parse args, normalize price_type
                ▼
              insert row in supplies / deductions / preparations
              with status = 'draft'
        │
 6  Backend  ──►  AgentMessageResponse(text, actions=[
        │           AgentAction(type=callback, label='Confirm',
        │                       data='confirm_supply:<uuid>'),
        │           AgentAction(type=callback, label='Delete',  ...),
        │         ])
        ▼
 7  Bot  ──►  inline keyboard rendered from actions
        │
 8  User taps Confirm
        │  POST /api/supplies/{id}/confirm
        ▼
 9  Backend
        │  recompute stock_levels  +  WAC
        │  insert stock_history row
        │  snapshot cost_per_unit on lines
        ▼
10  Done  ·  status = 'confirmed'  ·  audit_log written
```

**Two-phase by design.** The AI workflow creates a draft (no side effects). Confirm is a separate POST that updates stock_levels and writes stock_history. AI never touches live stock; the human button does.
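A minimal in-memory sketch of the two-phase shape, under stated assumptions: `InMemoryStock`, `create_draft` and `confirm` are hypothetical names, a plain dict stands in for stock_levels, and the guard against confirming twice is an assumption rather than something the source describes.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class SupplyLine:
    item: str
    qty: float
    price_per_unit: float


@dataclass
class Supply:
    lines: list[SupplyLine]
    status: str = "draft"
    id: str = field(default_factory=lambda: str(uuid4()))


class InMemoryStock:
    """Hypothetical stand-in for stock_levels, stock_history and supplies."""

    def __init__(self) -> None:
        self.qty: dict[str, float] = {}
        self.history: list[tuple[str, str, float]] = []
        self.supplies: dict[str, Supply] = {}

    def create_draft(self, lines: list[SupplyLine]) -> Supply:
        # Phase 1: the only thing the AI workflow may do. No stock movement.
        supply = Supply(lines=lines)
        self.supplies[supply.id] = supply
        return supply

    def confirm(self, supply_id: str) -> Supply:
        # Phase 2: reached only from the human Confirm action.
        supply = self.supplies[supply_id]
        if supply.status != "draft":
            # Assumed guard: a confirmed supply must not move stock twice.
            raise ValueError(f"supply {supply_id} is already {supply.status}")
        for line in supply.lines:
            self.qty[line.item] = self.qty.get(line.item, 0.0) + line.qty
            self.history.append((supply.id, line.item, line.qty))
        supply.status = "confirmed"
        return supply
```

Creating a draft leaves `qty` and `history` empty; they change only inside `confirm`, which is the property the safety boundary relies on.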

**Backend owns action meaning.** The bot is a thin Telegram adapter: it uploads media, shows progress, renders backend-provided actions and forwards callbacks. Business meaning stays in the backend.

**Explicit lanes keep the MVP predictable.** The operator picks Supply, Deduction, Preparation or Agent before sending messy input. That removes false intent classification from the critical stock path; an auto-router can sit above the same dispatcher later.
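The explicit-lane dispatch can be sketched as a plain lookup table. Only the name `route_message` and the four lanes come from the diagram above; the signature, the handler functions and their return strings are illustrative assumptions.

```python
from typing import Callable

Handler = Callable[[str], str]


def make_flash_handler(kind: str) -> Handler:
    # Hypothetical placeholder for a Flash tool-loop that would create a draft.
    def handler(text: str) -> str:
        return f"[flash:{kind}] draft requested for: {text}"

    return handler


def run_analysis(text: str) -> str:
    # Hypothetical placeholder for the broader operating analysis.
    return f"[analysis] {text}"


LANES: dict[str, Handler] = {
    "supply": make_flash_handler("supply"),
    "deduction": make_flash_handler("deduction"),
    "preparation": make_flash_handler("preparation"),
    "agent": run_analysis,
}


def route_message(workflow: str, text: str) -> str:
    # The operator already chose the lane, so no intent classification happens here.
    try:
        handler = LANES[workflow]
    except KeyError:
        raise ValueError(f"unknown workflow lane: {workflow!r}") from None
    return handler(text)
```

An auto-router added later would only need to produce the `workflow` key; the table and handlers stay the same.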

### Component layout

```text
   Inputs              docker-compose  (4 services)              External
   ──────              ─────────────────────────────              ────────
   Telegram chat ────► bot  (Aiogram 3.4)                        Gemini API
                            │  POST /api/agent/message            google-genai
                            │  X-Telegram-User-Id                      ▲
                            ▼                                          │
   Telegram WebApp     ┌──────────────────────────────────┐            │
   ─initData──────────►│  backend  (FastAPI · async)      │────────────┘
   browser  ─JWT──────►│  9 routers · 18 tables · 9 mods  │
                       │  GeminiClient + token_usage      │
                       └────┬───────────┬─────────────────┘
                            │           │
                            ▼           ▼
                      postgres 16    receipts_data
                      (asyncpg WAL)  shared volume:
                                     bot writes · backend reads

                            ▲ HTTP /api/*
                            │
                       frontend  (Vite · React 18 SPA · :5173)
                       react-i18next 5 langs × 10 ns
                       @tanstack/react-query
```

**Bot is HTTP only.** Bot opens a long-poll loop to Telegram and POSTs every event into backend. No direct DB access from bot — keeps the bot stateless across restarts.

**Photos shared via FS, not API.** Bot writes uploaded photos into the receipts_data volume; backend reads from the same path. Avoids re-uploading multi-MB files between containers.

**Frontend is dual-mode.** Same Vite SPA mounts inside the Telegram WebApp (initData header) and as a plain browser app (deep-link JWT). One bundle, two auth paths in lib/auth.ts; the rest of the app cannot tell them apart.

### Domain core · 18 tables in 9 modules

```text
INVENTORY                            SUPPLIES
─────────                            ────────
inventory_items                      suppliers
  · status: active|archived          supplies  (status: draft|confirmed)
       │                               └── supply_lines
       ▼                                     · qty · price_per_unit
  stock_levels                              · expiry_date
    · qty · avg_cost  ◄────── WAC update on confirm
       │
       ▼
  stock_history  (audit log of every movement)


PREPARATIONS                         DEDUCTIONS
────────────                         ──────────
prep_recipes                         deductions  (status: draft|confirmed)
  · default_multiplier                 └── deduction_items
  · portion_size · portion_unit              · type: dish | item
  └── prep_recipe_ingredients               └── deduction_lines
                                                  · cost_per_unit
preparations                                       (snapshot at confirm)
  · multiplier  (e.g. 3× base broth)
  · cost = sum(ingredient × avg_cost) / output_qty   ← cascade WAC


MENU                                 AI · CHAT
────                                 ─────────
menu_items  (with archived flag)     chat_sessions
  └── menu_item_ingredients          chat_messages  (last 40 in context)
                                     user_settings
                                     token_usage
                                       · workflow_type · model
                                       · input/output/thinking_tokens
                                       · cost_usd


AUDIT
─────
audit_logs  ·  receipt_images  (shared FS path with bot)
```

**WAC lives in stock_levels.avg_cost.** Each supply confirm recomputes avg_cost = (old_qty*old_avg + new_qty*new_price) / total_qty, where total_qty = old_qty + new_qty, and stores the result in stock_levels.avg_cost. The same formula cascades through preparations: prep cost = sum(ingredient_qty * avg_cost) / output_qty, and that unit cost then enters the output item's avg_cost via the same WAC formula.
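A worked instance of the supply-side formula, as a sketch: the function name and the use of `Decimal` are assumptions, and the quantities and prices are made-up illustration values.

```python
from decimal import Decimal


def weighted_average_cost(
    old_qty: Decimal, old_avg: Decimal, new_qty: Decimal, new_price: Decimal
) -> Decimal:
    """Return the new average cost after adding new_qty at new_price."""
    total_qty = old_qty + new_qty
    if total_qty <= 0:
        raise ValueError("total quantity must be positive")
    return (old_qty * old_avg + new_qty * new_price) / total_qty


# 10 kg on hand at 100 per kg, 5 kg delivered at 130 per kg:
# (10*100 + 5*130) / 15 = 1650 / 15 = 110
new_avg = weighted_average_cost(
    Decimal("10"), Decimal("100"), Decimal("5"), Decimal("130")
)
```

When an item is out of stock (old_qty = 0), the formula reduces to the new purchase price.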

**Cost snapshot on deduction confirm.** When a deduction confirms, deduction_lines.cost_per_unit is set to stock_levels.avg_cost at that moment. Later supplies at different prices do not rewrite historical deductions.
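The snapshot behaviour can be illustrated with two small dataclasses. This is a sketch: `StockLevel`, `DeductionLine` and `confirm_deduction` are hypothetical names that mirror the table columns, not the project's actual models.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class StockLevel:
    qty: Decimal
    avg_cost: Decimal


@dataclass
class DeductionLine:
    item: str
    qty: Decimal
    cost_per_unit: Optional[Decimal] = None  # empty while the deduction is a draft


def confirm_deduction(lines: list[DeductionLine], stock: dict[str, StockLevel]) -> None:
    for line in lines:
        level = stock[line.item]
        # Copy the value rather than reference stock: history must not follow later prices.
        line.cost_per_unit = level.avg_cost
        level.qty -= line.qty


stock = {"Chicken": StockLevel(qty=Decimal("10"), avg_cost=Decimal("500"))}
line = DeductionLine(item="Chicken", qty=Decimal("2"))
confirm_deduction([line], stock)

# A later, pricier supply moves the live average but not the confirmed line.
stock["Chicken"].avg_cost = Decimal("650")
```

After the price change the line still carries 500 as its cost_per_unit, so a report over past deductions reads the cost that applied at confirm time.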

**Dual-mode deductions.** deduction_items.type = dish (expand recipe; ingredients become lines) or item (direct stock line). One deduction can mix both — covers the case where the dish was served and the cook ate two extras off the side.
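A sketch of how a mixed deduction might expand into stock lines. The recipe quantities, the `RECIPES` dict and `expand_items` are hypothetical, and merging repeated ingredients into one line is an assumption; quantities are integers (grams, pieces) to keep the arithmetic exact.

```python
from collections import defaultdict

# Hypothetical recipe: per-portion ingredient quantities (grams or pieces).
RECIPES: dict[str, dict[str, int]] = {
    "Chicken Shawarma": {"Chicken": 150, "Pita": 1},
}


def expand_items(items: list[dict]) -> dict[str, int]:
    """Expand dish items through their recipe; pass item entries straight through."""
    lines: dict[str, int] = defaultdict(int)
    for entry in items:
        if entry["type"] == "dish":
            for ingredient, per_portion in RECIPES[entry["name"]].items():
                lines[ingredient] += per_portion * entry["qty"]
        elif entry["type"] == "item":
            lines[entry["name"]] += entry["qty"]
        else:
            raise ValueError(f"unknown deduction item type: {entry['type']!r}")
    return dict(lines)


# Three shawarmas served, plus two pitas taken directly from stock.
deduction = [
    {"type": "dish", "name": "Chicken Shawarma", "qty": 3},
    {"type": "item", "name": "Pita", "qty": 2},
]
result = expand_items(deduction)
```

Here `result` holds 450 g of chicken and 5 pitas: 3 from the recipe expansion and 2 from the direct line.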

## Key engineering decisions

### 01 · Two-tier Gemini-3 with explicit workflow lanes

**Decision.** Two Gemini-3 tiers handle two distinct job shapes. Cheap Flash (gemini-3-flash-preview, thinking_budget=0, ≤5 iterations) runs stateless workflow tool-loops — one of {create_supply_draft, create_deduction_draft, create_preparation}. Pro-style analysis handles broader operating questions over the 30-day history, low-stock watchlist, dashboard metrics and next-order list. The lane is explicit in the MVP — Supply / Deduction / Preparation / Agent buttons in the bot. Per-call usage is parsed from response.usage_metadata and persisted in token_usage with workflow_type so Settings can break down spend by tier and workflow.
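A hedged sketch of turning usage metadata into a token_usage row. Assumptions: the metadata object exposes `prompt_token_count`, `candidates_token_count` and `thoughts_token_count` attributes (any of which may be missing or None), thinking tokens are billed at the output rate, and the prices in `PRICES_PER_MILLION` are placeholders, not real vendor rates. `TokenUsageRow` and `usage_row` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal
from types import SimpleNamespace

# Placeholder USD prices per 1M tokens; real rates must come from the vendor price list.
PRICES_PER_MILLION = {
    "gemini-3-flash-preview": {"input": Decimal("0.50"), "output": Decimal("3.00")},
}


@dataclass
class TokenUsageRow:
    workflow_type: str
    model: str
    input_tokens: int
    output_tokens: int
    thinking_tokens: int
    cost_usd: Decimal


def usage_row(workflow_type: str, model: str, usage_metadata: object) -> TokenUsageRow:
    # Missing or None counters are treated as zero.
    input_tokens = getattr(usage_metadata, "prompt_token_count", None) or 0
    output_tokens = getattr(usage_metadata, "candidates_token_count", None) or 0
    thinking_tokens = getattr(usage_metadata, "thoughts_token_count", None) or 0
    price = PRICES_PER_MILLION[model]
    # Assumption: thinking tokens are charged at the output rate.
    cost = (
        input_tokens * price["input"]
        + (output_tokens + thinking_tokens) * price["output"]
    ) / Decimal(1_000_000)
    return TokenUsageRow(
        workflow_type, model, input_tokens, output_tokens, thinking_tokens, cost
    )


# Stand-in for response.usage_metadata from one receipt-photo call.
fake_metadata = SimpleNamespace(
    prompt_token_count=2000, candidates_token_count=500, thoughts_token_count=None
)
row = usage_row("supply", "gemini-3-flash-preview", fake_metadata)
```

Storing workflow_type next to model on each row is what lets a later query group spend by tier and by workflow without re-parsing anything.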

**Why.** A single-tier setup on Pro costs roughly 4× per token and pulls full chat-history overhead onto every receipt-photo dispatch. A single-tier setup on Flash fits narrow tool loops but not broad operating analysis. Splitting by job shape — narrow operation draft vs. broader business analysis — matches the real cost/benefit envelope of Gemini-3 today. Explicit lanes also reduce false intent classification while the core MVP proves the stock-operation path.

**Cost.** Two prompt families to maintain. Tool declarations partially duplicate across narrow workflows and broad analysis. Explicit lane choice is a UX cost on the bot side: one extra button before intake. An auto-intent router can remove that later, but it should sit on top of the same cost and workflow telemetry rather than replacing the tier split.

### 02 · Draft → Confirm on every mutating operation

**Decision.** Supplies, deductions and preparations are two-phase. The first POST inserts a row with status = draft — no side effects on stock. A second POST to /confirm pessimistically locks the relevant stock_levels rows, recomputes WAC, writes a row to stock_history per moved item, snapshots cost_per_unit on the deduction lines, and flips status to confirmed. AI never touches live stock; the human confirms via an inline button.

**Why.** An AI agent that mutates inventory directly is one bad transcription away from corrupting weeks of stock data. Two-phase makes every model proposal reversible (delete the draft) and inspectable (open the WebApp, edit the line, then confirm). The same pattern gives the human operator the same affordance — start a draft on the bot, finish it on the WebApp.

**Cost.** Every mutating surface is two endpoints (`/...` and `/.../confirm`). Stock has to accept draft rows that it ignores in totals. UX has to communicate that a freshly-created supply is not yet live. Concurrent confirms on the same stock_level need a lock — added a pessimistic `SELECT FOR UPDATE` on every confirm-path stock read (supplies + deductions + preparation cascade); read-only stock queries stay lock-free.

### 03 · Server-driven review actions, not bot-side business logic

**Decision.** Backend emits AgentAction inside AgentMessageResponse — each action carries a type (callback or webapp), a label, and a data string. Telegram renders Review, Confirm and Cancel from that list; the backend owns what confirm_supply:<uuid> means and which WebApp screen Review opens. The same contract works on Telegram, in the chat pane of the WebApp fallback, and in any text channel that supports tap-a-button.

**Why.** A bot that owns business actions becomes a parallel UI codebase: every new draft type, every review path, every workflow needs a bot-side handler change. Keeping the bot as a transport adapter means new stock flows ship in backend only. The operator still gets a rich Telegram surface — progress edits, result summary, Review/Confirm/Cancel — while business meaning remains centralized.

**Cost.** Backend has to know about Telegram-specific limits (callback_data ≤ 64 bytes; some types are inline-only). Buttons are dispatch-only — they do not carry inline forms. Anything richer than confirm/delete (editing line quantities, for example) has to fall through to the WebApp.
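The 64-byte callback_data limit is easy to enforce where the backend builds actions. A minimal sketch, assuming the `action:<uuid>` format shown in the diagrams; `build_callback_data` and `parse_callback_data` are hypothetical helper names.

```python
import uuid

MAX_CALLBACK_DATA_BYTES = 64  # Telegram Bot API limit for callback_data


def build_callback_data(action: str, entity_id: uuid.UUID) -> str:
    """Build 'action:<uuid>' and refuse anything Telegram would reject."""
    data = f"{action}:{entity_id}"
    size = len(data.encode("utf-8"))
    if size > MAX_CALLBACK_DATA_BYTES:
        raise ValueError(f"callback_data is {size} bytes, limit is {MAX_CALLBACK_DATA_BYTES}")
    return data


def parse_callback_data(data: str) -> tuple[str, uuid.UUID]:
    """Split 'action:<uuid>' back into its parts; raises ValueError on a bad UUID."""
    action, _, raw_id = data.partition(":")
    return action, uuid.UUID(raw_id)


example_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
payload = build_callback_data("confirm_supply", example_id)
```

A canonical UUID takes 36 bytes and the colon one more, which leaves 27 bytes for the action name; `confirm_supply` uses 14 of them.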

### 04 · Weighted Average Cost with cascade through preparations

**Decision.** Each supply confirm runs WAC over the affected stock_level: new_avg = (old_qty*old_avg + new_qty*new_price) / total_qty. Preparations apply WAC twice — first to compute the unit cost of the prep (sum(ingredient_qty * ingredient.avg_cost) / output_qty), then to merge that into the output stock_level via the same WAC formula. Deduction lines snapshot cost_per_unit = stock.avg_cost at confirm time; later price changes do not rewrite the past.
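The two-step preparation cascade, worked through with made-up numbers. This is a sketch: `prep_unit_cost` and `merge_wac` are hypothetical names, and the batch sizes and prices are illustration values only.

```python
from decimal import Decimal


def prep_unit_cost(
    ingredients: list[tuple[Decimal, Decimal]], output_qty: Decimal
) -> Decimal:
    """Step 1: sum(ingredient_qty * ingredient.avg_cost) / output_qty."""
    total = sum((qty * avg_cost for qty, avg_cost in ingredients), Decimal("0"))
    return total / output_qty


def merge_wac(
    old_qty: Decimal, old_avg: Decimal, add_qty: Decimal, add_cost: Decimal
) -> Decimal:
    """Step 2: fold the new batch into the output item's average cost."""
    return (old_qty * old_avg + add_qty * add_cost) / (old_qty + add_qty)


# Dough batch: 10 kg flour at 80 plus 1 L oil at 200 yields 10 kg of dough.
# Unit cost = (10*80 + 1*200) / 10 = 100
batch_cost = prep_unit_cost(
    [(Decimal("10"), Decimal("80")), (Decimal("1"), Decimal("200"))],
    output_qty=Decimal("10"),
)

# 5 kg of dough already in stock at 70; merged average = (5*70 + 10*100) / 15 = 90
dough_avg = merge_wac(Decimal("5"), Decimal("70"), Decimal("10"), batch_cost)
```

Because step 1 reads each ingredient's current avg_cost, a price change on flour reaches dough only at the next batch, never retroactively.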

**Why.** FIFO/LIFO would require lot-level tracking — every supply line as a discrete batch with its own remaining qty — and a queue/stack pop on every deduction. For a single-restaurant kitchen with mixed ingredients (one bag of rice, not five distinguishable batches) WAC matches how the cook actually thinks about cost. Snapshots on confirm protect historical reports from later price drift.

**Cost.** WAC math hides batch-level variance — you cannot answer which exact supply this dish came from. Preparations turn one deduction into a chain of WAC computations; debugging a wrong cost means walking the chain by hand. With zero tests in the repo, every WAC change is verified manually.

## Stack

| | |
|---|---|
| **Backend** | Python · FastAPI · SQLAlchemy 2.0 Mapped (async) · asyncpg · Alembic · Pydantic v2 |
| **Frontend** | React 18 · Vite · TypeScript · Tailwind · @tanstack/react-query · react-i18next |
| **Bot** | Aiogram 3.4 (long polling) · httpx |
| **AI** | google-genai 1.61 · Gemini 3 Flash + Pro · multimodal (text/photo/voice) · cost matrix |
| **Infra** | PostgreSQL 16 · Docker Compose (4 services · 1 shared volume for receipts) |
| **Scale** | 9 backend modules · 18 tables · 13 pages · 5 langs × 10 i18n namespaces · 6 currencies · ~50 routes |

## Lessons & status

### Carry forward

- Per-call cost parsed from usage_metadata, persisted with workflow_type: the Settings page slices spend by model, by workflow and by day. Wiring it before the first run is cheaper than bolting it on after the first vendor invoice; the same signal is what the auto-intent router should consume once it lands.
- Draft → Confirm with cost snapshot — every mutating AI output is reviewable before commit. The product safety boundary lives in the workflow shape, not in model accuracy: bad OCR can create a bad draft, but it cannot mutate live stock without human confirmation.
- Server-driven review actions — Telegram shows progress, result summaries and Review/Confirm/Cancel, while backend owns the action contract. Same contract works in Telegram, any text channel, and the WebApp chat pane. Adding a new workflow ships backend-side.
- Compact design system + hot-reload across all three containers — UI-STANDARDS.md (~600 lines) pins receipt-like cards, 28×28 square actions, 11–14 px type, Telegram CSS vars mapped to Tailwind tokens. Bind-mounts plus uvicorn --reload and vite HMR keep edit-to-visible cycles in seconds for backend, frontend, and bot.

### Would change

- Started with zero tests — acceptable for MVP speed, but workflow routing and WAC confirm paths now deserve pytest fixtures before more operators rely on them. The risky surface is small: draft creation, confirm mutation, WAC cascade and dashboard totals.
- HMAC validation for Telegram WebApp initData — hmac is imported in auth.py and never called. Trust today rests on Telegram WebApp client-side guarantees rather than server-side verification; closing it is a ~50-line addition.
- init_db() running alongside Alembic: Base.metadata.create_all() fires at startup while migrations exist beside it. That is fine for solo iteration on a fresh database, but it becomes a foot-gun the first time someone clones the repo, because create_all builds the tables without recording an Alembic revision and neither path alone is guaranteed to leave a clean, versioned schema. Either run alembic stamp head after create_all, or drop create_all entirely.
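The missing initData check from the list above could look roughly like the following. It follows the validation scheme in Telegram's Mini Apps documentation (an HMAC-SHA256 of the bot token keyed with the constant `WebAppData`, then an HMAC of the sorted `key=value` lines); the function name is hypothetical, and an `auth_date` freshness check is deliberately left out of this sketch.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl


def is_valid_init_data(init_data: str, bot_token: str) -> bool:
    """Verify the hash field of a Telegram WebApp initData query string."""
    pairs = dict(parse_qsl(init_data, keep_blank_values=True))
    received_hash = pairs.pop("hash", None)
    if not received_hash:
        return False
    # All remaining fields, sorted by key, one "key=value" per line.
    data_check_string = "\n".join(f"{k}={v}" for k, v in sorted(pairs.items()))
    # The secret key is HMAC-SHA256 of the bot token, keyed with "WebAppData".
    secret_key = hmac.new(b"WebAppData", bot_token.encode(), hashlib.sha256).digest()
    expected = hmac.new(
        secret_key, data_check_string.encode(), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received_hash.encode())
```

A production version would also reject payloads whose `auth_date` is too old, so that a captured initData string cannot be replayed indefinitely.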

MVP · Telegram bot + WebApp · multimodal intake live · supply, preparation, deduction, WAC and dashboard flows working end-to-end.

---

Source: https://ilyadev.xyz/cases/ai-warehouse (HTML) · /cases/ai-warehouse.md (this file)
Index: https://ilyadev.xyz/llms.txt — full case-study list
Author: Ilya Kazantsev — https://ilyadev.xyz/index.md
