About Projects Experience Contact
Back to Projects

AI systems case study / 72-hour sprint / selected internship assessment

RetailOS Lite

A local-first, production-shaped retail execution platform: reps submit shelf evidence, workers run fraud and AI analysis, supervisors get compliance intelligence, and a RAG assistant answers operational questions from previous visits.

55h of focused build time within a 72h window
3 deployable services: web, worker, AI
9+ subsystems integrated end-to-end
5 fraud signals with explicit reasons
RetailOS Lite supervisor overview dashboard with visits, compliance, POSM, fraud, and recent visit metrics
Supervisor overview: field execution metrics, compliance health, missing POSM, fraud detections, visit quality, and recent visits in one operational surface.

Why This Project Exists

RetailOS Lite was built for a rapid internship assessment: design and ship a mini AI-native retail execution workflow in 72 hours. The mandatory scope sounded simple on paper: store visits, shelf image upload, AI shelf analysis, supervisor summary, dashboard, and at least one fraud signal. The real challenge was making those pieces feel like a believable operational system instead of a stitched-together demo.

I treated the assignment as a systems problem. The rep flow had to stay fast even when AI was slow. Fraud had to be explainable. Outlet creation had to survive messy shop names in a dense Bangladesh retail environment. RAG had to answer list questions from exact data rather than vibes. Observability had to explain why a visit was stuck. The result was a compact but production-shaped platform with clear service boundaries, async processing, and operational recovery paths.

Outcome: The system helped our team secure the internship offer. The feedback was not just that the app worked, but that the architecture showed speed, product thinking, AI-native engineering, and operational maturity.

My Role

I worked across the backend, AI pipeline, worker architecture, fraud detection, RAG assistant, observability, deployment, and production hardening. The frontend was collaborative, but my main contribution was turning the product brief into a system that could be debugged, extended, and defended in front of senior engineers.

Architecture

Split the system into Next.js APIs, a BullMQ worker, a FastAPI AI service, Postgres, Redis, object storage, and observability services.

AI Pipeline

Separated YOLO detection, OpenAI visual reasoning, deterministic scoring, report generation, and vector indexing into explicit stages.

Operational Hardening

Added DLQs, admin replay, rate limits, API-key service auth, pre-signed uploads, actor-aware EventLog records, and Dockerized local infrastructure.

System Architecture

The most important design decision was to keep the request path thin. Next.js owns user-facing workflows and orchestration, but all expensive or failure-prone work runs in a worker or AI service. That keeps rep submissions responsive and makes every heavy stage observable.

flowchart TB
  Rep[Rep PWA] --> Web[Next.js App Router]
  Sup[Supervisor Console] --> Web

  Web --> Auth[Auth.js + RBAC]
  Web --> DB[(PostgreSQL + Prisma)]
  Web --> Obj[MinIO / S3-compatible storage]
  Web --> Redis[(Redis / BullMQ)]
  Web --> EventLog[(EventLog)]

  Redis --> Worker[Node BullMQ Worker]
  Worker --> DB
  Worker --> EventLog
  Worker --> AI[FastAPI AI Service]
  Worker --> WhatsApp[Twilio WhatsApp Alerts]

  AI --> YOLO[YOLO Shelf Detection]
  AI --> Vision[OpenAI POSM + Summary]
  AI --> Score[Deterministic Compliance Engine]
  AI --> Pinecone[(Pinecone)]
  YOLO -. optional .-> Modal[Modal GPU]

  Web --> Assistant[RAG Assistant API]
  Assistant --> DB
  Assistant --> AI

  Web --> Sentry[Sentry]
  Worker --> Sentry
  AI --> Sentry
  Worker --> Prom[Prometheus]
  AI --> Prom
  Web --> Prom
  Prom --> Grafana[Grafana / Loki / Tempo]
                    

Core Workflow

The workflow is intentionally shaped like a real field execution product: a rep can work under unreliable network conditions, the worker does the expensive analysis, and the supervisor sees an inspectable decision trail rather than a magic AI label.

  1. Rep creates a visit: outlet name/search, GPS, timestamp, notes, and one shelf image.
  2. Upload is stored: local storage for dev or pre-signed MinIO/S3 upload for production-shaped demos.
  3. Visit is queued: Next.js returns quickly with ANALYZING; BullMQ owns retries and backoff.
  4. Worker runs fraud checks: duplicate hash, perceptual hash, GPS mismatch, timestamp anomaly, and EXIF GPS/time checks.
  5. AI service analyzes shelf evidence: YOLO detects Olympic Foodie Noodles and competitor products; OpenAI audits POSM and shelf quality.
  6. Compliance is scored: deterministic rules produce score, status, reasons, and recommended action.
  7. Report is indexed: the worker writes a RAG-ready VisitReport and queues embedding into Pinecone.
  8. Supervisor reviews: dashboard, visit logs, outlet verification, ops timelines, and assistant answers all read from persisted state.

Product Walkthrough

The screenshots below show the parts that matter most in a live evaluation: a fast rep workflow, inspectable AI reports, governed outlet creation, assistant-backed insights, and operational telemetry for the async pipeline.

Reporter visit analysis page showing shelf overlay, AI supervisor summary, share of shelf, POSM details, and fraud status
Rep analysis view. The rep sees the generated shelf overlay, supervisor summary, share-of-shelf score, POSM notes, and fraud result after the worker completes analysis.
Supervisor visit report detail with YOLO overlay, compliance reasons, POSM details, and fraud signals
Visit report inspection. Supervisors can inspect the raw shelf evidence, YOLO overlay, compliance reasons, POSM audit, fraud signals, and generated recommendation.
Supervisor outlet verification queue showing pending submissions, duplicate hints, and master outlet list
Outlet governance. Typed shop names are routed through a review queue with duplicate hints, approve/reject actions, merge semantics, and a master outlet list.
Supervisor AI insights page with chatbot interface and operational query answers
AI insights. The assistant answers operational questions from exact database context first, then enriches answers with Pinecone semantic report recall.
Supervisor ops control room showing queue indicators, latency chips, failed jobs, and processing timelines
Ops control room. The internal observability page exposes queue health, failed jobs, latency chips, assistant events, and per-visit processing timelines.
Grafana dashboard with RetailOS queue, worker, AI, and service telemetry
Grafana metrics. The local LGTM stack gives external telemetry for queue depth, job outcomes, worker behavior, and AI-service latency.
Loki logs view showing structured RetailOS operational logs
Structured logs. Loki can inspect JSON logs across web, worker, and AI service using correlation IDs and workflow stages.

Engineering Deep Dives

1. Async AI Pipeline: no AI in the request path

Problem: YOLO inference, OpenAI vision, fraud checks, report generation, and embedding can take seconds and occasionally fail. If that work runs inside a Next.js API request, the rep waits, serverless timeouts become likely, and retries become unsafe.

Implementation: visit submission enqueues an analyze_visit BullMQ job and returns immediately. The worker owns the full pipeline: fraud checks, AI service call, persistence, report generation, WhatsApp alerting, and embedding enqueue. Jobs retry with exponential backoff and terminal failures are copied into DLQs.

Why it matters: this turns AI analysis into durable operational work. A failed OpenAI call or weird image format does not strand the user. Admins can inspect DLQ payloads and replay jobs without asking the rep to revisit the store.

2. YOLO + OpenAI: model boundaries instead of model soup

Problem: the YOLO model was trained for product and competitor detection, not POSM. OpenAI can reason about POSM and shelf quality, but it should not be trusted as a precise counter in dense shelf images.

Implementation: YOLO handles grounded detections: product count, competitor count, bounding boxes, overlay images, and share-of-shelf area. OpenAI handles untrained visual reasoning: Olympic/Foodie POSM, shelf neatness, competitor pressure, and supervisor summaries. The compliance score remains deterministic and inspectable.

Tradeoff: this gives a cleaner story than "we used AI everywhere." Each model has a job, and the final score is reproducible because rules, not LLM mood, decide the compliance band.

3. Fraud detection: multiple weak signals, clear reasons

Problem: fraud in field execution is messy. A rep might reuse a photo, submit from the wrong place, delay sync, or upload an image with mismatched EXIF metadata. One aggressive detector would create false positives.

Implementation: the worker calculates five signals: exact SHA-256 duplicate, perceptual dHash duplicate, GPS mismatch, timestamp anomaly, and EXIF GPS/time mismatch. Missing EXIF is not treated as fraud because many phones or upload paths strip metadata.

Result: supervisors see a risk label plus concrete reasons. The system avoids a black-box "fraud probability" and instead persists relational FraudSignal rows that can be counted, audited, and queried by the assistant.

4. Outlet master data: governed identity, not CRUD

Problem: in small local shops, a dropdown of known outlets is brittle. But if reps freely type names, the database fills with duplicates like Maa Store, Maa store, and Maa Dokan.

Implementation: reps submit names and GPS quickly, but they do not directly create verified canonical outlets. The system normalizes names, uses pg_trgm plus bounded geo prefiltering, scores name and distance, creates aliases, and routes ambiguous matches into a supervisor review queue.

Merge semantics: duplicate merge is non-destructive. Existing visits and reports are retargeted to the canonical outlet, source outlets are marked rejected, aliases are preserved, and affected Pinecone vectors are reindexed so the chatbot stays consistent.

5. RAG assistant: exact database context before vectors

Problem: questions like "which outlets have fraud signals?" or "which outlets are failing compliance?" are exact operational queries. Vector search alone can retrieve semantically similar visits, but it cannot prove a list is complete.

Implementation: the assistant first builds exact Prisma context from VisitReport, Visit, Outlet, AIResult, and FraudSignal. Pinecone semantic retrieval is used as supplemental context for trend and narrative questions. The LLM is instructed to prefer exact context for list/count/fraud/POSM/compliance answers.

Result: the chatbot behaves like operational tooling, not a loose demo bot. It can cite visit IDs and outlet names and avoid marking an outlet fraudulent just because a semantic match looked related.

6. Offline sync: field workflow survives bad networks

Problem: retail reps often work in shops where mobile connectivity is inconsistent. If the submission flow hard-depends on the network, the whole product fails in the exact environment it targets.

Implementation: visit drafts and image blobs are queued in IndexedDB. TanStack Query coordinates retry and sync when the browser comes back online. Client-generated IDs and idempotency keys prevent duplicated visits during retry.

Design detail: when offline, outlet resolution stores the submitted outlet candidate and lets the server resolve it during sync. That keeps the rep unblocked without pretending the client can do authoritative master-data matching offline.

7. Observability: built for "why is this stuck?"

Problem: async systems fail in boring but expensive ways: Redis backlog, stuck workers, slow OpenAI calls, failed embeddings, or a visit sitting at ANALYZING with no explanation.

Implementation: I added four layers of telemetry: product-native EventLog timelines, structured JSON logs, Prometheus metrics, and Sentry error capture. Docker Compose brings up Grafana, Loki, Tempo, and Prometheus locally, but the internal /supervisor/ops page is the demo-safe operational view.

Result: the supervisor ops page shows queue health, failed jobs, assistant events, latency chips, recent failures, and collapsible per-visit processing timelines reconstructed from correlation IDs.

8. Security and production hardening after the sprint

Problem: a sprint demo can work and still be risky: local file uploads, exposed AI service endpoints, unbounded assistant usage, weak recovery paths, and no clean story for serverless deployment.

Implementation: after the demo, I closed the P0/P1 gaps: MinIO/S3 pre-signed uploads, AI service API keys, Redis-backed rate limits, split Prisma pooled/direct DB URLs, admin-only DLQ replay, embedding DLQs, actor metadata on EventLog records, and operational archive tooling.

Why it matters: these are not flashy UI features, but they are the engineering details that turn "it works on my laptop" into "this could survive a pilot."

Runtime Pipeline

sequenceDiagram
  autonumber
  actor Rep
  participant Web as Next.js API
  participant DB as PostgreSQL
  participant Queue as BullMQ
  participant Worker
  participant AI as FastAPI AI Service
  participant Vec as Pinecone
  actor Sup as Supervisor

  Rep->>Web: Submit outlet, GPS, notes, image
  Web->>DB: Persist visit and upload metadata
  Web->>Queue: Enqueue analyze_visit
  Web-->>Rep: Return ANALYZING
  Queue->>Worker: Process analyze_visit
  Worker->>Worker: Fraud checks
  Worker->>AI: Analyze shelf image
  AI->>AI: YOLO + POSM + compliance
  AI-->>Worker: Detections, score, summary
  Worker->>DB: Save AIResult, FraudSignal, VisitReport, EventLog
  Worker->>Queue: Enqueue embed_visit_report
  Worker->>Vec: Upsert visit-report vector
  Sup->>Web: Review dashboard, visit detail, assistant, ops timeline
                    

What I Would Defend In A Code Review

Use queues for AI

The rep should not pay the latency cost of YOLO, OpenAI, Pinecone, or WhatsApp. Durable jobs also give retries, DLQs, and measurable backlog.

Keep scoring deterministic

LLMs can describe shelf quality, but supervisors need consistent compliance scores with visible reasons. Rules make the score auditable.

Use exact context for operational RAG

Vectors are useful for similarity and trends. They should not be the source of truth for lists, counts, fraud, compliance, or missing POSM.

Do not hard-delete outlet duplicates

Retail identity changes over time. Merge should preserve history, aliases, reports, and vectors instead of throwing data away.

Representative Code Paths

If I were reviewing this project with an engineer, these are the files I would start with. They show the architecture more clearly than a screenshot can.

  • worker/src/jobs/analyzeVisit.ts - full async analysis lifecycle
  • worker/src/services/fraud.ts - fraud signal generation and thresholds
  • ai_service/app/main.py - AI service API surface and protected endpoints
  • ai_service/app/compliance.py - deterministic compliance scoring
  • lib/outlets.ts - outlet matching, aliases, and merge semantics
  • lib/assistant.ts - exact context builder for RAG answers
  • app/api/ops/route.ts - queue health and timeline aggregation
  • lib/storage.ts - local/S3-compatible upload abstraction

Tradeoffs And Honest Limits

The system is intentionally local-first for demo reliability, not cloud-deployed yet. YOLO accuracy depends on the small training dataset, so OpenAI is used for POSM and shelf-quality reasoning rather than pretending YOLO can solve classes it never learned. Blur detection was deliberately skipped because it would punish low-end phones and casual hand movement without a more careful image-quality pipeline.

The next production steps would be managed cloud infrastructure, edge/WAF rate limits, scheduled retention jobs, image compression/thumbnails, stronger territory-level authorization, and a proper CI/CD path. The current architecture isolates those changes: storage, AI inference, queueing, and observability are already behind explicit boundaries.

What This Demonstrates

RetailOS Lite is the project where I most clearly demonstrated the kind of engineering I want to keep doing: moving quickly without making a toy, using AI as a system component instead of a buzzword, and caring about the operational details that make software trustworthy after the demo ends.