Course RAG Pipeline
Production-deployed agentic RAG system for UCLA MSBA students to query course materials — lecture slides, transcripts, and PDFs — using natural language. Live at tirth-courserag.duckdns.org.
Problem Statement
UCLA MSBA students juggle 4 simultaneous courses — each with lecture slides, transcripts, homework deadlines, and project deliverables scattered across a shared Google Drive. Manual search is slow and error-prone, especially for deadline-critical queries ("when is HW3 due?"). The challenge: build a production-grade agentic system that can answer natural-language questions over course PDFs, verify its own deadline answers to prevent hallucination, let students upload new files safely without polluting the vector store, and explain exactly which source chunks it used — all at zero ongoing infrastructure cost.
Approach & Methodology
LangGraph Agentic Orchestration
Designed a 13-node LangGraph graph with a priority-based router that first checks for pending clarifications, then tries regex pattern matching (e.g., "why did you", "where did that come from"), and only falls back to LLM classification if patterns fail — saving ~200 tokens per source-explanation follow-up. The router classifies each query into one of five types and routes to the appropriate branch.
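The routing priority can be sketched in Python; the function names, pattern list, and state shape here are illustrative assumptions, not the project's actual code:

```python
import re

# Hypothetical follow-up patterns; the real system's regex list may differ.
FOLLOWUP_PATTERNS = [
    re.compile(r"\bwhy did you\b", re.IGNORECASE),
    re.compile(r"\bwhere did that come from\b", re.IGNORECASE),
]

def classify_with_llm(query: str) -> str:
    # Placeholder: in the real system this is an LLM classification call
    # into one of the five query types.
    return "general"

def route_query(state: dict) -> str:
    """Return a branch name for the current query, cheapest check first."""
    # 1. A pending clarification always wins.
    if state.get("pending_clarification"):
        return "clarification"
    # 2. Cheap regex matching for source-explanation follow-ups
    #    (skips the LLM call entirely, saving the ~200 tokens noted above).
    query = state["query"]
    if any(p.search(query) for p in FOLLOWUP_PATTERNS):
        return "source_explanation"
    # 3. Only now fall back to LLM classification.
    return classify_with_llm(query)
```

The ordering matters: the two cheap checks short-circuit before any model call is made.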
Self-Verifying Deadline Extraction
Deadline queries carry the highest accuracy requirement — a wrong date is worse than no answer. After extracting a deadline with the LLM, the system immediately re-queries ChromaDB with a rephrased version of the search and cross-references the extracted date against the new results. If two chunks give different dates for the same assignment, the response surfaces both and flags the discrepancy with a confidence indicator.
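A minimal sketch of the cross-referencing step, assuming the second retrieval pass yields chunks with an assignment name and date already extracted (all field names are hypothetical):

```python
def verify_deadline(extracted: dict, requery_chunks: list) -> dict:
    """Cross-reference an extracted deadline against a second retrieval pass.

    `extracted` is the first LLM pass, e.g. {"assignment": "HW3", "date": ...};
    `requery_chunks` carry dates pulled from the rephrased re-query.
    """
    # Collect every date the corpus gives for this assignment.
    dates = {
        c["date"]
        for c in requery_chunks
        if c.get("assignment") == extracted["assignment"]
    }
    dates.add(extracted["date"])
    if len(dates) > 1:
        # Conflicting dates: surface all of them and lower confidence.
        return {**extracted, "confidence": "low",
                "conflicting_dates": sorted(dates)}
    return {**extracted, "confidence": "high"}
```
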
Deadline-Boosted Retrieval
Chunks containing deadline keywords (due, deadline, submit, homework, exam, etc.) are tagged with a contains_deadline metadata flag during ingestion. The ChromaService.query_with_deadline_boost() method merges results from a deadline-filtered query with results from a general query, deduplicates, and re-ranks — ensuring deadline-containing chunks appear at the top even when the semantic similarity score is not highest (deadline keywords often appear as side notes with lower embedding similarity).
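The merge, dedupe, and re-rank step might look like the following; the hit-dict shape and function body are assumptions based on the description above, not the actual `ChromaService` code:

```python
def query_with_deadline_boost(deadline_hits, general_hits, k=5):
    """Merge a deadline-filtered query with a general query, dedupe by
    chunk id, and rank deadline-flagged chunks ahead of higher-scoring
    general chunks. Hits are assumed to look like
    {"id": ..., "score": ..., "contains_deadline": bool}.
    """
    merged = {}
    for hit in deadline_hits + general_hits:
        prev = merged.get(hit["id"])
        if prev is None or hit["score"] > prev["score"]:
            merged[hit["id"]] = hit  # keep the better-scored duplicate
    # Deadline chunks first (False sorts before True), then by score.
    ranked = sorted(
        merged.values(),
        key=lambda h: (not h.get("contains_deadline", False), -h["score"]),
    )
    return ranked[:k]
```

This is why a deadline chunk with similarity 0.6 can still outrank a general chunk at 0.9.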
Human-in-the-Loop Upload Approval
When a user uploads a file, the LangGraph graph pauses at a human_approval_gate node using LangGraph's interrupt_before mechanism with a SQLite checkpointer. The UI shows an approval dialog with the LLM's proposed Drive folder path and its reasoning. The user can approve, modify the path, or reject. Only approved uploads are chunked, embedded with OpenAI text-embedding-3-small, and stored in ChromaDB — preventing mis-categorised files from polluting the vector store.
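The approve/modify/reject semantics at the gate can be sketched as a plain function (a simplified stand-in for the resume-after-interrupt handling; names and shapes are hypothetical):

```python
def apply_approval(decision: str, proposed_path: str, edited_path=None):
    """Resolve a paused upload after the human_approval_gate interrupt.

    decision: "approve" (use the LLM-proposed Drive path),
              "modify"  (use the user's edited path),
              "reject"  (drop the upload; nothing is embedded).
    """
    if decision == "approve":
        return {"status": "approved", "path": proposed_path}
    if decision == "modify":
        if not edited_path:
            raise ValueError("modify requires an edited path")
        return {"status": "approved", "path": edited_path}
    if decision == "reject":
        return {"status": "rejected", "path": None}
    raise ValueError(f"unknown decision: {decision}")
```

Only the two "approved" outcomes continue on to chunking, embedding, and ChromaDB insertion.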
Vision Fallback for Scanned PDFs
When PyMuPDF text extraction returns fewer than 30 characters for a page (typical of image-only pages such as scanned slides or photographed documents), the PDFProcessor sends the page image to OpenAI Vision (GPT-4o-mini) with a strict verbatim-transcription prompt. To prevent API timeouts from hanging the pipeline, three guards apply: a 25-second timeout per page, a maximum of 2 concurrent vision calls, and an early bail-out after 5 consecutive failures on a single file.
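A sketch of the fallback decision and the consecutive-failure bail-out; the vision call is injected as a plain callable so the example stays self-contained, and everything beyond the stated limits is illustrative:

```python
MIN_TEXT_CHARS = 30       # below this, treat the page as scanned
MAX_CONSECUTIVE_FAILS = 5 # bail out on a file after this many in a row

def needs_vision_fallback(page_text: str) -> bool:
    """PyMuPDF returns near-empty text for image-only pages."""
    return len(page_text.strip()) < MIN_TEXT_CHARS

def process_pages(page_texts, transcribe_page):
    """Extract each page, falling back to `transcribe_page` (the vision
    call, hypothetical here) for image-only pages. Stops early after
    MAX_CONSECUTIVE_FAILS consecutive vision failures on one file."""
    results, consecutive_fails = [], 0
    for i, text in enumerate(page_texts):
        if not needs_vision_fallback(text):
            results.append(text)
            consecutive_fails = 0
            continue
        try:
            results.append(transcribe_page(i))
            consecutive_fails = 0
        except Exception:
            results.append("")  # keep page alignment even on failure
            consecutive_fails += 1
            if consecutive_fails >= MAX_CONSECUTIVE_FAILS:
                break
    return results
```

The per-page timeout and the 2-call concurrency cap would wrap `transcribe_page` itself and are omitted here for brevity.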
Deployment & Infrastructure
Fully containerised with Docker Compose on an Oracle Cloud Always Free ARM VPS (4 CPU, 24 GB RAM). Caddy handles reverse proxying and automatic HTTPS via Let's Encrypt. DuckDNS provides the free dynamic domain (with a cron job pinging every 30 minutes to prevent expiry). Total infrastructure cost: ~$1–4/month — essentially just LLM API usage. Solved production issues including Oracle iptables blocking ports 80/443 (fixed via iptables -I to insert ACCEPT before the blanket REJECT rule) and Google OAuth credentials not accessible inside the running container (fixed via docker cp).
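For the reverse-proxy piece, a minimal Caddyfile would look like the following; the upstream service name and port are assumptions, not taken from the project. Caddy provisions the Let's Encrypt certificate for the DuckDNS domain automatically:

```
# Caddyfile sketch: HTTPS is automatic for any public site address.
tirth-courserag.duckdns.org {
    reverse_proxy app:8000
}
```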
Architecture
START
  ↓
input_handler → loads chat history from SQLite
  ↓
router → priority: pending clarification → regex → LLM
  ↓
[CONDITIONAL ROUTING by query_type]

deadline branch:
  retriever          → ChromaDB deadline-boosted search (k=5)
  deadline_extractor → LLM: {assignment, date, time, confidence}
  deadline_verifier  → re-query + cross-reference → flag conflicts
  response_output

summary branch:
  retriever          → ChromaDB search (k=10)
  summary_redirector → return Drive links + page numbers
  response_output

upload branch:
  upload_handler      → extract file content preview
  location_classifier → LLM proposes Drive folder path
  human_approval_gate [INTERRUPT] ← user approves/edits/rejects
  upload_executor     → Drive upload + chunk + embed + ChromaDB
  response_output

general branch:
  retriever         → ChromaDB search (k=7)
  general_responder → LLM answer with cited sources
  response_output

source_explanation branch:
  source_explainer → scan session history → return raw chunks
  response_output

END
Services: LLMService (Claude Haiku → GPT-4o-mini fallback)
EmbeddingService (text-embedding-3-small, 1536-dim)
ChromaService (metadata filter: course_id + quarter, fallback on zero results)
DriveService (Google Drive API, OAuth 2.0)
PDFProcessor (PyMuPDF + Vision fallback for scanned pages)
Results & Impact
13 LangGraph nodes (plus 1 interrupt point for human-in-the-loop)
5 query types (deadline, summary, upload, general, source explanation)
~$1–4/mo infra cost (Oracle Cloud Free Tier + DuckDNS + Let's Encrypt)
4 courses indexed (MSA408, MSA409, MSA410, MSA413)
9 production bugs solved (iptables, domain expiry, Docker creds, ChromaDB filters, and more)
33 tests (20 Phase 1 + 13 Phase 2)
Live Demo
Demo credentials — use viewer / hi-how-are-you to log in. The viewer account has read-only access to course Q&A and deadline queries.
Course RAG — Live Demo
Deadline queries: ask "When is HW3 due?" and the answer is extracted, re-verified, and cited.
File upload: drag in a PDF, approve the LLM-proposed Drive folder, and it's instantly queryable.
Source explanation: ask "Why did you give me that?" and see the exact raw chunks used.