Backend, distributed systems,
and agentic AI in production.
I build the parts that don’t make the demo — caching layers, multi-cloud failover, evaluation harnesses, the ~60% LLM-call reduction nobody screenshots. Currently shipping production agentic systems at Iolite Softwares.
about
I’m a software engineer with 1.5+ years full-time at Iolite Softwares and two more part-time during my B.Tech at PDEU (CS, GPA 9.21/10). My work sits where backend, distributed systems, and applied AI overlap — the parts that decide whether a product survives its first 1,000 concurrent users.
I’ve shipped a multi-tenant agentic NL→SQL chatbot on LangGraph + Gemma with a 3-tier follow-up engine and binary-encoded RBAC, an active-passive multi-cloud DR topology spanning AWS Mumbai/Singapore + Azure failover with canary failback, and an HLS streaming pipeline running ephemeral FFmpeg workers on EC2 dispatched via SQS. Tools change; the lens stays the same: how does this fail, what does it cost, and where’s the metric.
Outside the day job I’m usually breaking and re-building small agentic services, or reading Designing Data-Intensive Applications for the third time.
work
2 roles · 3.5 yrs
Backend, AI infra, and platform work on a multi-tenant SaaS — primarily a LangGraph-based chatbot, the C#/ASP.NET API behind it, and the trademark-automation product line.
- 01
Agentic NL→SQL chatbot
Architected a multi-tenant agentic NL→SQL chatbot on LangGraph with locally-served Gemma. Built a 3-tier follow-up engine, a destructive-SQL gate, and a binary-encoded RBAC whitelist across 50+ modules.
~60% drop in follow-up LLM calls
why / how
Picked LangGraph over plain LangChain so state, retries, and conditional routing live inside one explicit state graph rather than chained prompts — easier to reason about, easier to debug. Gemma runs on-prem because tenant schemas travel through the agent and we couldn't ship that metadata to a public API. The 3-tier follow-up engine resolves the easy half of repeat questions from a hashed-prompt + last-execution cache (L1), kicks the next slice to a small rewrite agent that prompts only with the diff against the previous turn (L2), and only falls through to a full re-plan (L3) when the question genuinely shifts intent. Authorization sits in front of generation, not behind it: the RBAC whitelist is encoded as packed bitmasks per (role × table × column), so checking access on a generated query is one AND, not a join.
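The packed-bitmask idea can be sketched in a few lines. This is an illustrative toy, not the production code: the bit layout, `MAX_COLS` budget, and function names are all assumptions.

```python
# Illustrative packed-bitmask RBAC whitelist: each role maps to one big
# integer, and bit (table_idx * MAX_COLS + col_idx) is set when the role
# may read that column.

MAX_COLS = 64  # assumed fixed per-table column budget

def grant(mask: int, table_idx: int, col_idx: int) -> int:
    """Set the bit for (table, column) in a role's mask."""
    return mask | (1 << (table_idx * MAX_COLS + col_idx))

def allowed(mask: int, accessed: list) -> bool:
    """Check every (table, column) pair a generated query touches.

    Building one query mask and AND-ing it against the role mask
    replaces a per-column permission join against the database."""
    query_mask = 0
    for table_idx, col_idx in accessed:
        query_mask |= 1 << (table_idx * MAX_COLS + col_idx)
    return (mask & query_mask) == query_mask

# usage: role may read columns 0 and 1 of table 3, but not column 2
role_mask = grant(grant(0, 3, 0), 3, 1)
print(allowed(role_mask, [(3, 0), (3, 1)]))  # True
print(allowed(role_mask, [(3, 2)]))          # False
```

Python's arbitrary-precision integers make the role mask a single object regardless of how many modules it spans; the check itself stays one AND plus one comparison.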
LangGraph · Gemma · Python · RBAC · Postgres

- 02
FastSQLDatabase + RAG retrieval
Built a custom FastSQLDatabase wrapper with cached INFORMATION_SCHEMA snapshots and multi-tier TTL+LRU caches. Added RAG few-shot retrieval with hybrid embedding + lexical fallback for tenant priming.
~70% faster tenant cold-start
why / how
Each new tenant was paying a multi-second schema-introspection penalty on cold queries; pre-computing a versioned schema snapshot per tenant and layering an in-process LRU on top of a cross-process TTL store collapsed that to roughly a quarter of the original. Retrieval is hybrid on purpose: dense embeddings catch paraphrase and intent, BM25 catches the exact column names and identifiers that embeddings smooth over. Few-shot beat fine-tuning here because tenant schemas evolve weekly and re-tuning is operationally wrong for that rate of change.
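The two-tier read path can be sketched roughly like this. In the real system the TTL tier is a cross-process store; here both tiers live in one process and all names and sizes are illustrative.

```python
import time
from collections import OrderedDict

class TwoTierCache:
    """L1: small in-process LRU for hot keys. L2: TTL store, standing
    in for the shared cross-process tier. Sketch only."""

    def __init__(self, lru_size: int = 128, ttl: float = 300.0):
        self.l1 = OrderedDict()   # key -> value, insertion-ordered for LRU
        self.lru_size = lru_size
        self.l2 = {}              # key -> (value, expiry timestamp)
        self.ttl = ttl

    def get(self, key, loader):
        if key in self.l1:                         # L1 hit: sub-ms path
            self.l1.move_to_end(key)
            return self.l1[key]
        entry = self.l2.get(key)
        if entry and entry[1] > time.monotonic():  # L2 hit: warm a cold instance
            value = entry[0]
        else:                                      # miss: pay introspection once
            value = loader()
            self.l2[key] = (value, time.monotonic() + self.ttl)
        self.l1[key] = value
        if len(self.l1) > self.lru_size:           # evict least-recently-used
            self.l1.popitem(last=False)
        return value
```

Usage would look like `cache.get(("tenant-42", "schema"), introspect_schema)`: the expensive loader runs only on a true miss, and every later read is served from L1 or L2.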
FastAPI · Vector DB · Hybrid retrieval · TTL/LRU

- 03
Hybrid distributed caching
Built ETag-based hybrid HTTP caching layered with IMemoryCache and Microsoft Garnet on the ASP.NET / C# / Angular 17 stack.
~70% DB-load reduction · ~60% faster responses
why / how
Three layers, each doing one job. Weak ETags at the edge let Angular skip large list payloads on revalidation — the cheapest cache hit is the one that never crosses the network. IMemoryCache short-circuits hot reads inside each app instance for sub-millisecond wins on N+1-style hotspots. Garnet (Microsoft's Redis-protocol drop-in, lower latency than Redis on our access pattern) is the shared L2 across the cluster so a cold instance still gets warm data. Invalidation is event-driven over a thin pub/sub bus rather than TTL-only, which keeps stale reads bounded after writes.
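The weak-ETag revalidation step at the edge reduces to a hash comparison. A minimal language-agnostic sketch (the real layer sits in ASP.NET middleware; the hashing scheme and names here are assumptions):

```python
import hashlib

def weak_etag(payload: bytes) -> str:
    """Weak ETag (W/ prefix) over the serialized payload: semantically
    equal responses revalidate even if upstream bytes differ slightly."""
    return 'W/"%s"' % hashlib.sha256(payload).hexdigest()[:16]

def respond(payload: bytes, if_none_match=None):
    """Return (status, body) for a conditional GET.

    On a match, 304 skips the payload entirely; the client keeps
    its cached copy and nothing large crosses the network."""
    etag = weak_etag(payload)
    if if_none_match == etag:
        return 304, b""
    return 200, payload  # a real server would also emit the ETag header
```

First request: 200 with the full list payload and an ETag. Every revalidation with `If-None-Match` set to that tag: 304 with an empty body.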
ASP.NET · C# · Garnet · ETag · Angular

- 04
Trademark automation at scale
Shipped scrapers for 200+ trademark offices with custom CAPTCHA-solving models and PDF/image+text similarity pipelines (ResNet, CLIP, RapidFuzz) running across 250+ regional journals.
200+ jurisdictions · 250+ journals
why / how
Every jurisdiction is its own scraping problem — sessions, JS-rendered pages, IP rate limits, and a different CAPTCHA per office. Commercial CAPTCHA APIs were uneconomic at this volume so we trained per-style solvers offline. Conflict ranking blends three signals rather than picking one: RapidFuzz on normalised marks for textual proximity, ResNet feature distance on logo crops for visual similarity, and CLIP for the cross-modal cases (text mark vs logo, or vice versa). Scoring is calibrated per jurisdiction because filing standards and similarity tolerances genuinely differ.
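The three-signal blend can be sketched as a weighted sum. Everything here is illustrative: stdlib `difflib` stands in for RapidFuzz, the visual and cross-modal scores would come from ResNet and CLIP feature distances in the real pipeline, and the weights are placeholders for the per-jurisdiction calibration.

```python
from difflib import SequenceMatcher

def text_score(a: str, b: str) -> float:
    """Textual proximity on normalised marks (stand-in for RapidFuzz)."""
    a, b = a.lower().strip(), b.lower().strip()
    return SequenceMatcher(None, a, b).ratio()

def blend(text_s: float, visual_s: float, cross_modal_s: float,
          weights=(0.5, 0.3, 0.2)) -> float:
    """Blend three similarity signals in [0, 1] into one conflict score.

    Weights are calibrated per jurisdiction because similarity
    tolerances genuinely differ between trademark offices."""
    wt, wv, wc = weights
    return wt * text_s + wv * visual_s + wc * cross_modal_s

# usage: near-identical word marks plus moderately similar logos
score = blend(text_score("ACME", "acme"), visual_s=0.8, cross_modal_s=0.6)
```

Blending keeps any single weak signal from vetoing a conflict: a logo-only mark with no usable text still ranks on the visual and cross-modal terms.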
Scrapers · CV · CLIP · ResNet · RapidFuzz
selected projects
readmes on github

- 01
agentic-ai · production
Agentic NL→SQL Chatbot
Multi-tenant LangGraph chatbot with locally-served Gemma.
~60% LLM-call reduction · 50+ modules · binary RBAC
A 3-tier follow-up engine routes queries through a hashed-prompt cache (L1), a small rewrite agent that prompts only with the diff against the previous turn (L2), and a full re-plan only when intent shifts (L3). A destructive-SQL gate blocks DDL/DML on read-only tenants before generation lands. The RBAC whitelist is encoded as packed bitmasks per (role × table × column) so authorization is a single AND, not a join — checked in front of generation, not behind it.
key decision
LangGraph over plain LangChain — explicit state graph beats chained prompts when retries, branching, and human-in-the-loop all touch the same state.
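The 3-tier routing decision can be sketched as a small dispatcher. The rewrite agent and planner here are placeholder stubs for the real LangGraph nodes, and the hashing and tier logic are illustrative, not the production implementation.

```python
import hashlib

cache = {}  # hashed prompt -> last execution result (the L1 tier)

def rewrite_with_diff(prompt: str, prev_prompt: str) -> str:
    """Placeholder for the L2 rewrite agent (prompts only with the diff)."""
    return f"rewritten:{prompt}"

def full_replan(prompt: str) -> str:
    """Placeholder for the L3 full planning pipeline."""
    return f"planned:{prompt}"

def handle(prompt: str, prev_prompt, intent_shifted: bool):
    """Route a follow-up question to the cheapest tier that can answer it."""
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key in cache:                       # L1: repeat question, serve cached
        return "L1", cache[key]
    if prev_prompt is not None and not intent_shifted:
        tier, result = "L2", rewrite_with_diff(prompt, prev_prompt)
    else:                                  # L3: intent genuinely shifted
        tier, result = "L3", full_replan(prompt)
    cache[key] = result
    return tier, result
```

First-time questions with no context pay the full L3 cost once; exact repeats are free, and small variations ride the cheap L2 rewrite, which is where the bulk of the LLM-call reduction comes from.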
LangGraph · Gemma · Python · Postgres · RBAC
view readme · live demo

- 02
infrastructure · multi-cloud
Active-Passive Disaster Recovery
Two clouds, three regions, canary failback gated on health checks.
Cloudflare → AWS Mumbai (active) · AWS SG (hot) · Azure (failover)
Cloudflare's load balancer routes primary traffic to AWS Mumbai, with a hot standby in AWS Singapore and Azure failover for MongoDB Atlas. Failback ramps 5%→25%→50%→100%, gated on rolling error rate and p99 latency — fail any gate and traffic snaps back automatically. Health checks chain edge → app health → DB write probe so a single failing layer can't drag the others down. Terraform owns the topology end-to-end and Jenkins runs weekly drills so the failback path doesn't bit-rot.
key decision
Active-passive over active-active — read-heavy traffic, small team, and a data tier that couldn't safely tolerate concurrent multi-region writes.
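The gated ramp logic is simple enough to sketch. The three callables below are assumed interfaces standing in for the real Cloudflare traffic-steering and metrics plumbing; the thresholds are illustrative.

```python
RAMP = (5, 25, 50, 100)   # percent of traffic shifted back per step
ERROR_BUDGET = 0.01       # illustrative rolling-error-rate gate
P99_BUDGET_MS = 800       # illustrative p99 latency gate

def failback(route_pct, read_error_rate, read_p99_ms) -> bool:
    """Walk the canary ramp toward the primary region.

    route_pct(p) shifts p% of traffic to the primary; the two read_*
    callables sample rolling metrics after each step. Failing any
    gate snaps traffic back to the standby automatically."""
    for pct in RAMP:
        route_pct(pct)
        if read_error_rate() > ERROR_BUDGET or read_p99_ms() > P99_BUDGET_MS:
            route_pct(0)   # snap back: standby takes everything again
            return False
    return True            # primary holds 100%, failback complete
```

The point of expressing it as code at all is the drill: Jenkins can exercise exactly this path weekly against synthetic metrics, so the first real failback is not also the first test of it.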
AWS · Azure · Cloudflare · Terraform · Jenkins
view readme

- 03
full-stack · led
TeamSync — Project Management Platform
Backend + integrations lead, Infosys Springboard 5.0.
Role-based access · task tracking · team workflows
Led the backend and integration tracks of a collaborative project-management platform built on Node.js, Express, React, and MongoDB with role-based access, task tracking, and end-to-end team workflows. Owned the API design, auth model, and the integration surface; coordinated four contributors across the build.
scope
Backend + integrations lead — owned API design, auth model, and the cross-team integration surface.
Node.js · Express · React · MongoDB
view readme · live demo
— rest of the work lives on github.com/ShubhamPatel2305
stack
bold = comfortable in production

- Python
- TypeScript
- JavaScript
- C#
- C++
- SQL
- FastAPI
- Node.js
- Express
- ASP.NET Web API
- REST
- WebSockets
- Microservices
- LangGraph
- LangChain
- RAG (hybrid)
- OpenAI / Anthropic SDKs
- Gemma (local)
- Vector DBs
- Eval (Ragas / DeepEval)
- AWS
- Azure
- Cloudflare
- Docker
- Terraform
- Jenkins
- Multi-cloud DR
- Canary deploys
- CI/CD
- PostgreSQL
- MongoDB
- SQL Server
- Redis
- Microsoft Garnet
- ETag caching
- TTL/LRU
- React
- Next.js
- Angular
- Tailwind CSS
- HTML5 video
contact
reply within ~48h
Let’s talk.
Best for product-company SDE / AI engineer roles, contract or full-time. Pick the kind of conversation — the form adapts.