Backend, distributed systems,
and agentic AI in production.
I build the parts that don’t make the demo — caching layers, multi-cloud failover, evaluation harnesses, the ~60% LLM-call reduction nobody screenshots. Currently shipping production agentic systems at Iolite Softwares.
about
I’m a software engineer with 1.5+ years full-time at Iolite Softwares and two more part-time during my B.Tech at PDEU (CS, GPA 9.21/10). My work sits where backend, distributed systems, and applied AI overlap — the parts that decide whether a product survives its first 1,000 concurrent users.
I’ve shipped a multi-tenant agentic NL→SQL chatbot on LangGraph + Gemma with a 3-tier follow-up engine and binary-encoded RBAC, an active-passive multi-cloud DR topology spanning AWS Mumbai/Singapore + Azure failover with canary failback, and an HLS streaming pipeline running ephemeral FFmpeg workers on EC2 dispatched via SQS. Tools change; the lens stays the same: how does this fail, what does it cost, and where’s the metric.
Outside the day job I’m usually breaking and re-building small agentic services, or reading Designing Data-Intensive Applications for the third time.
work
2 roles · 3.5 yrs
Backend, AI infra, and platform work on a multi-tenant SaaS — primarily a LangGraph-based chatbot, the C#/ASP.NET API behind it, and the trademark-automation product line.
- 01
Agentic NL→SQL chatbot
Architected a multi-tenant agentic NL→SQL chatbot on LangGraph with locally-served Gemma. Built a 3-tier follow-up engine, a destructive-SQL gate, and a binary-encoded RBAC whitelist across 50+ modules.
~60% drop in follow-up LLM calls
why / how
Picked LangGraph over plain LangChain so state, retries, and conditional routing live inside one explicit state graph rather than chained prompts — easier to reason about, easier to debug. Gemma runs on-prem because tenant schemas travel through the agent and we couldn't ship that metadata to a public API. The 3-tier follow-up engine resolves the easy half of repeat questions from a hashed-prompt + last-execution cache (L1), kicks the next slice to a small rewrite agent that prompts only with the diff against the previous turn (L2), and only falls through to a full re-plan (L3) when the question genuinely shifts intent. Authorization sits in front of generation, not behind it: the RBAC whitelist is encoded as packed bitmasks per (role × table × column), so checking access on a generated query is one AND, not a join.
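The packed-bitmask idea can be sketched in a few lines. This is an illustrative toy, not the production code: the bit layout, `MAX_COLS` budget, and function names are all assumptions.

```python
# Illustrative packed-bitmask RBAC whitelist: each role maps to one big
# integer, and bit (table_idx * MAX_COLS + col_idx) is set when the role
# may read that column.

MAX_COLS = 64  # assumed fixed per-table column budget

def grant(mask: int, table_idx: int, col_idx: int) -> int:
    """Set the bit for (table, column) in a role's mask."""
    return mask | (1 << (table_idx * MAX_COLS + col_idx))

def allowed(mask: int, accessed: list) -> bool:
    """Check every (table, column) pair a generated query touches.

    Building one query mask and AND-ing it against the role mask
    replaces a per-column permission join against the database."""
    query_mask = 0
    for table_idx, col_idx in accessed:
        query_mask |= 1 << (table_idx * MAX_COLS + col_idx)
    return (mask & query_mask) == query_mask

# usage: role may read columns 0 and 1 of table 3, but not column 2
role_mask = grant(grant(0, 3, 0), 3, 1)
print(allowed(role_mask, [(3, 0), (3, 1)]))  # True
print(allowed(role_mask, [(3, 2)]))          # False
```

Python's arbitrary-precision integers make the role mask a single object regardless of how many modules it spans; the check itself stays one AND plus one comparison.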
LangGraph · Gemma · Python · RBAC · Postgres

- 02
FastSQLDatabase + RAG retrieval
Built a custom FastSQLDatabase wrapper with cached INFORMATION_SCHEMA snapshots and multi-tier TTL+LRU caches. Added RAG few-shot retrieval with hybrid embedding + lexical fallback for tenant priming.
~70% faster tenant cold-start
why / how
Each new tenant was paying a multi-second schema-introspection penalty on cold queries; pre-computing a versioned schema snapshot per tenant and layering an in-process LRU on top of a cross-process TTL store collapsed that to roughly a quarter of the original. Retrieval is hybrid on purpose: dense embeddings catch paraphrase and intent, BM25 catches the exact column names and identifiers that embeddings smooth over. Few-shot beat fine-tuning here because tenant schemas evolve weekly and re-tuning is operationally wrong for that rate of change.
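The two-tier read path can be sketched roughly like this. In the real system the TTL tier is a cross-process store; here both tiers live in one process and all names and sizes are illustrative.

```python
import time
from collections import OrderedDict

class TwoTierCache:
    """L1: small in-process LRU for hot keys. L2: TTL store, standing
    in for the shared cross-process tier. Sketch only."""

    def __init__(self, lru_size: int = 128, ttl: float = 300.0):
        self.l1 = OrderedDict()   # key -> value, insertion-ordered for LRU
        self.lru_size = lru_size
        self.l2 = {}              # key -> (value, expiry timestamp)
        self.ttl = ttl

    def get(self, key, loader):
        if key in self.l1:                         # L1 hit: sub-ms path
            self.l1.move_to_end(key)
            return self.l1[key]
        entry = self.l2.get(key)
        if entry and entry[1] > time.monotonic():  # L2 hit: warm a cold instance
            value = entry[0]
        else:                                      # miss: pay introspection once
            value = loader()
            self.l2[key] = (value, time.monotonic() + self.ttl)
        self.l1[key] = value
        if len(self.l1) > self.lru_size:           # evict least-recently-used
            self.l1.popitem(last=False)
        return value
```

Usage would look like `cache.get(("tenant-42", "schema"), introspect_schema)`: the expensive loader runs only on a true miss, and every later read is served from L1 or L2.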
FastAPI · Vector DB · Hybrid retrieval · TTL/LRU

- 03
Hybrid distributed caching
Built ETag-based hybrid HTTP caching layered with IMemoryCache and Microsoft Garnet on the ASP.NET / C# / Angular 17 stack.
~70% DB-load reduction · ~60% faster responses
why / how
Three layers, each doing one job. Weak ETags at the edge let Angular skip large list payloads on revalidation — the cheapest cache hit is the one that never crosses the network. IMemoryCache short-circuits hot reads inside each app instance for sub-millisecond wins on N+1-style hotspots. Garnet (Microsoft's Redis-protocol drop-in, lower latency than Redis on our access pattern) is the shared L2 across the cluster so a cold instance still gets warm data. Invalidation is event-driven over a thin pub/sub bus rather than TTL-only, which keeps stale reads bounded after writes.
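The weak-ETag revalidation step at the edge reduces to a hash comparison. A minimal language-agnostic sketch (the real layer sits in ASP.NET middleware; the hashing scheme and names here are assumptions):

```python
import hashlib

def weak_etag(payload: bytes) -> str:
    """Weak ETag (W/ prefix) over the serialized payload: semantically
    equal responses revalidate even if upstream bytes differ slightly."""
    return 'W/"%s"' % hashlib.sha256(payload).hexdigest()[:16]

def respond(payload: bytes, if_none_match=None):
    """Return (status, body) for a conditional GET.

    On a match, 304 skips the payload entirely; the client keeps
    its cached copy and nothing large crosses the network."""
    etag = weak_etag(payload)
    if if_none_match == etag:
        return 304, b""
    return 200, payload  # a real server would also emit the ETag header
```

First request: 200 with the full list payload and an ETag. Every revalidation with `If-None-Match` set to that tag: 304 with an empty body.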
ASP.NET · C# · Garnet · ETag · Angular

- 04
Trademark automation at scale
Shipped scrapers for 200+ trademark offices with custom CAPTCHA-solving models and PDF/image+text similarity pipelines (ResNet, CLIP, RapidFuzz) running across 250+ regional journals.
200+ jurisdictions · 250+ journals
why / how
Every jurisdiction is its own scraping problem — sessions, JS-rendered pages, IP rate limits, and a different CAPTCHA per office. Commercial CAPTCHA APIs were uneconomic at this volume so we trained per-style solvers offline. Conflict ranking blends three signals rather than picking one: RapidFuzz on normalised marks for textual proximity, ResNet feature distance on logo crops for visual similarity, and CLIP for the cross-modal cases (text mark vs logo, or vice versa). Scoring is calibrated per jurisdiction because filing standards and similarity tolerances genuinely differ.
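The three-signal blend can be sketched as a weighted sum. Everything here is illustrative: stdlib `difflib` stands in for RapidFuzz, the visual and cross-modal scores would come from ResNet and CLIP feature distances in the real pipeline, and the weights are placeholders for the per-jurisdiction calibration.

```python
from difflib import SequenceMatcher

def text_score(a: str, b: str) -> float:
    """Textual proximity on normalised marks (stand-in for RapidFuzz)."""
    a, b = a.lower().strip(), b.lower().strip()
    return SequenceMatcher(None, a, b).ratio()

def blend(text_s: float, visual_s: float, cross_modal_s: float,
          weights=(0.5, 0.3, 0.2)) -> float:
    """Blend three similarity signals in [0, 1] into one conflict score.

    Weights are calibrated per jurisdiction because similarity
    tolerances genuinely differ between trademark offices."""
    wt, wv, wc = weights
    return wt * text_s + wv * visual_s + wc * cross_modal_s

# usage: near-identical word marks plus moderately similar logos
score = blend(text_score("ACME", "acme"), visual_s=0.8, cross_modal_s=0.6)
```

Blending keeps any single weak signal from vetoing a conflict: a logo-only mark with no usable text still ranks on the visual and cross-modal terms.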
Scrapers · CV · CLIP · ResNet · RapidFuzz
selected projects
readmes on github

- 01
agentic-ai · production
Agentic NL→SQL Chatbot
Multi-tenant LangGraph chatbot with locally-served Gemma.
~60% LLM-call reduction · 50+ modules · binary RBAC
A 3-tier follow-up engine routes queries through a hashed-prompt cache (L1), a small rewrite agent that prompts only with the diff against the previous turn (L2), and a full re-plan only when intent shifts (L3). A destructive-SQL gate blocks DDL/DML on read-only tenants before generation lands. The RBAC whitelist is encoded as packed bitmasks per (role × table × column) so authorization is a single AND, not a join — checked in front of generation, not behind it.
key decision
LangGraph over plain LangChain — explicit state graph beats chained prompts when retries, branching, and human-in-the-loop all touch the same state.
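The 3-tier routing decision can be sketched as a small dispatcher. The rewrite agent and planner here are placeholder stubs for the real LangGraph nodes, and the hashing and tier logic are illustrative, not the production implementation.

```python
import hashlib

cache = {}  # hashed prompt -> last execution result (the L1 tier)

def rewrite_with_diff(prompt: str, prev_prompt: str) -> str:
    """Placeholder for the L2 rewrite agent (prompts only with the diff)."""
    return f"rewritten:{prompt}"

def full_replan(prompt: str) -> str:
    """Placeholder for the L3 full planning pipeline."""
    return f"planned:{prompt}"

def handle(prompt: str, prev_prompt, intent_shifted: bool):
    """Route a follow-up question to the cheapest tier that can answer it."""
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key in cache:                       # L1: repeat question, serve cached
        return "L1", cache[key]
    if prev_prompt is not None and not intent_shifted:
        tier, result = "L2", rewrite_with_diff(prompt, prev_prompt)
    else:                                  # L3: intent genuinely shifted
        tier, result = "L3", full_replan(prompt)
    cache[key] = result
    return tier, result
```

First-time questions with no context pay the full L3 cost once; exact repeats are free, and small variations ride the cheap L2 rewrite, which is where the bulk of the LLM-call reduction comes from.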
LangGraph · Gemma · Python · Postgres · RBAC
view readme · live demo

- 02
infrastructure · multi-cloud
Active-Passive Disaster Recovery
Two clouds, three regions, canary failback gated on health checks.
Cloudflare → AWS Mumbai (active) · AWS SG (hot) · Azure (failover)
Cloudflare's load balancer routes primary traffic to AWS Mumbai, with a hot standby in AWS Singapore and Azure failover for MongoDB Atlas. Failback ramps 5%→25%→50%→100%, gated on rolling error rate and p99 latency — fail any gate and traffic snaps back automatically. Health checks chain edge → app health → DB write probe so a single failing layer can't drag the others down. Terraform owns the topology end-to-end and Jenkins runs weekly drills so the failback path doesn't bit-rot.
key decision
Active-passive over active-active — read-heavy traffic, small team, and a data tier that couldn't safely tolerate concurrent multi-region writes.
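The gated ramp logic is simple enough to sketch. The three callables below are assumed interfaces standing in for the real Cloudflare traffic-steering and metrics plumbing; the thresholds are illustrative.

```python
RAMP = (5, 25, 50, 100)   # percent of traffic shifted back per step
ERROR_BUDGET = 0.01       # illustrative rolling-error-rate gate
P99_BUDGET_MS = 800       # illustrative p99 latency gate

def failback(route_pct, read_error_rate, read_p99_ms) -> bool:
    """Walk the canary ramp toward the primary region.

    route_pct(p) shifts p% of traffic to the primary; the two read_*
    callables sample rolling metrics after each step. Failing any
    gate snaps traffic back to the standby automatically."""
    for pct in RAMP:
        route_pct(pct)
        if read_error_rate() > ERROR_BUDGET or read_p99_ms() > P99_BUDGET_MS:
            route_pct(0)   # snap back: standby takes everything again
            return False
    return True            # primary holds 100%, failback complete
```

The point of expressing it as code at all is the drill: Jenkins can exercise exactly this path weekly against synthetic metrics, so the first real failback is not also the first test of it.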
AWS · Azure · Cloudflare · Terraform · Jenkins
view readme

- 03
full-stack · led
TeamSync — Project Management Platform
Backend + integrations lead, Infosys Springboard 5.0.
Role-based access · task tracking · team workflows
Led the backend and integration tracks of a collaborative project-management platform built on Node.js, Express, React, and MongoDB with role-based access, task tracking, and end-to-end team workflows. Owned the API design, auth model, and the integration surface; coordinated four contributors across the build.
scope
Backend + integrations lead — owned API design, auth model, and the cross-team integration surface.
Node.js · Express · React · MongoDB
view readme · live demo
— rest of the work lives on github.com/ShubhamPatel2305
stack
bold = comfortable in production

- Python
- TypeScript
- JavaScript
- C#
- C++
- SQL
- FastAPI
- Node.js
- Express
- ASP.NET Web API
- REST
- WebSockets
- Microservices
- LangGraph
- LangChain
- RAG (hybrid)
- OpenAI / Anthropic SDKs
- Gemma (local)
- Vector DBs
- Eval (Ragas / DeepEval)
- AWS
- Azure
- Cloudflare
- Docker
- Terraform
- Jenkins
- Multi-cloud DR
- Canary deploys
- CI/CD
- PostgreSQL
- MongoDB
- SQL Server
- Redis
- Microsoft Garnet
- ETag caching
- TTL/LRU
- React
- Next.js
- Angular
- Tailwind CSS
- HTML5 video
contact
reply within ~48h
Let’s talk.
Best for product-company SDE / AI engineer roles, contract or full-time. Pick the kind of conversation — the form adapts.