Shreyas Kapse | AI Engineer

AI Engineer Building LLM Systems, RAG Pipelines, and Agentic Workflows

Shipping production AI on top of battle-tested backend infrastructure

Pune, India · Open to opportunities · B.Tech, VIT Pune (8.6 CGPA)

AI Engineer with a production backend foundation — building LLM-powered systems using LangChain, LangGraph, and RAG pipelines on top of scalable Java/Spring Boot infrastructure.

Hands-on with agentic workflows, vector databases, hybrid retrieval, and LLM evaluation (RAGAS, LangSmith). Currently at Predii extending LLM retrieval systems and building AI-driven data pipelines.

My backend engineering background — microservices, event-driven systems, async processing — gives me an edge in shipping AI that actually works in production, not just in notebooks.

Technologies & Tools

AI / ML

LangChain, LangGraph, RAG Pipelines, Qdrant, BM25, Vector Databases, LangSmith, RAGAS, Pydantic, Streamlit

LLM Providers

Ollama, Google Gemini, HuggingFace

Programming Languages

Python, Java, JavaScript, TypeScript, SQL

Backend & Frameworks

Spring Boot, FastAPI, Node.js, NestJS, Microservices Architecture

Databases & Caching

PostgreSQL, MongoDB, MySQL, Redis, Qdrant, Azure Cosmos DB

Messaging & Streaming

RabbitMQ, Apache Kafka, Event-Driven Architecture

DevOps & Tools

Git, Docker, CI/CD, JMeter, Load Testing

Testing

JUnit, Mockito, Integration Testing

Professional Journey

Software Development Engineer

Predii, Pune

Nov 2024 – Present
  • Extended the LLM-based retrieval system by integrating Azure Cosmos DB as a vector store and migrating retrieval calls to async execution, reducing response latency for AI-driven workflows.
  • Built a Python-based JMeter automation tool that generates .jmx test plans from a YAML config (endpoint, load, threads, rounds), executes load tests, logs request/response pairs, and produces structured HTML reports.
  • Extended a Java/RabbitMQ data processing pipeline parsing raw repair order data (JSON, CSV, TSV, MongoDB) into structured output for LLM and analytics systems; contributed a Date-Time enrichment plugin, REST plugin, and Precedence plugin for entity extraction.
  • Built a Git automation tool to streamline multi-repo release workflows and contributed to the Java 13 → 21/25 upgrade across core services.
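
To illustrate the async migration mentioned in the first bullet, here is a minimal sketch using only the standard library. It is not the production code: `search_store` is a hypothetical stand-in for a vector-store query, and the sleep simulates network I/O. The point is that concurrent lookups take roughly as long as the slowest call rather than the sum of all calls.

```python
import asyncio
import time


async def search_store(query: str, delay: float = 0.1) -> list[str]:
    """Hypothetical stand-in for one vector-store lookup (assumption, not a real client)."""
    await asyncio.sleep(delay)  # simulates network I/O
    return [f"doc for {query!r}"]


async def retrieve_all(queries: list[str]) -> list[list[str]]:
    # Start every lookup at once; gather returns results in the order of the inputs.
    return await asyncio.gather(*(search_store(q) for q in queries))


def run_demo() -> tuple[list[list[str]], float]:
    start = time.perf_counter()
    results = asyncio.run(retrieve_all(["brake pads", "oil filter", "spark plug"]))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

With three sequential calls at 0.1 s each, the same work would take about 0.3 s; the concurrent version should finish in a little over 0.1 s.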

Software Development Engineer Intern

LogiQuad Solutions, Pune

Jan 2024 – Jun 2024
  • Designed and implemented RESTful APIs using Spring Boot to handle concurrent requests with low-latency responses under production load conditions.
  • Integrated Redis caching layer to reduce repeated database hits, improving response times for read-heavy operations.
  • Built a JUnit/Mockito testing suite achieving 85%+ code coverage (measured via IntelliJ Coverage), reducing production defects.
  • Delivered 10+ cross-platform features in React Native, improving mobile app stability and reducing crash rates.
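
The caching bullet above follows the familiar cache-aside pattern. The sketch below shows the idea in Python under stated assumptions: `TTLCache` is a hypothetical in-memory stand-in for Redis GET and SET with expiry, and `fake_db` stands in for the database query. The original work used Spring Boot, so this is an illustration of the pattern only.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-memory stand-in for a Redis-style cache (hypothetical helper)."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # entry has expired, treat as a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)


def get_with_cache(
    key: str,
    cache: TTLCache,
    load_from_db: Callable[[str], str],
    ttl_seconds: float = 60.0,
) -> str:
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: no database call
    value = load_from_db(key)  # cache miss: read from the source of truth
    cache.set(key, value, ttl_seconds)
    return value


db_calls: list[str] = []


def fake_db(key: str) -> str:
    """Hypothetical database read that records each call."""
    db_calls.append(key)
    return f"row:{key}"
```

Calling `get_with_cache("42", cache, fake_db)` twice hits `fake_db` only once; the second read is served from the cache until the TTL expires.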

Research Intern

VIT Pune

Jul 2023 – Dec 2023
  • Engineered an Android app using Kotlin, MVVM, and Firebase, securing a $500 research grant.
  • Conducted user behavior and UX research to improve app usability, enhancing navigation for new users.

Head of Cyber Security

Team Quark, VIT Pune

Feb 2023 – Jun 2023
  • Implemented security measures and proactive defenses to protect software systems and sensitive data from cyber threats.
  • Developed and enforced security policies and protocols, ensuring compliance with industry best practices.
  • Led cybersecurity initiatives and trained team members, strengthening the organization's overall security posture.

Projects

YouTube RAG Bot – AI Video Q&A System

Python · LangChain · FastAPI · Qdrant · BM25 · HuggingFace · LangSmith · RAGAS · Chrome Extension

RAG pipeline over YouTube transcripts using hybrid retrieval (BM25 + dense vector search), query expansion, and cross-encoder reranking. Delivers timestamp-grounded answers via a Chrome Extension embedded in YouTube UI. Evaluated with RAGAS (Faithfulness: 0.75, Relevancy: 0.71) and traced end-to-end via LangSmith.

Retrieval: Hybrid BM25 + Semantic Search + Reranking
Evaluation: RAGAS · Faithfulness 0.75 · Relevancy 0.71
Extension: Chrome Extension with Timestamp Jump UI
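
One common way to merge a BM25 ranking with a dense-vector ranking is reciprocal rank fusion (RRF). The project description does not say which fusion method it uses, so treat this as an assumed technique shown for illustration. The chunk IDs and the constant `k = 60` are hypothetical example values.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs; higher fused score ranks first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every chunk it returned.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["c3", "c1", "c7"]   # hypothetical keyword-search results
dense_ranking = ["c1", "c4", "c3"]  # hypothetical vector-search results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['c1', 'c3', 'c4', 'c7']
```

Chunks that appear in both lists (`c1`, `c3`) rise to the top, which is the behavior hybrid retrieval is after. A cross-encoder reranker would then rescore the fused candidates.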

AI PR Reviewer – Multi-Agent GitHub Code Review System

Python · LangGraph · LangChain · FastAPI · Google Gemini · Ollama · Pydantic · LangSmith

GitHub App that reviews pull requests using a parallel multi-agent LangGraph pipeline — security, bug risk, and performance agents run simultaneously with typed Pydantic outputs aggregated into a structured merge recommendation. Supports Ollama and Gemini 2.0 Flash with a configurable human-in-the-loop gate, LangSmith tracing, and SQLite checkpointing.
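
The fan-out and aggregate shape described above can be sketched with the standard library alone. This is not the LangGraph implementation: the three agents are hypothetical rule-based stand-ins for LLM calls, `dataclass` replaces Pydantic to keep the example dependency-free, and the decision labels are assumptions.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    agent: str
    severity: str  # "low", "medium", or "high"
    note: str


async def security_agent(diff: str) -> Finding:
    await asyncio.sleep(0)  # placeholder for an LLM call
    if "password =" in diff:
        return Finding("security", "high", "possible hard-coded credential")
    return Finding("security", "low", "no issues found")


async def bug_risk_agent(diff: str) -> Finding:
    await asyncio.sleep(0)
    if "except:" in diff:
        return Finding("bug_risk", "medium", "bare except hides errors")
    return Finding("bug_risk", "low", "no issues found")


async def performance_agent(diff: str) -> Finding:
    await asyncio.sleep(0)
    if "time.sleep(" in diff:
        return Finding("performance", "medium", "blocking sleep in code path")
    return Finding("performance", "low", "no issues found")


def recommend(findings: list[Finding]) -> str:
    """Aggregate typed findings into a single merge recommendation."""
    severities = {f.severity for f in findings}
    if "high" in severities:
        return "block"
    if "medium" in severities:
        return "request_changes"
    return "approve"


async def review(diff: str) -> tuple[str, list[Finding]]:
    # Run all three reviewers concurrently, then aggregate their outputs.
    findings = list(await asyncio.gather(
        security_agent(diff), bug_risk_agent(diff), performance_agent(diff)
    ))
    return recommend(findings), findings


if __name__ == "__main__":
    decision, findings = asyncio.run(review('password = "hunter2"'))
    print(decision, [f.note for f in findings])
```

A human-in-the-loop gate would sit between `recommend` and posting the result, for example by pausing whenever the decision is not "approve".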

Architecture: Parallel Multi-Agent LangGraph Pipeline
Integration: GitHub App · JWT · HMAC-SHA256 Webhooks
Control: Configurable Human-in-the-Loop Gate
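
For the HMAC-SHA256 webhook check, GitHub sends an `X-Hub-Signature-256` header of the form `sha256=<hex digest>`, computed over the raw request body with the webhook secret. A minimal verification sketch follows; the function names and sample values are hypothetical, and a real handler would read the body and header from the incoming FastAPI request.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute the signature string in the `sha256=<hex>` format."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    expected = sign_payload(secret, body)
    # compare_digest runs in constant time, which avoids leaking timing information.
    return hmac.compare_digest(expected.encode(), header_value.encode())


if __name__ == "__main__":
    secret = b"example-webhook-secret"  # hypothetical secret
    body = b'{"action": "opened"}'      # hypothetical raw request body
    header = sign_payload(secret, body)
    print(is_valid_signature(secret, body, header))         # True
    print(is_valid_signature(secret, body + b" ", header))  # False
```

The check must run on the raw bytes of the request before any JSON parsing, since re-serializing the payload can change the bytes and invalidate the signature.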

Resume

View my complete resume for a detailed overview of my experience, skills, and achievements.


Articles & Blog Posts

Sharing insights on AI engineering, LLMs, and backend systems


Academic Background

B.Tech in Computer Engineering

Vishwakarma Institute of Technology (VIT), Pune

Sep 2021 – Jun 2024

CGPA: 8.6/10

Relevant Coursework: Data Structures & Algorithms, Object-Oriented Programming, Database Management Systems, Computer Networks, Operating Systems, Artificial Intelligence, Cyber Security, Compiler Design

Publications

  • "IOT Smart Stand for Smart Phones" — Published in IEEE INCOFT (International Conference on Futuristic Technologies) 2022
  • "Prototype Design for Smart Birdnest and Related Android App" — Book Chapter 8, Interactive Media with Next-Gen Technologies and Their Usability Evaluation

Contact

Open to full-time AI engineer opportunities and interesting collaborations