AI systems, research, and applied ML

I work across production-grade RAG, multi-agent systems, and deep learning models for imaging problems.

Publications


Zero-Shot Vision Language Reasoning via Dual-layer Scene Graph Chain of Thoughts

Yash Bansal et al. - AAAI 2026 Student Abstract

Introduces a scene-graph-first reasoning pipeline that makes VLM answers more structured by separating object relationships from higher-level chain-of-thought reasoning.

Vision-language reasoning, Scene graphs, Zero-shot

TWiST: Temporal Weakly-Supervised Triplets Recognition in Surgical Videos

Pranshu Danani, Yash Bansal et al. - AAAI 2026 Student Abstract

Targets tool-tissue-action triplet recognition in surgical video by using temporal weak supervision, reducing dependence on dense frame-level labels.

Surgical AI, Temporal learning, Weak supervision

Weighted Color-Morphology Feature Fusion for Tuberculosis Bacilli Detection from Cytopathology Images

Yash Bansal et al. - ICVGIP 2025

Builds a medical-imaging classifier for TB bacilli using weighted color and morphology channels, improving sensitivity on noisy cytopathology smears.

Medical imaging, Cytopathology, ICVGIP

Experience

Machine Learning Intern - Whyminds.ai

Built a CrewAI-based multi-agent pipeline for LinkedIn publishing: AI-news scraping, NER topic tagging, LLM drafting, activity logging, and scheduled content operations for 5+ posts per week.

Jun 2025 - Jul 2025

Also shipped an OCR extraction pipeline with PaddleOCR and LayoutLMv3 for noisy PAN and tax documents, using regex normalization and confidence thresholds to clean field-level outputs.
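A minimal sketch of what the regex-normalization and confidence-threshold step can look like for a single PAN field. It is an illustration, not the shipped pipeline: the function name `clean_pan`, the 0.80 threshold, and the character-confusion tables are assumptions. The only fixed fact it relies on is the PAN layout of five letters, four digits, and one letter.

```python
import re
from typing import Optional

# PAN layout: 5 letters, 4 digits, 1 letter (e.g. ABCDE1234F).
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

# Hypothetical OCR confusion tables, applied by character position.
TO_LETTER = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B"})
TO_DIGIT = str.maketrans({"O": "0", "I": "1", "S": "5", "B": "8"})


def clean_pan(raw_text: str, confidence: float, min_conf: float = 0.80) -> Optional[str]:
    """Return a normalized PAN, or None if the OCR read should be rejected."""
    # Drop low-confidence reads before attempting any repair.
    if confidence < min_conf:
        return None
    # Strip spaces and punctuation, then uppercase.
    text = re.sub(r"[^A-Za-z0-9]", "", raw_text).upper()
    if len(text) != 10:
        return None
    # Repair look-alike characters according to what each position must hold.
    candidate = (
        text[:5].translate(TO_LETTER)
        + text[5:9].translate(TO_DIGIT)
        + text[9].translate(TO_LETTER)
    )
    return candidate if PAN_PATTERN.fullmatch(candidate) else None
```

Rejected fields come back as `None`, so a caller can route them to manual review instead of storing a guessed value.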

CrewAI, PaddleOCR, LayoutLMv3, NER

Medical Imaging Research - Parimal Lab

Worked on cytopathology image analysis for acid-fast bacilli detection, designing a color-morphology feature fusion model around illumination, saturation, hue, and Sobel edge channels.
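To make the channel design concrete, here is a small dependency-free sketch that derives the four channels from an RGB image. It is not the lab code: it assumes "illumination" means the HSV value channel, computes Sobel edges on that channel, and uses hypothetical names (`channel_stack`, `sobel_magnitude`). A real pipeline would more likely use OpenCV on full-size arrays.

```python
import colorsys

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(gray):
    """Gradient magnitude of a 2D list of floats; border pixels stay 0."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def channel_stack(rgb):
    """rgb: rows of (r, g, b) floats in [0, 1]. Returns named 2D channels."""
    hsv = [[colorsys.rgb_to_hsv(*px) for px in row] for row in rgb]
    hue = [[p[0] for p in row] for row in hsv]
    sat = [[p[1] for p in row] for row in hsv]
    val = [[p[2] for p in row] for row in hsv]  # assumed "illumination"
    return {
        "illumination": val,
        "saturation": sat,
        "hue": hue,
        "edges": sobel_magnitude(val),
    }
```

The returned channels can then be weighted and fused by a downstream model; the weighting itself is not shown here.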

ICVGIP 2025

This line of work became the ICVGIP paper on TB bacilli detection, with channel attention, reconstruction-loss weighting, focal loss for false positives, and Grad-CAM interpretability.
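For readers unfamiliar with focal loss, a per-sample sketch of the standard binary form, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), follows. The function name and the `alpha=0.25`, `gamma=2.0` defaults are assumptions (they are commonly used values, not the paper's reported settings), and the training code would use a batched PyTorch version rather than this scalar one.

```python
import math


def binary_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one prediction p = P(class 1) and label y in {0, 1}."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)           # keep log() finite
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss on easy examples, so confident
    # mistakes such as false positives dominate the gradient.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With these defaults, a background patch scored 0.9 (a confident false positive) costs far more than one scored 0.1, which is the behavior the loss is chosen for.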

Medical imaging, OpenCV, PyTorch, Grad-CAM

Projects

Remote sensing

Satellite-Driven Real Estate Valuation

Code

Multimodal price-prediction system fusing satellite imagery through EfficientNet-V2-S, FPN-Lite, and CBAM with tabular features through an MLP, then combining both streams with late fusion.

R² 0.8182, RMSE $143K, PyTorch
E-commerce ML

Amazon ML Challenge: Product Entity Extraction

Code

Multimodal extraction pipeline for product attributes such as dimensions, quantities, and units from noisy listings and images, combining OCR cleanup with entity normalization for leaderboard-style evaluation.
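As an illustration of the entity-normalization step, the sketch below maps a noisy value-plus-unit string to a canonical pair. The alias table, the canonical unit spellings, and the name `normalize_measure` are assumptions made for this example, not the challenge's official vocabulary or the submitted code.

```python
import re
from typing import Optional, Tuple

# Hypothetical alias table; a real one would cover the full unit vocabulary.
UNIT_ALIASES = {
    "cm": "centimetre", "cms": "centimetre", "centimeter": "centimetre",
    "mm": "millimetre",
    "in": "inch", "inch": "inch", "inches": "inch",
    "g": "gram", "gm": "gram", "gms": "gram", "grams": "gram",
    "kg": "kilogram", "kgs": "kilogram",
    "ml": "millilitre", "l": "litre",
}

# A number (optionally with a decimal point or comma) followed by a unit token.
VALUE_UNIT = re.compile(r"(\d+(?:[.,]\d+)?)\s*([a-zA-Z]+)")


def normalize_measure(text: str) -> Optional[Tuple[float, str]]:
    """Return (value, canonical_unit) for the first measurement found, else None."""
    match = VALUE_UNIT.search(text)
    if match is None:
        return None
    # Assumption: a comma inside the number is a decimal separator.
    value = float(match.group(1).replace(",", "."))
    unit = UNIT_ALIASES.get(match.group(2).lower())
    return (value, unit) if unit else None
```

Returning `None` for unknown units keeps unverifiable guesses out of the predictions, which matters when evaluation requires an exact value and unit match.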

OCR, Entity extraction, Normalization

Skills

A focused stack for building systems that can be evaluated, deployed, and debugged.

LLM systems and agents

LangGraph, LangChain, CrewAI, LiteLLM, OpenAI API, vLLM, Groq, mem0.

Retrieval and evaluation

FAISS, LanceDB, Qdrant, Pinecone, SQLite FTS5, RAGAS, LLM-as-judge harnesses.

ML and computer vision

PyTorch, Hugging Face Transformers, scikit-learn, OpenCV, spaCy, ONNX Runtime.

MLOps and data

Docker, FastAPI, AWS, GCP, MLflow, W&B, Celery, Redis, Prometheus, MySQL, MongoDB.

Education

Academic base and core training.

Indian Institute of Technology, Roorkee

B.Tech. in Biosciences and Bioengineering, CGPA 8.27. Coursework and projects bridge biological systems, computer vision, and production ML engineering.