projects

Research publications, technical projects, and systems across computer vision, NLP, and software engineering.

Research & Engineering Versatility: My work spans deep learning research (computer vision, NLP, and multi-modal learning) and full-stack software engineering. I move between research (designing novel architectures, first-author publications at top venues) and practical systems (building production-ready applications, winning hackathons). I enjoy hard problems across the AI/ML stack, from implementing state-of-the-art models from scratch to optimizing CUDA kernels and architecting scalable systems. Below you'll find research projects (several published at CVPR, NeurIPS, and ICMR) alongside engineering systems that demonstrate end-to-end technical capability.

vision

MSCAN: Multistage Spinal Canal Stenosis Grading

CVPR 2025

IEEE/CVF CVPR 2025 Demo Track | Published Research

Deep learning framework for automated MRI-based spinal stenosis grading achieving 0.971 AUROC. Designed multi-stage architecture combining YOLO detection with multi-view cross-attention for precise stenosis classification across three severity levels. Presented at CVPR 2025 in Seattle.

PyTorch YOLO Cross-Attention Medical Imaging

arXiv Code Talk

DentalNet: Multi-View Transformer for Dental 3D Analysis

NeurIPS 2025

NeurIPS 2025 Imageomics Workshop | Published Research

Multi-modal 2D/3D fusion achieving 67% F1-score for orthodontic classification. Designed geometric-aware cross-attention mechanism integrating 2D intraoral views with 3D point cloud representations, significantly outperforming single-modality baselines that plateaued at 55%.

Vision Transformers Point Clouds Multi-Modal Fusion Medical AI

Scholar

The Frequency of a Lie: When Phase Betrays Semantics in AI-Generated Images

Under Review

CVPR 2026 Submission | Research in Progress

Multi-domain deepfake detector that fuses semantic, frequency, and noise cues via Band-Conditioned Phased Cross-Attention (BC-PCA). Achieved 98% F1 against a 93.6% baseline, with an interpretable CLIP/DINO-based U-Net decoder. Curated a synthetic training dataset of 100k+ images and trained the noise features with a contrastive loss.

CLIP DINO Frequency Analysis Deepfake Detection

Under Review

Few-Shot Visual Search in Satellite Imagery

Top 8/100+

Grand AI Challenge (PS-03), Government of India

Developed few-shot visual search system for satellite imagery combining frozen self-supervised ViT features with learnable adapters, unifying CNN and transformer embeddings into shared 64-D space for efficient prototype-based retrieval. Found frozen features outperform fine-tuned models. Performed ablations on adapter designs and multi-scale aggregation with per-class PR/F1 metrics.

PyTorch DINO Siamese Networks Few-Shot Learning

Ongoing Competition
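
Prototype-based retrieval, as described above, can be sketched in a few lines. This is a minimal illustration and not the competition code: it assumes each class prototype is the mean of its support embeddings and that queries are ranked by cosine similarity. The function names and the 2-D toy vectors are hypothetical (the project itself uses a 64-D space).

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_prototypes(support):
    """Map each class label to the mean of its few support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def rank_by_prototype(query, prototypes):
    """Return (label, similarity) pairs, most similar prototype first."""
    scores = [(label, cosine(query, proto)) for label, proto in prototypes.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy usage: two classes with two support embeddings each.
support = {
    "ship": [[1.0, 0.0], [0.8, 0.2]],
    "runway": [[0.0, 1.0], [0.1, 0.9]],
}
prototypes = build_prototypes(support)
ranking = rank_by_prototype([0.9, 0.1], prototypes)
```

Because the backbone stays frozen in this setup, adding a new class only requires computing one more mean vector.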

Multimodal Price Prediction

Rank 54/20,000+

Amazon ML Challenge 2025

Fused CLIP ViT-B/32 and DINOv2 visual embeddings with TF-IDF text features via cross-attention for product price prediction. Implemented global embedding caches for 5x inference speedup, achieving 42.386 SMAPE without external price lookups.

CLIP DINOv2 Cross-Attention TF-IDF

Code
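
For readers unfamiliar with the metric, here is a minimal sketch of SMAPE (symmetric mean absolute percentage error). It assumes the common definition on a 0 to 200 scale, where lower is better; the challenge's exact scoring formula is not stated here, so treat this form as an assumption.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error on a 0-200 scale.

    Assumed definition: mean of |p - a| / ((|a| + |p|) / 2), times 100.
    Pairs where both values are zero contribute zero error.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += 0.0 if denom == 0 else abs(p - a) / denom
    return 100.0 * total / len(actual)
```

Under this definition, a perfect prediction scores 0 and predicting 300 for a true price of 100 scores 100.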

nlp

SocialDF: Benchmark Dataset for Deepfake Detection

ICMR 2025

ACM ICMR 2025 MAD Workshop | Published Research

Real-world benchmark paired with a multi-agent LLM framework that outperforms SOTA lip-sync methods by integrating fact-checking. Designed a novel evaluation protocol combining visual deepfake detection with semantic consistency verification across modalities.

LLMs Multi-Agent Systems Fact-Checking Deepfakes

ACM DL Talk

REDACT - Data Privacy Platform

Top 2/100+

Smart India Hackathon 2024 - National Finalist

Multimodal redaction achieving 99% PII detection accuracy across text, images, and documents. BERT-based NLP with few-shot learning for domain adaptation to legal and medical contexts. Deployed for medical, legal, and cybersecurity applications with secure processing pipeline.

BERT Computer Vision NLP Few-Shot Learning

Code Demo

MusicBind - Cross-Modal Song Retrieval

Research Prototype

Aligned audio, text, image, and graph modalities in shared hyperbolic embedding space using triplet loss for cross-modal song retrieval. Built PyTorch pipeline with graph neural networks, evaluating on GTZAN dataset with hyperbolic distance metrics for hierarchical music relationships.

PyTorch Graph Neural Networks Hyperbolic Embeddings Multi-Modal Learning

Code
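
The triplet objective in hyperbolic space can be illustrated with a small sketch. It assumes the Poincaré ball model of hyperbolic space and a margin of 0.5; the prototype's actual model, margin, and PyTorch implementation are not specified above, so both choices are assumptions.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss: pull the positive closer than the negative by `margin`."""
    d_pos = poincare_distance(anchor, positive)
    d_neg = poincare_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

Distances grow rapidly near the boundary of the ball, which is what makes this geometry a natural fit for tree-like hierarchies.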

dev

Algorithmic Trading System

Time Series Analysis, Financial ML

LSTM and Transformer models on NSE tick data with sentiment integration from news sources. Real-time execution via Kite API processing 1000+ ticks/second with latency optimization. Confidence scoring for candlestick patterns with Streamlit monitoring dashboard.

LSTM Transformers Time Series Streamlit

Under Development

Marvel Birds - Physics Engine

Java, LibGDX, Design Patterns

Physics simulation with collision detection and dynamic level generation using Box2D. Applied Factory and Observer patterns for extensible architecture and modular game mechanics.

Java LibGDX Box2D Design Patterns

Code Demo

BidBazar - Auction Platform

Concurrent Systems, SQL Optimization

Real-time bidding system handling 1000+ concurrent users with optimized database queries. Implemented auto-bidding with sniping protection and price prediction using regression models.

Python SQL Concurrent Programming Web Development

Code
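
One common way to implement sniping protection is to extend the deadline whenever a valid bid lands close to the end. The sketch below assumes that approach; the `Auction` class, the 30-second window, and the 60-second extension are hypothetical and may differ from what BidBazar does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Auction:
    end_time: float                  # auction deadline, in seconds
    highest_bid: float = 0.0
    highest_bidder: Optional[str] = None
    snipe_window: float = 30.0       # bids this close to the end trigger an extension
    extension: float = 60.0          # time remaining after a late bid

    def place_bid(self, bidder: str, amount: float, now: float) -> bool:
        """Accept a bid if the auction is open and the amount is higher."""
        if now >= self.end_time or amount <= self.highest_bid:
            return False
        self.highest_bid = amount
        self.highest_bidder = bidder
        # Sniping protection: a last-moment bid pushes the deadline back
        # so other bidders get a chance to respond.
        if self.end_time - now <= self.snipe_window:
            self.end_time = now + self.extension
        return True
```

In a concurrent setting, `place_bid` would also need to run under a lock or inside a database transaction so that two simultaneous bids cannot both be accepted.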

RISC-V Assembler & Simulator

Systems Programming, CI/CD

RV32I instruction set implementation with memory management and label resolution. 95% test coverage with automated CI/CD pipeline and comprehensive error handling.

Python Systems Programming CI/CD Testing

Code
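
Label resolution in an assembler is typically done in two passes. The sketch below is a simplified illustration, not the project's code: it assumes `#` comments, `label:` syntax, and a fixed 4 bytes per RV32I instruction, and the function names are hypothetical.

```python
def resolve_labels(lines, base=0):
    """First pass: record each label's address and list the instructions."""
    labels = {}
    instructions = []
    address = base
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        while ":" in line:                   # handles "label:" and "label: instr"
            label, line = line.split(":", 1)
            label = label.strip()
            if label in labels:
                raise ValueError(f"duplicate label: {label}")
            labels[label] = address
            line = line.strip()
        if line:
            instructions.append((address, line))
            address += 4                     # every RV32I instruction is 4 bytes
    return labels, instructions


def branch_offset(labels, target, pc):
    """Second pass helper: PC-relative offset from a branch to its target."""
    if target not in labels:
        raise KeyError(f"undefined label: {target}")
    return labels[target] - pc


program = [
    "loop:",
    "  addi x1, x1, 1  # increment",
    "  bne x1, x2, loop",
    "end: addi x0, x0, 0",
]
labels, instructions = resolve_labels(program)
```

Collecting every label before encoding anything is what lets forward references such as a jump to `end` resolve correctly.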