projects
Research publications, technical projects, and systems across computer vision, NLP, and software engineering.
Research & Engineering Versatility: My work spans deep learning research (computer vision, NLP, multi-modal learning) and full-stack software engineering. I move fluidly between theoretical research (designing novel architectures, first-author publications at top venues) and practical systems (building production-ready applications, winning hackathons). Whether it's implementing state-of-the-art models from scratch, optimizing CUDA kernels, or architecting scalable systems, I thrive on challenging problems across the AI/ML stack. Below you'll find both research projects (many published at CVPR, NeurIPS, ICMR) and engineering systems demonstrating end-to-end technical capability.
vision
MSCAN: Multistage Spinal Canal Stenosis Grading
CVPR 2025
Deep learning framework for automated MRI-based spinal stenosis grading achieving 0.971 AUROC. Designed a multi-stage architecture combining YOLO detection with multi-view cross-attention for precise stenosis classification across three severity levels. Presented at CVPR 2025 in Nashville.
DentalNet: Multi-View Transformer for Dental 3D Analysis
NeurIPS 2025
Multi-modal 2D/3D fusion achieving 67% F1-score for orthodontic classification. Designed geometric-aware cross-attention mechanism integrating 2D intraoral views with 3D point cloud representations, significantly outperforming single-modality baselines that plateaued at 55%.
FreqDINO: Multi-Modal Deepfake Detection
Under Review
Multi-domain deepfake detector fusing semantic, frequency, and noise cues via Band-Conditioned Phased Cross-Attention (BC-PCA). Achieved 98% F1 versus a 93.6% baseline, with an interpretable CLIP/DINO-based U-Net decoder. Curated a 100k+ synthetic training dataset and applied a contrastive loss to the noise features.
Few-Shot Visual Search in Satellite Imagery
Top 8/100+
Developed few-shot visual search system for satellite imagery combining frozen self-supervised ViT features with learnable adapters, unifying CNN and transformer embeddings into shared 64-D space for efficient prototype-based retrieval. Found frozen features outperform fine-tuned models. Performed ablations on adapter designs and multi-scale aggregation with per-class PR/F1 metrics.
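As a rough illustration of prototype-based retrieval, the sketch below averages a few support embeddings per class into a prototype and ranks classes for a query by cosine similarity. It is not the project's code: the function names, the 3-D toy vectors (standing in for the 64-D adapter outputs), and the class labels are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_prototypes(support):
    """Average the normalized support embeddings of each class into one prototype.

    `support` maps a class label to a list of embedding vectors (hypothetical format).
    """
    prototypes = {}
    for label, vectors in support.items():
        normed = [l2_normalize(v) for v in vectors]
        dim = len(normed[0])
        mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
        prototypes[label] = l2_normalize(mean)
    return prototypes


def rank_classes(query, prototypes):
    """Return (label, cosine similarity) pairs, best match first."""
    q = l2_normalize(query)
    scores = {
        label: sum(a * b for a, b in zip(q, proto))
        for label, proto in prototypes.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: two classes with two support examples each (assumed values).
support = {
    "airport": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "harbor": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
prototypes = build_prototypes(support)
ranking = rank_classes([0.85, 0.15, 0.05], prototypes)
print(ranking[0][0])
```

Because the prototypes are computed once from frozen features, adding a new class only requires embedding its handful of support images, with no retraining of the backbone.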
Multimodal Price Prediction
Rank 54/20,000+
Fused CLIP ViT-B/32 and DINOv2 visual embeddings with TF-IDF text features via cross-attention for product price prediction. Implemented global embedding caches for 5x inference speedup, achieving 42.386 SMAPE without external price lookups.
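For readers unfamiliar with the metric, SMAPE (symmetric mean absolute percentage error) can be sketched as below. Several variants exist; this one assumes the common percent-scale form with the mean of absolute values in the denominator, which ranges from 0 to 200. The exact variant used for the score above is not stated, and the function name is illustrative.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumed variant: mean of |p - a| / ((|a| + |p|) / 2), times 100.
    Pairs where both values are zero contribute zero error.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        if denom == 0:
            continue  # both values are zero, so the prediction is exact
        total += abs(p - a) / denom
    return 100.0 * total / len(actual)


# Toy prices (assumed values): one over-prediction, one under-prediction.
score = smape([100.0, 200.0], [110.0, 180.0])
print(round(score, 3))
```

Lower is better: a perfect prediction scores 0, and predicting 0 for a non-zero price scores the maximum of 200 for that item.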
nlp
SocialDF: Benchmark Dataset for Deepfake Detection
ICMR 2025
Real-world benchmark with multi-agent LLM framework outperforming SOTA lip-sync methods through fact-checking integration. Designed novel evaluation protocol combining visual deepfake detection with semantic consistency verification across modalities.
REDACT - Data Privacy Platform
Top 2/100+
Multimodal redaction achieving 99% PII detection accuracy across text, images, and documents. BERT-based NLP with few-shot learning for domain adaptation to legal and medical contexts. Deployed for medical, legal, and cybersecurity applications with secure processing pipeline.
MusicBind - Cross-Modal Song Retrieval
Aligned audio, text, image, and graph modalities in shared hyperbolic embedding space using triplet loss for cross-modal song retrieval. Built PyTorch pipeline with graph neural networks, evaluating on GTZAN dataset with hyperbolic distance metrics for hierarchical music relationships.
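To make the idea of a triplet loss under a hyperbolic metric concrete, here is a minimal sketch. It assumes the Poincaré ball model, which is one common choice; the description above does not say which hyperbolic model was used. The function names, margin, and toy 2-D points are likewise illustrative.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Clamp the denominator so points very close to the boundary stay finite.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss pulling the positive closer to the anchor than the negative."""
    d_pos = poincare_distance(anchor, positive)
    d_neg = poincare_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (assumed values), e.g. a song's audio, its lyrics, and an unrelated cover image.
anchor = [0.10, 0.00]
positive = [0.12, 0.01]
negative = [-0.60, 0.50]

print(triplet_loss(anchor, positive, negative))  # already well separated
print(triplet_loss(anchor, negative, positive))  # violated triplet, positive loss
```

Distances grow rapidly near the boundary of the ball, which is what lets a hyperbolic space represent tree-like hierarchies (genre, artist, track) with few dimensions.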
dev
Algorithmic Trading System
LSTM and Transformer models on NSE tick data with sentiment integration from news sources. Real-time execution via Kite API processing 1000+ ticks/second with latency optimization. Confidence scoring for candlestick patterns with Streamlit monitoring dashboard.
Marvel Birds - Physics Engine
Physics simulation with collision detection and dynamic level generation using Box2D. Applied Factory and Observer patterns for extensible architecture and modular game mechanics.
BidBazar - Auction Platform
Real-time bidding system handling 1000+ concurrent users with optimized database queries. Implemented auto-bidding with sniping protection and price prediction using regression models.
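One common way to combine auto-bidding with sniping protection is proxy bidding plus a "soft close" that extends the deadline when a bid lands in the final seconds. The sketch below illustrates that approach under assumed rules; the class, field names, window lengths, and increment are hypothetical and not taken from BidBazar itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Auction:
    end_time: float                 # seconds on an arbitrary clock
    price: float                    # current price
    increment: float = 1.0          # minimum raise per bid
    snipe_window: float = 30.0      # bids this close to the end extend the auction
    extension: float = 60.0         # how long the auction stays open after a late bid
    leader: Optional[str] = None
    max_bids: Dict[str, float] = field(default_factory=dict)  # bidder -> ceiling

    def set_auto_bid(self, bidder, ceiling):
        """Register the highest amount the system may bid on this bidder's behalf."""
        self.max_bids[bidder] = ceiling

    def place_bid(self, bidder, amount, now):
        """Accept a manual bid if the auction is open and the raise is large enough."""
        if now >= self.end_time or amount < self.price + self.increment:
            return False
        self.price, self.leader = amount, bidder
        # Sniping protection: a late bid pushes the deadline out.
        if self.end_time - now <= self.snipe_window:
            self.end_time = now + self.extension
        self._run_auto_bids()
        return True

    def _run_auto_bids(self):
        """Let auto-bidders counter in minimum increments until none can afford to."""
        while True:
            next_price = self.price + self.increment
            challengers = [
                (ceiling, name)
                for name, ceiling in self.max_bids.items()
                if name != self.leader and ceiling >= next_price
            ]
            if not challengers:
                return
            _, name = max(challengers)
            self.price, self.leader = next_price, name


auction = Auction(end_time=1000.0, price=10.0)
auction.set_auto_bid("alice", 15.0)
auction.place_bid("bob", 12.0, now=500.0)   # alice's auto-bid counters at 13
auction.place_bid("bob", 16.0, now=990.0)   # late bid: deadline moves to 1050
print(auction.leader, auction.price, auction.end_time)
```

The loop in `_run_auto_bids` terminates because every pass raises the price by one increment and each ceiling is finite.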
RISC-V Assembler & Simulator
RV32I instruction set implementation with memory management and label resolution. 95% test coverage with automated CI/CD pipeline and comprehensive error handling.
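Label resolution in an assembler is typically done in two passes: the first records the address of every label, and the second rewrites branch and jump targets as PC-relative offsets. The sketch below shows that idea for a simplified syntax; it is an illustration rather than the project's implementation, and the function name, the `#` comment syntax, and the sample program are assumptions.

```python
# RV32I instructions whose last operand is a PC-relative target.
PC_RELATIVE = {"beq", "bne", "blt", "bge", "bltu", "bgeu", "jal"}


def resolve_labels(lines):
    """Return (resolved instructions, label table) for a list of source lines.

    Each resolved instruction is (pc, mnemonic, operands); every RV32I
    instruction occupies 4 bytes, so pc = 4 * instruction index.
    """
    labels = {}
    instructions = []
    # Pass 1: strip comments, record label addresses, collect instructions.
    for raw in lines:
        text = raw.split("#", 1)[0].strip()
        while ":" in text:
            label, text = text.split(":", 1)
            label, text = label.strip(), text.strip()
            if label in labels:
                raise ValueError(f"duplicate label: {label}")
            labels[label] = 4 * len(instructions)
        if text:
            instructions.append(text)

    # Pass 2: replace label operands with offsets relative to the current pc.
    resolved = []
    for index, text in enumerate(instructions):
        pc = 4 * index
        mnemonic, _, rest = text.partition(" ")
        operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
        if mnemonic in PC_RELATIVE and operands:
            target = operands[-1]
            if target in labels:
                operands[-1] = str(labels[target] - pc)
            elif not target.lstrip("-").isdigit():
                raise ValueError(f"undefined label: {target}")
        resolved.append((pc, mnemonic, operands))
    return resolved, labels


program = [
    "start:",
    "    addi x1, x0, 3      # loop counter",
    "loop:",
    "    addi x1, x1, -1",
    "    bne x1, x0, loop    # backward branch",
    "    jal x0, done        # forward jump",
    "    addi x2, x0, 1",
    "done:",
    "    addi x3, x0, 7",
]
resolved, labels = resolve_labels(program)
for pc, mnemonic, operands in resolved:
    print(pc, mnemonic, ", ".join(operands))
```

Two passes are needed because a forward reference such as `done` is used before its address is known; collecting all labels first makes forward and backward references equally easy to resolve.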