About Me
I am a Research Engineer passionate about machine learning and computer vision, currently developing generative AI and computer vision solutions at Avataar.ai.
Life Updates
Our paper "TACLE" has been accepted at WACV 2025!
Started a new role as Research Engineer-1 at Avataar.ai
Graduated from IISc Bangalore with an M.Tech in AI
Education

M.Tech in Artificial Intelligence
Indian Institute of Science (IISc), Bangalore
2022 - 2024
CGPA: 8.0/10.0

B.Tech in Electrical Engineering
Bhagalpur College of Engineering, Bhagalpur
2018 - 2021
CGPA: 8.75/10.0

Diploma in Electrical Engineering
Government Polytechnic Muzaffarpur, Muzaffarpur
2015 - 2018
Percentage: 77.73%

Secondary School (10th)
Bihar School Examination Board
Utkramit M S Parmanandpur Shraniganj
2015
Percentage: 60%
Experience

Research Engineer-1 - Avataar.ai
July 2024 - Present
- Built an end-to-end pipeline that automatically generates lifestyle images of products using the Flux model and ControlNets, producing more appealing and realistic product visuals for customers.
- Modified the diffusion model's sampling procedure to improve object reconstruction, followed by intrinsic decomposition for realistic product relighting.
- Built classification systems for low-data scenarios using pre-trained models such as CLIP, BLIP2, and Qwen2.5, improving product-categorization accuracy.
- Improved segmentation accuracy by deploying BiRefNet models and integrated SAM with YOLO-World to handle complex scenes.
- Improved object detection by benchmarking detection frameworks including YOLO-World, Florence, and other models.

Teaching Assistant - IISc Bangalore
Subject: Signal Processing in Practice
Jan 2024 - Apr 2024
- Integrated continual learning frameworks (L2P, DualPrompt) to mitigate catastrophic forgetting in neural networks.
- Built self-supervised models using MoCo and SimCLR for robust visual representation learning.
- Developed adaptive prompt-based learning with dynamic token expansion and attention mechanisms.

Teaching Assistant - IISc Bangalore
Subject: Digital Image Processing
Aug 2023 - Dec 2023
- Developed DFT-based frequency-domain filtering for advanced image denoising and enhancement.
- Implemented SIFT and Normalized Cut for precise feature detection and image segmentation.
- Optimized EfficientNet-B0 models with custom classifier heads for faster convergence.
Projects
Cricket-Shot Predictor
Developed LSTM-based video classification models on top of pre-trained CLIP and SigLIP image embeddings, and fine-tuned VideoMAE for end-to-end classification.
- Trained the LSTM on visual features extracted with CLIP and SigLIP, achieving 52% and 53% test accuracy, respectively, and saved the checkpoint with the highest validation accuracy.
- Fine-tuned VideoMAE on the cricket shots dataset, achieving 66.05% test accuracy, while saving model checkpoints after each epoch and monitoring validation performance.
- Prepared the video dataset by extracting frames, computing embeddings, and assigning labels, while implementing detailed logging and leveraging Hugging Face Hub for model sharing.
Virtual Try-On
Developed a deep learning-based Virtual Try-On pipeline integrating segmentation, garment transformation, and try-on synthesis for realistic virtual clothing visualization.
- Built an AI-powered Virtual Try-On pipeline combining Florence2 and IDM-VTON models for automated garment transfer and visualization.
- Improved output quality by implementing a 3-stage framework with the CAT-TryOff model, yielding better color and pattern consistency.
- Reduced system latency by developing a single-stage solution based on Any2Any-Tryon and FLUX architectures.
- Conducted iterative improvements through failure analysis, addressing key challenges in garment fitting and pattern preservation.
Publications
Skills
Research Interests
- Image Processing & Computer Vision
- Large Language Models (LLMs)
- Natural Language Processing (NLP)
Languages
- Python
- SQL
- MATLAB
Deep Learning Frameworks
- PyTorch
- TensorFlow
Tools
- SLURM
- AWS
- GitHub
- ComfyUI
Relevant Coursework
- Digital Image Processing
- Advanced Image Processing
- Computer Vision
- Introduction to NLP
- LLMs for Practical NLP
- Deep Learning for NLP
- Digital Video Perception and Algorithms
- Linear Algebra
- Stochastic Models and Applications
- Pattern Recognition and Neural Networks
- Game Theory
- Computational Methods of Optimization
Achievements
Entrance Exams
BCECE [LE]
Rank: 1
2018
Certifications

Agents Course


GenAI Hackathon


Programming in MATLAB


Power System Certificate

Volunteering
Contact
sahil15rohit88@gmail.com
rohit.kumar@avataar.ai