I'm a Machine Learning Research Intern at Causal Labs and a Computer Science graduate from NYU. Previously, I was a Graduate Researcher under Prof. Yann LeCun at CILVR Lab, researching Hierarchical Planning with Latent World Models. I also worked on collision-aware navigation using Depth Barrier Regularization at AI4CE Lab under Prof. Chen Feng. I'm also an active open-source contributor to MLX, Apple's ML framework for Apple silicon.
My research focuses on Representation Learning, World Modeling, and Planning using state-of-the-art deep learning techniques. Current work includes Planning with Latent Dynamics Models, Hierarchical JEPA architectures, collision-aware navigation with Depth Barrier Regularization, and multi-modal AI systems. Tech stack: PyTorch, TensorFlow, Computer Vision (YOLO, DINOv2, VLAD-BuFF), Diffusion Models, ROS.
Hierarchical MPC in a shared latent space that enables zero-shot non-greedy planning from images. A high-level planner optimizes macro-actions using a long-horizon world model to generate subgoals, while a low-level planner executes primitive actions to reach each subgoal. Achieves 70% success on real-robot pick-and-place from a single goal image (vs. 0% for flat planners), with up to +44% absolute success rate gains and 3x lower planning cost on long-horizon tasks. Model-agnostic abstraction that consistently improves diverse latent world models (VJEPA2-AC, DINO-WM, PLDM).
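The two-level planning loop above can be sketched in miniature. The snippet below is an illustrative toy, not the actual models: a 2D point stands in for the latent state, evenly spaced subgoals stand in for macro-action optimization with a long-horizon world model, and greedy bounded steps stand in for low-level MPC over primitive actions. All function names are hypothetical.

```python
import numpy as np

def high_level_subgoals(z0, z_goal, n_subgoals):
    """High-level planner (toy): place intermediate subgoals between the
    current latent state and the goal latent, mimicking macro-action search."""
    return [z0 + (z_goal - z0) * (k + 1) / n_subgoals for k in range(n_subgoals)]

def low_level_plan(z, subgoal, step=0.5, max_steps=100):
    """Low-level planner (toy): greedy primitive actions of bounded magnitude
    toward the current subgoal, as a stand-in for short-horizon MPC."""
    traj = [z.copy()]
    for _ in range(max_steps):
        delta = subgoal - z
        dist = np.linalg.norm(delta)
        if dist < 1e-8:          # subgoal reached
            break
        z = z + delta / dist * min(step, dist)  # bounded step toward subgoal
        traj.append(z.copy())
    return z, traj

def hierarchical_plan(z0, z_goal, n_subgoals=4):
    """Chain the two levels: reach each subgoal in turn until the goal."""
    z = z0.copy()
    full_traj = [z0.copy()]
    for sg in high_level_subgoals(z0, z_goal, n_subgoals):
        z, traj = low_level_plan(z, sg)
        full_traj += traj[1:]
    return z, full_traj
```

The point of the decomposition is that the low-level planner never has to search over the full horizon, which is where the reported planning-cost reduction on long-horizon tasks comes from.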
Extending PLDM to solve complex tasks by training a hierarchical JEPA world model. Implemented a Transformer predictor with a Markov assumption, trained with teacher forcing and VICReg regularization to prevent representation collapse. Optimal-control (MPC) planning on the latent dynamics model shows the best generalization to new tasks. Collaborating with PhD candidates Kevin Zhang and Vlad Sobal at CILVR Lab.
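The VICReg regularizer mentioned above combines three terms: an invariance (similarity) term, a variance hinge that keeps each embedding dimension's standard deviation above a target (preventing collapse to a constant), and a covariance term that decorrelates dimensions. A minimal NumPy sketch follows; the coefficients are the commonly cited defaults from the VICReg paper, not necessarily the ones used in this project.

```python
import numpy as np

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style loss over two (N, D) embedding batches,
    e.g. predicted vs. target latents under teacher forcing."""
    n, d = z_a.shape

    # Invariance: mean-squared error between the two embeddings.
    sim = np.mean((z_a - z_b) ** 2)

    # Variance: hinge pushing per-dimension std above 1 to prevent collapse.
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))
    var = var_term(z_a) + var_term(z_b)

    # Covariance: penalize off-diagonal covariance to decorrelate dimensions.
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off = cov - np.diag(np.diag(cov))
        return np.sum(off ** 2) / d
    cov = cov_term(z_a) + cov_term(z_b)

    return sim_w * sim + var_w * var + cov_w * cov
```

A fully collapsed batch (every embedding identical) scores high on the variance hinge even though its invariance term is zero, which is exactly the failure mode the regularizer guards against.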
Building on CityWalker for collision-aware urban navigation using depth barrier regularization. Developing differentiable safety constraints from monocular depth to prevent steering into obstacles while preserving data-driven policy performance. Co-mentored by postdoc Jing Zhang and PhD candidate Xinhao Liu at AI4CE Lab.
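One simple differentiable form such a safety constraint can take is a hinge penalty on predicted monocular depth: zero when everything ahead is beyond a safety threshold, growing quadratically as obstacles get closer, so gradients push the policy away from steering into them. This is an illustrative sketch with hypothetical names (`depth_barrier_penalty`, `d_safe`), not the exact formulation used in the project.

```python
import numpy as np

def depth_barrier_penalty(depth, d_safe=1.0):
    """Hinge-squared barrier on predicted depths (meters) sampled along the
    steering direction. Zero beyond d_safe; smooth, differentiable growth
    as depth falls below the safety margin."""
    violation = np.maximum(0.0, d_safe - depth)  # how far inside the margin
    return np.mean(violation ** 2)
```

In training, a term like this would be added to the imitation objective with a weight that trades off safety against matching the data-driven policy.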
Developed Action-aware Representation Alignment (AC-REPA) for action-conditioned diffusion world models that simulate egocentric futures for navigation planning. Aligned internal denoising representations of Conditional Diffusion Transformers (CDiT) to frozen video foundation encoders (VideoMAE-v2) with action conditioning. Combined feature alignment with action-gated spatio-temporal relation distillation (AC-TRD) to improve temporal coherence, reduce artifacts, and enhance planning success in navigation environments.
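The feature-alignment component can be sketched as a cosine-similarity loss between intermediate denoiser features and frozen teacher-encoder features (REPA-style). Shapes and the projection to a common dimension are assumed here, and the exact AC-REPA objective (including the action conditioning and the AC-TRD relation term) may differ.

```python
import numpy as np

def alignment_loss(denoiser_feats, teacher_feats, eps=1e-8):
    """Mean (1 - cosine similarity) between (N, D) feature batches:
    denoiser_feats from an intermediate CDiT layer, teacher_feats from a
    frozen video encoder, both assumed projected to the same dimension."""
    def unit(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)
    cos = np.sum(unit(denoiser_feats) * unit(teacher_feats), axis=-1)
    return np.mean(1.0 - cos)  # 0 when perfectly aligned, up to 2 when opposed
```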
Engineered a DPO loss incorporating cross-attention penalties to reduce hallucinations in multi-image QA for Large Vision-Language Models (LVLMs), improving target-image focus to 33.93% (vs. a 29.43% baseline). Performed inference-time optimization via confidence-based attention scaling, boosting accuracy by 10%. Trained on LLaVA-665k augmented datasets using LoRA fine-tuning.
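A minimal NumPy sketch of a DPO objective with an added attention-focus penalty. The penalty form, the weight `lam`, and the scalar inputs are illustrative assumptions; in practice the log-probabilities come from the policy and a frozen reference LVLM, and the attention term is computed from cross-attention maps over the input images.

```python
import numpy as np

def dpo_attention_loss(logp_chosen, logp_rejected,
                       ref_logp_chosen, ref_logp_rejected,
                       attn_on_target, beta=0.1, lam=0.01):
    """DPO preference loss plus a penalty for cross-attention mass that
    misses the target image. logp_* are summed response log-probs under the
    policy and reference models; attn_on_target is the fraction of
    cross-attention on the target image (higher is better)."""
    # Standard DPO: -log sigmoid of the beta-scaled preference margin.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    dpo = -np.log(1.0 / (1.0 + np.exp(-margin)))
    # Penalize attention mass that lands off the target image.
    attn_penalty = 1.0 - attn_on_target
    return dpo + lam * attn_penalty
```

The extra term directly rewards focusing cross-attention on the image the question is actually about, which is the mechanism behind the target-image-focus improvement reported above.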
Projects
Production AI systems deployed to solve real-world problems with measurable impact. These projects are used daily by professionals in their workflows, delivering significant productivity gains and new capabilities through practical applications of deep learning and computer vision.
Active contributor to MLX, Apple's array framework for machine learning on Apple silicon. MLX provides a NumPy-like Python API with composable function transformations, lazy computation, and unified memory model. Contributing to the development of efficient ML primitives and optimization techniques for deploying models on Apple devices, enabling researchers and developers to leverage hardware-accelerated machine learning capabilities.
Developed an AI-powered fashion design system by fine-tuning the SDXL diffusion model with LoRA and a VAE to generate textile patterns inspired by base images. Leveraged Meta's Segment Anything Model (SAM) for precise image-layer extraction, enabling diverse textile patch creation. Deployed a production system serving 20+ daily active users (sketch artists), boosting their productivity by over 75%.
Professional Experience
My professional experience spans research internships, industry engineering roles, and research leadership positions across diverse domains: causal intelligence research at Causal Labs, IoT voice recognition at Whitelion, production ML systems at Jaipur Robotics, and autonomous navigation leadership at NYU. Throughout, I've delivered solutions that improve performance metrics and reduce manual work through AI innovation.
Led the team developing autonomous navigation with world models as visual-inertial state estimators for localization and control. Built visual place recognition using YOLO, DINOv2, and VLAD-BuFF to improve image matching by 30%. Participated in the Lunar Autonomy Challenge, a joint competition by NASA and Johns Hopkins University.
Performed image classification and segmentation by fine-tuning YOLO and CLIP for downstream tasks. Used Segment Anything and Label Studio to create datasets, reducing manual annotation work by 60%. Built resilient backend architectures by containerizing Flask and Streamlit applications with Docker and deploying them on Google Cloud Platform for production ML.
Designed and developed software modules for voice recognition in smart switch systems, improving recognition accuracy from 78% to 89%. Integrated NLP capabilities using spaCy to enhance user-device interaction through natural language commands. Built and optimized data processing pipelines for motion and TV sensor data using TensorFlow and scikit-learn, contributing to intelligent energy management features. Delivered robust real-time functionality in embedded environments through system integration and performance optimization.