Harsh Sutaria

I'm a Computer Science graduate student at NYU and a Graduate Researcher under Prof. Yann LeCun at CILVR Lab, researching Planning with Latent Dynamics Models and Hierarchical JEPA World Models. I also work on collision-aware navigation using Depth Barrier Regularization at AI4CE Lab under Prof. Chen Feng. I'm an active open-source contributor to MLX, Apple's ML framework for Apple silicon.

Achievements: 3rd place in NYU Deep Learning Challenge (Prof. LeCun) • NASA-JHU Lunar Autonomy Challenge participant
Experience: MLX Open Source Contributor • Research Intern at Samsung PRISM • Machine Learning Engineer at Jaipur Robotics • Tech Lead at NYU Self-Drive
Impact: Sub-5ms AR latency with PointNet++ optimization • 11-percentage-point improvement in voice recognition accuracy (78% → 89%) • 30% improvement in visual place recognition • 75% productivity boost for 20+ users

Open to ML Engineering, Computer Vision, Robotics, and Autonomous Systems roles, as well as Software Development Engineer (SDE) roles. Seeking Spring 2026 internships and full-time opportunities.

Email / CV / LinkedIn / GitHub / LeetCode

Research

My research focuses on Representation Learning, World Modeling, and Planning using state-of-the-art deep learning techniques. Current work includes Planning with Latent Dynamics Models, Hierarchical JEPA architectures, collision-aware navigation with Depth Barrier Regularization, and multi-modal AI systems. Tech stack: PyTorch, TensorFlow, Computer Vision (YOLO, DINOv2, VLAD-BuFF), Diffusion Models, ROS.

Planning with Latent Dynamics Models
Advised by Prof. Yann LeCun
Graduate Research, Jan 2025 - Present
demo video

Extending PLDM to solve complex tasks by training a hierarchical JEPA world model. Implemented a Transformer predictor under a Markov assumption, trained with teacher forcing and VICReg regularization to prevent representation collapse. Planning with model predictive control (MPC) on the latent dynamics model shows the best generalization to new tasks. Collaborating with PhD candidate Kevin Zhang and Vlad Sobal (PhD) at CILVR Lab.

Depth Barrier Regularization for Collision Aware Egocentric Navigation
Advised by Prof. Chen Feng
Graduate Research, Sep 2025 - Present
CityWalker project

Building on CityWalker for collision-aware urban navigation using depth barrier regularization. Developing differentiable safety constraints from monocular depth to prevent steering into obstacles while preserving data-driven policy performance. Co-mentored by postdoc Jing Zhang and PhD candidate Xinhao Liu at AI4CE Lab.

Action-Aware REPA for Diffusion Based World Models
Advised by Prof. Saining Xie
Research Project, Oct 2024
Collaborators: Shaswat Patel, Soham Chitnis

Developed Action-aware Representation Alignment (AC-REPA) for action-conditioned diffusion world models that simulate egocentric futures for navigation planning. Aligned the internal denoising representations of Conditional Diffusion Transformers (CDiT) to frozen video foundation encoders (VideoMAE-v2) with action conditioning. Combined feature alignment with action-gated spatio-temporal relation distillation (AC-TRD) to improve temporal coherence, reduce artifacts, and enhance planning success in navigation environments.

Attention-Aware DPO for Reducing Hallucinations in Multi-Image QA
Advised by Prof. Saining Xie
Research Project, Nov 2024
report

Engineered a DPO loss incorporating cross-attention penalties to reduce hallucinations in multi-image QA for Large Vision Language Models (LVLMs), improving target-image focus to 33.93% (vs. 29.43% baseline). Performed inference-time optimization with confidence-based attention scaling, boosting accuracy by 10%. Trained on LLaVA665k augmented datasets using LoRA fine-tuning.

Projects

Production AI systems deployed to solve real-world problems with measurable impact. These projects are actively used by professionals in their daily workflows, delivering significant productivity improvements and enabling new capabilities through practical applications of deep learning and computer vision.

MLX - Active Open Source Contributor
Open Source Contribution, 2025 - Present
GitHub repository

Active contributor to MLX, Apple's array framework for machine learning on Apple silicon. MLX provides a NumPy-like Python API with composable function transformations, lazy computation, and a unified memory model. Contributing to the development of efficient ML primitives and optimization techniques for deploying models on Apple devices, enabling researchers and developers to leverage hardware-accelerated machine learning capabilities.

Generative Fashion Design Agent
20+ daily active users (DAU), Jan 2024
demo video

Developed an AI-powered fashion design system by fine-tuning the SDXL diffusion model with LoRA and a VAE to generate textile patterns inspired by base images. Leveraged Meta's Segment Anything Model (SAM) for precise image layer extraction, enabling diverse textile patch creation. Deployed a production system serving 20+ daily active users (all sketch artists), boosting their productivity by over 75% at the industry level.

Professional Experience

My professional experience spans research internships, industry engineering roles, and research leadership positions across diverse domains: real-time AR systems at Samsung PRISM, IoT voice recognition at Whitelion, production ML systems at Jaipur Robotics, and autonomous navigation leadership at NYU. I've consistently delivered solutions that improve performance metrics and reduce manual work through AI innovation.

Student Leader - NYU Self Drive | AI4CE Lab
Vertically Integrated Project, Sep 2024 - Present
project page

Led the team developing autonomous navigation with world models as visual-inertial state estimators for localization and controls. Built visual place recognition using YOLO, DINOv2, and VLAD-BuFF to improve image matching by 30%. Participated in the Lunar Autonomy Challenge, a joint competition by NASA and Johns Hopkins University.

Machine Learning Engineer - Jaipur Robotics Sagl
Full-time Role, Apr 2023 - Jul 2024
company info

Performed image classification and segmentation by fine-tuning YOLO and CLIP for downstream tasks. Used Segment Anything and LabelStudio to create datasets, reducing human work by 60%. Built resilient backends for production ML deployments by containerizing Flask and Streamlit applications with Docker and running them on Google Cloud Platform services.

Research Intern - Samsung PRISM
Research Internship, Jul 2023 - Dec 2023
Project: Real-Time Multi-Stream Synchronization | Mentors: Prasenjit Chakraborty, Umadevi K.S (VIT Vellore)

Developed real-time augmented reality (AR) systems by fusing 3D sensor data with optimized PointNet++ and lightweight 3D CNNs, achieving sub-5ms latency. Implemented seamless sensor data fusion using CRNNs and multi-task learning for swift AR interactions with GPU/TPU acceleration. Built a production-ready AR pipeline for instant multi-stream synchronization in mobile environments.

Software Engineer Intern - Whitelion
Software Engineering Internship, May 2022 - Aug 2022
company info

Designed and developed software modules for voice recognition in smart switch systems, improving recognition accuracy from 78% to 89%. Integrated NLP capabilities using spaCy to enhance user-device interaction through natural language commands. Built and optimized data processing pipelines for motion and TV sensor data using TensorFlow and scikit-learn, contributing to intelligent energy management features. Delivered robust real-time functionality in embedded environments through system integration and performance optimization.

Template adapted from Jon Barron's website