📧 hp.6318@gmail.com 💻 GitHub 🎓 Google Scholar 💼 LinkedIn 📱 +1 (213) 414-8703

Hardik Prajapati

PhD Student at UC Santa Barbara | Computer Vision & ML Researcher | GCP Certified

June 2026 Joining Zoox as Research Intern with the Perception team, working on VLMs and multimodal architectures for AV semantic understanding.

May 2026 WRIVINDER accepted at CVPR 2026: geo-localization of ground images onto satellite imagery.

I am a Ph.D. student in Electrical and Computer Engineering at the University of California, Santa Barbara, specializing in Computer Vision and Machine Learning. I am advised by Prof. B.S. Manjunath at the Vision Research Lab. My research is driven by the goal of developing sophisticated world models that enable autonomous systems to perceive, reason about, and predict the dynamics of complex visual environments.

My current work focuses on bridging the gap between high-level human intent and dense, structured scene understanding. By integrating interactive foundation models with relational reasoning, I build frameworks that transform sparse visual cues into pixel-accurate world-state representations. I am particularly interested in:

Spatiotemporal Grounding: Developing systems like Click2Graph that allow users to guide scene graph generation through direct visual prompting.
Consistent World Dynamics: Designing architectures that maintain temporal coherence in dynamic environments, ensuring a non-fragmented understanding of how entities and their interactions evolve over time.
Predictive Generative Modeling: Advancing long-range video generation through explicit future-frame prediction to simulate and anticipate complex scene evolution.

Prior to my doctoral studies, I earned my Master of Science from the University of Southern California and gained industry experience at Mayachitra Inc. and Analytos. There, I built scalable ML pipelines and multi-modal fusion systems for real-world applications. I am passionate about pushing the boundaries of spatial intelligence and temporal consistency to create the next generation of predictive AI.

Interests

Video Content Understanding
Video Generation
Multi-modal sensor fusion and alignment
Interactive/Promptable computer vision systems

Education

PhD in Electrical & Computer Engineering (2024 - Present)
University of California, Santa Barbara (UCSB), USA
MS in Electrical & Computer Engineering (2021 - 2022)
University of Southern California (USC), USA
GPA: 4.0/4.0
B.Tech in Instrumentation & Control (2014 - 2018)
Nirma University, India
GPA: 8.34/10.0

News

June 2026	Joining Zoox as Research Intern (Perception team).
May 2026	WRIVINDER paper accepted at CVPR 2026.
March 2026	Received Research Intern offers from Motional and Zoox.
August 2025	Joined Mayachitra Inc. as Machine Learning Software Intern.
May 2025	Passed PhD Screening Exam.
Sept 2024	Started PhD at UC Santa Barbara, advised by Prof. B.S. Manjunath.
Jan 2024	Joined Analytos as Junior AI Engineer.
May 2023	Tiny ML Model paper published in APSIPA Transactions.
May 2023	Awarded Outstanding Academic Achievement Award (top 2 of 200+ students in the ECE department).
Jan 2023	S3I-PointHop paper accepted at IEEE ICASSP 2023.
Dec 2022	Graduated from USC with 4.0 GPA.
May 2022	Started research internship at USC Media Communications Lab under Prof. C.-C. Jay Kuo.
Jan 2022	Started as Course Mentor for Machine Learning course at USC (110 students).
Jan 2021	Started Master's at University of Southern California.

Selected Publications

2026

WRIVINDER: Towards Spatial Intelligence for Geo-locating Ground Images onto Satellite Imagery

C. Gudavalli, T.M. Mohammed, A. Yadav, A.V. Bhaskar, H. Prajapati, B.S. Manjunath, et al.

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

Wrivinder is a zero-shot, geometry-driven framework that aggregates multiple ground photographs to reconstruct a consistent 3D scene and align it with overhead satellite imagery. Wrivinder combines SfM reconstruction, 3D Gaussian Splatting, semantic grounding, and monocular depth-based metric cues to produce a stable zenith-view rendering that can be directly matched to satellite context for metrically accurate camera geo-localization. Because this task lacks suitable benchmarks, we also release MC-Sat, a curated dataset linking multi-view ground imagery with geo-registered satellite tiles across diverse outdoor environments, to support systematic evaluation. Together, Wrivinder and MC-Sat provide a first comprehensive baseline and testbed for studying geometry-centered cross-view alignment without paired supervision. In zero-shot experiments, Wrivinder achieves sub-30 m geolocation accuracy across both dense and large-area scenes, highlighting the promise of geometry-based aggregation for robust ground-to-satellite localization.

Paper Code

2026

Hyperspectral Trajectory Image for Multi-Month Trajectory Anomaly Detection

M.A. Rahman, C. Gudavalli, H. Prajapati, B.S. Manjunath

arXiv, 2026

We propose TITAnD (Trajectory Image Transformer for Anomaly Detection), which reformulates trajectory anomaly detection as a vision problem by representing trajectories as a Hyperspectral Trajectory Image (HTI): a day × time-of-day grid whose channels encode spatial, semantic, temporal, and kinematic information from either modality, unifying both under a single representation. Under this formulation, agent-level detection reduces to image classification and temporal localization to semantic segmentation. To model this representation, we introduce the Cyclic Factorized Transformer (CFT), which factorizes attention along the two temporal axes, encoding the cyclic inductive bias of human routines, while reducing attention cost by orders of magnitude and enabling dense multi-month anomaly detection for the first time.

arXiv Code

2025

Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click

H. Prajapati*, Raphael Ruschel*, Awsaf Rahman, B.S. Manjunath

arXiv, 2025

Click2Graph is the first interactive framework for Panoptic Video Scene Graph Generation (PVSG) that unifies visual prompting with spatial, temporal, and semantic understanding. From a single user cue, such as a click or bounding box, Click2Graph segments and tracks the subject across time, autonomously discovers interacting objects, and predicts ⟨subject, object, predicate⟩ triplets to form a temporally consistent scene graph.

Abs arXiv Code Bib

2025

TCDSG: An End-to-End Approach for Action Tracklet Generation

Raphael Ruschel, H. Prajapati, Awsaf Rahman, B.S. Manjunath

arXiv, 2025

TCDSG is a unified end-to-end framework that integrates detection, tracking, and interaction prediction across video sequences. It introduces two key innovations: a sequence-level bipartite matching strategy that enforces stable query assignments across frames to reduce tracklet fragmentation without post-processing, and temporally conditioned decoder queries that inject inter-frame feedback directly into decoding for improved stability and accuracy. Together, these mechanisms yield a tR@50 of 39.1% on Action Genome.

Abs arXiv Code Bib

2023

S3I-PointHop: SO(3)-Invariant PointHop for 3D Point Cloud Classification

P. Kadam, H. Prajapati, et al.

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

A rotation-invariant approach for 3D point cloud classification that assigns class labels to point cloud scans expressed in arbitrary coordinate systems. The method derives invariant representations by leveraging principal components, rotation-invariant local/global features, and point-based eigen features, enabling robust classification regardless of coordinate system orientation.

Abs Paper Code Bib

Projects

Semantic Segmentation - KITTI Dataset

Implemented state-of-the-art semantic segmentation for autonomous driving scenes using the DeepLab architecture. Achieved real-time performance on the KITTI benchmark with an optimized inference pipeline.

PyTorch • DeepLab • Computer Vision

Code Report

3D Point Cloud Classification

Developed lightweight models for efficient 3D point cloud object recognition on ModelNet40, with a focus on edge deployment under minimal computational requirements.

PyTorch • Point Cloud • Edge AI

Code Demo

Object Detection - PascalVOC

Comparative study of object detection architectures including YOLO and Faster R-CNN. Detailed ablation studies on backbone architectures and optimization strategies.

PyTorch • YOLO • Faster R-CNN

Code Report

Image Classification - STL10

Deep learning-based image classification on the STL10 dataset with various CNN architectures. Explored data augmentation and transfer learning techniques.

TensorFlow • CNNs • Transfer Learning

Code

Auto-Wrapper Changeover System

Industrial automation project at Unilever reducing changeover time by 40%. PLC-based system with real-time monitoring and predictive maintenance.

PLC • Industrial Automation • IoT

Details

Sales Forecasting Pipeline

End-to-end ML pipeline for sales forecasting improving decision-making by 20%. Deployed on GCP using VertexAI with automated retraining.

GCP • VertexAI • Time Series

Details

Experience

2026	Research Intern, Zoox (Perception Team): Applying large vision-language models, multimodal transformers, and audio-visual architectures to expand the semantic understanding of Zoox's perception stack.
2025	Machine Learning Software Intern, Mayachitra: Architected a world model integrating multi-modal data fusion (vision, text, and telemetry) for predictive situational awareness and real-time semantic retrieval.
2024 - Present	PhD Candidate, UC Santa Barbara: Research on video scene graphs and trajectory anomaly detection under Prof. B.S. Manjunath.
2024	Junior AI Engineer, Analytos: Developed sales forecasting and vehicle routing optimization systems.
2022	Graduate Researcher, USC Media Communications Lab: Designed lightweight 3D point cloud classification models under Prof. C.-C. Jay Kuo.
2021 - 2022	Course Mentor - Machine Learning, University of Southern California: Coached 110 students; designed homework and exam problems.
2019 - 2020	Engineer, G6 SuperHomes: Designed a smart home automation prototype with voice and mobile control.
2018 - 2019	Manufacturing Engineer, Unilever India: Supervised 43 employees and improved efficiency through automation.