Amish Sethi
I'm an undergraduate student at the University of Pennsylvania, currently in my senior year pursuing Bachelor's and Master's degrees in Computer Science, with both degrees expected in 2026.
I work with Professor Mayur Naik on projects that push the boundaries of deep learning and neurosymbolic AI. My interests span neural-network optimization, scalable model compression, the integration of symbolic reasoning with neural architectures, and foundation models for video understanding. I enjoy building efficient, interpretable systems that extend the capabilities of large language and vision models.
I'm incredibly grateful to Professor Mayur Naik for his continuous support and guidance throughout my research journey. I also deeply appreciate the mentorship and inspiration from the PhD students I've worked closely with:
Neelay Velingker,
Oscar Xu,
Aaditya Naik, and
Jiani Huang.
Email /
CV /
Scholar /
GitHub
Research
I'm interested in deep learning, generative AI, and neurosymbolic AI. Most of my research focuses on optimizing large language models through efficient finetuning, quantization, and pruning, exploring how symbolic reasoning can be integrated into neural networks for greater interpretability and control, and building foundation models for video understanding to enable trustworthy embodied AI systems.
Delta Activations: A Representation for Finetuned Large Language Models
Zhiqiu Xu*, Amish Sethi*, Mayur Naik, Ser-Nam Lim
Under review
arXiv /
website
Delta Activations represents finetuned models by measuring shifts in their internal activations relative to a base model. This approach clusters models by domain and enables retrieval using only 20 examples. We finetuned and released over 700 open-source models on Hugging Face, demonstrating the utility of this representation for model selection and merging in building reliable model ecosystems.
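To make the idea concrete, here is a minimal sketch of computing such a delta embedding with Hugging Face transformers. The model IDs and probe prompts are placeholders, and this is my own illustration of the core idea rather than the paper's released code.

```python
# Sketch: embed a finetuned model by the shift in its hidden activations
# relative to its base model on a small probe set (placeholder model IDs/prompts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_last_hidden_state(model_name, prompts, device="cpu"):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()
    states = []
    with torch.no_grad():
        for p in prompts:
            inputs = tok(p, return_tensors="pt").to(device)
            out = model(**inputs, output_hidden_states=True)
            # average the final layer's hidden states over the sequence positions
            states.append(out.hidden_states[-1].mean(dim=1).squeeze(0))
    return torch.stack(states).mean(dim=0)

probe_prompts = ["Translate to French: hello", "2 + 2 =", "Name a prime number."]  # ~20 in practice
base_vec = mean_last_hidden_state("base-model-id", probe_prompts)         # placeholder ID
tuned_vec = mean_last_hidden_state("finetuned-model-id", probe_prompts)   # placeholder ID
delta = tuned_vec - base_vec  # compare/cluster finetuned models by this vector
```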
ESCA: Contextualizing Embodied Agents via Scene-Graph Generation
Jiani Huang*, Amish Sethi*, Matthew Kuo*, Mayank Keoliya, Neelay Velingker, JungHo Jung, Ziyang Li, Ser-Nam Lim, Mayur Naik
NeurIPS 2025 Spotlight (top 3%)
paper /
website
ESCA addresses the finding that up to 69% of embodied AI failures stem from perception errors. Using VINE, a foundation model that extracts spatio-temporal scene graphs from video, ESCA provides explicit spatial context to vision-language models. On EmbodiedBench, our approach improved success rates by up to 10% and spatial reasoning by 14.6%, and reduced perception errors from 69% to 30%, all without requiring model retraining.
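As a rough illustration of what explicit spatial context might look like, the hypothetical snippet below serializes scene-graph triples into text that could be prepended to a vision-language model's prompt; the format and helper function are my own, not ESCA's actual interface.

```python
# Hypothetical: turn scene-graph triples into a textual context block for a VLM prompt.
def scene_graph_to_context(triples):
    # triples: (subject, predicate, object, probability)
    lines = [f"- {s} {p} {o} (confidence {prob:.2f})" for s, p, o, prob in triples]
    return "Scene context:\n" + "\n".join(lines)

triples = [
    ("robot arm", "left_of", "red block", 0.92),
    ("red block", "on", "table", 0.98),
]
prompt = scene_graph_to_context(triples) + "\n\nTask: pick up the red block."
print(prompt)
```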
VINE: A Foundation Model for Video Understanding
Amish Sethi*, Jiani Huang*, Matthew Kuo*, Ziyang Li, Mayank Keoliya, Neelay Velingker, Mayur Naik, Ser-Nam Lim
Foundation Model
website /
code /
model /
dataset
VINE is a foundation model that transforms video into structured scene graphs capturing entities, attributes, spatial relationships, and temporal dynamics. Given a video and optional keywords, VINE outputs probabilistic scene graphs that provide rich semantic structure beyond object detection. Trained on 87K+ videos using neurosymbolic learning, VINE is both promptable and fine-tunable for diverse applications, from contextualizing vision-language models to enabling robot learning from demonstration.
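For intuition, a probabilistic spatio-temporal scene graph could be represented with a structure like the one below; the classes and fields are hypothetical and do not reflect VINE's actual output schema.

```python
# Hypothetical data structure for a probabilistic spatio-temporal scene graph.
from dataclasses import dataclass, field

@dataclass
class Entity:
    entity_id: int
    label: str                                       # e.g. "person"
    attributes: dict = field(default_factory=dict)   # e.g. {"color": ("white", 0.8)}

@dataclass
class Relation:
    subject_id: int
    predicate: str        # e.g. "holding", "left_of"
    object_id: int
    probability: float    # confidence assigned to this triple
    frame_range: tuple    # (start_frame, end_frame) over which it holds

@dataclass
class SceneGraph:
    entities: list
    relations: list

# "A person holds a white cup during frames 10-42, with probability 0.87."
graph = SceneGraph(
    entities=[Entity(0, "person"), Entity(1, "cup", {"color": ("white", 0.8)})],
    relations=[Relation(0, "holding", 1, probability=0.87, frame_range=(10, 42))],
)
```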
Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning
Aaditya Naik, Jason Liu, Claire Wang, Amish Sethi, Saikat Dutta, Mayur Naik, Eric Wong
ICML 2025
paper
Dolphin is a framework that combines symbolic reasoning and neural computation through CPU-GPU hybrid execution. By running vectorized probabilistic computations on the GPU, it converges up to 62× faster than baselines across 13 benchmarks spanning text, image, and video modalities.
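As a minimal sketch of the kind of vectorized probabilistic computation this refers to, the toy functions below combine batched neural predictions under symbolic AND/OR, assuming independence semantics for illustration; they are my own and not Dolphin's API.

```python
# Toy vectorized probabilistic AND/OR over batched neural predictions (independence assumed).
import torch

def prob_and(p, q):
    # probability that two independent facts both hold
    return p * q

def prob_or(p, q):
    # probability that at least one of two independent facts holds
    return p + q - p * q

# e.g. per-sample probabilities that digit A is even and digit B is even,
# produced by a neural classifier and combined symbolically for the whole batch at once
p_a_even = torch.rand(1024)   # would live on the GPU in practice
p_b_even = torch.rand(1024)
p_both_even = prob_and(p_a_even, p_b_even)
p_either_even = prob_or(p_a_even, p_b_even)
```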
CLAM: Unifying Finetuning, Quantization, and Pruning by Chaining LLM Adapter Modules
Neelay Velingker, Amish Sethi*, Jason Liu*, William Dodds*, Zhiqiu Xu, Saikat Dutta, Mayur Naik, Eric Wong
Workshop on Efficient Systems for Foundation Models II @ ICML 2024
paper /
code
CLAM is a framework that unifies parameter-efficient finetuning, quantization, and pruning for LLMs. It enables chaining of adapters with low overhead and high modularity, outperforming state-of-the-art methods by up to 6.5%. CLAM achieves superior trade-offs between compression and downstream performance, beating QLoRA while effectively halving the number of active bits.
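The chaining idea can be sketched as composing simple transformations of a frozen base weight; the toy functions below (4-bit quantization, magnitude pruning, a LoRA-style low-rank update) are my own illustration under those assumptions, not CLAM's implementation.

```python
# Toy sketch: chain compression/adaptation transforms over a frozen base weight.
import torch

def quantize_4bit(w):
    # symmetric 4-bit quantize/dequantize of a weight matrix
    scale = w.abs().max() / 7.0
    return torch.clamp(torch.round(w / scale), -8, 7) * scale

def prune_magnitude(w, sparsity=0.5):
    # zero out the smallest-magnitude fraction of the weights
    threshold = w.abs().flatten().kthvalue(int(sparsity * w.numel())).values
    return torch.where(w.abs() > threshold, w, torch.zeros_like(w))

def lora_update(w, a, b):
    # add a trainable low-rank correction B @ A on top of the (compressed) weight
    return w + b @ a

base = torch.randn(64, 64)            # frozen base weight
rank = 8
a = torch.randn(rank, 64) * 0.01      # trainable LoRA factors
b = torch.zeros(64, rank)
effective = lora_update(prune_magnitude(quantize_4bit(base)), a, b)
```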
Functional Genetic Biomarkers of Alzheimer's Disease and Gene Expression from Peripheral Blood
Amish Sethi*, Andrew Ni*
International Science and Engineering Fair 2021
paper
This project used machine learning, clustering, and dimensionality reduction algorithms in scikit-learn to identify genes that are expressed differently between Alzheimer's patients and a control group. A model trained on this gene-expression data predicted the likelihood of Alzheimer's with 98% accuracy.
Cited over 8 times and viewed over 1,000 times on bioRxiv.
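A minimal sketch of this kind of pipeline in scikit-learn, using synthetic stand-in data in place of the actual gene-expression dataset:

```python
# Sketch: classify Alzheimer's vs. control from gene-expression features
# (synthetic stand-in data; the real pipeline and hyperparameters may differ).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))        # 200 blood samples x 5,000 gene-expression values
y = rng.integers(0, 2, size=200)        # 1 = Alzheimer's, 0 = control

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),               # dimensionality reduction before the classifier
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```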
* Equal contribution
Teaching and Mentorship
In the Fall of 2024, I served as the Head Teaching Assistant (TA) for CIS 7000: Large Language Models, the University of Pennsylvania's first dedicated course on LLMs.
The course enrolled over 120 students and covered the theory, design, training, compression, deployment, and application of large language models.
As Head TA, I was responsible for:
- Planning the course curriculum
- Designing and implementing homework assignments
- Holding office hours and supporting students throughout the semester
- Creating several lecture slide decks
- Delivering lectures on efficient finetuning, adaptation, and evaluation
The course received a TA quality rating of 3.15 and an overall course quality rating of 3.01 out of 4.
In the Summer of 2024, I mentored five undergraduate students through the
Penn Undergraduate Research Mentoring Program (PURM)
on the CLAM project, focusing on efficient finetuning, quantization, and pruning.
I taught these students how to conduct research in machine learning, work with LLMs, and develop scalable optimization frameworks. The students I mentored were: