I build generative and agentic machine-learning methods for science at every scale — from particle collisions at the Large Hadron Collider and the design of new molecules, to how drugs move through the human body and the large-scale structure of the cosmos.
I'm a research scientist trained in theoretical particle physics, now working at the intersection of machine learning and the natural sciences. My work follows a single thread applied at wildly different scales: building AI that respects the structure of scientific data — its symmetries, its discrete and continuous degrees of freedom, and the rules of the underlying domain.
That thread runs from the subatomic to the cosmological. At the smallest scales, I build generative and foundation models for particle collisions at the Large Hadron Collider (LHC). At the molecular scale, agentic methods for designing new molecules and materials. At the human scale, foundation models for pharmacokinetics — how a drug moves through the body. And at the cosmological scale, generative models for the large-scale structure of the universe.
Across all of these, the aim is the same: to make scientific simulation, discovery, and design faster and more trustworthy. I'm currently based at Rutgers University (NHETC, Department of Physics & Astronomy).
Most ML for molecular design dreams up promising structures and leaves the hard question — can you actually make this? — for later. This work flips that around. Instead of searching over molecular graphs, the search population is made of executable synthetic routes: chains of real reactions built from purchasable building blocks, validated and scored by deterministic chemistry tools.
A large language model sits on top as a strategy controller, setting high-level preferences — how long a route to attempt, which reaction families and motifs to favour, how aggressively to explore — while the heavy lifting of building, checking, and scoring routes stays in trusted code. That split is the key idea: the model gets to guide the search without being able to hallucinate a reaction that doesn't exist. The result is candidate molecules that arrive with a feasible recipe attached.
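To make the division of labour concrete, here is a toy sketch in Python. It is not the method from the preprint. The building blocks, reaction names, scoring rule, mutation scheme and `stub_controller` (which stands in for the language model) are all hypothetical placeholders. The sketch only illustrates the split described above: the controller returns preferences, and every route is built, checked against a fixed reaction set, and scored by ordinary deterministic code.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy stand-ins: NOT a real building-block catalogue or reaction library.
BUILDING_BLOCKS = ["amine_A", "acid_B", "halide_C", "boronic_D"]
KNOWN_REACTIONS = {"amide_coupling", "suzuki", "reductive_amination", "snar"}


@dataclass
class Preferences:
    """High-level knobs the controller may set (illustrative only)."""
    max_steps: int = 3
    favoured: set = field(default_factory=lambda: {"amide_coupling"})
    explore_rate: float = 0.3


@dataclass
class Route:
    start: str   # purchasable building block
    steps: list  # ordered reaction-family names


def stub_controller(generation: int) -> Preferences:
    """Stand-in for the language model: it returns preferences, never routes."""
    if generation < 2:
        return Preferences(max_steps=2, explore_rate=0.5)
    return Preferences(max_steps=3, favoured={"suzuki", "amide_coupling"}, explore_rate=0.2)


def is_valid(route: Route) -> bool:
    """Deterministic gate: purchasable start, and only reactions from the known set."""
    return route.start in BUILDING_BLOCKS and all(r in KNOWN_REACTIONS for r in route.steps)


def score(route: Route, prefs: Preferences) -> float:
    """Toy deterministic score: reward favoured reaction families, penalise length."""
    return sum(1.0 for r in route.steps if r in prefs.favoured) - 0.5 * len(route.steps)


def mutate(route: Route, prefs: Preferences, rng: random.Random) -> Route:
    """Extend a route with a known reaction or drop a step; the controller never edits routes."""
    steps = list(route.steps)
    if len(steps) < prefs.max_steps and rng.random() < 0.5:
        explore = rng.random() < prefs.explore_rate
        pool = sorted(KNOWN_REACTIONS if explore else prefs.favoured)
        steps.append(rng.choice(pool))
    elif steps:
        steps.pop(rng.randrange(len(steps)))
    return Route(route.start, steps)


def evolve(generations: int = 5, pop_size: int = 8, seed: int = 0) -> list:
    rng = random.Random(seed)
    population = [Route(rng.choice(BUILDING_BLOCKS), []) for _ in range(pop_size)]
    for gen in range(generations):
        prefs = stub_controller(gen)  # the model steers the search...
        children = [mutate(rng.choice(population), prefs, rng) for _ in range(pop_size)]
        # ...but only routes that pass the deterministic checks can survive.
        candidates = [r for r in population + children if is_valid(r)]
        candidates.sort(key=lambda r: score(r, prefs), reverse=True)
        population = candidates[:pop_size]
    return population


if __name__ == "__main__":
    for route in evolve()[:3]:
        print(route.start, "->", " -> ".join(route.steps) or "(no steps)")
```

Running the file prints the three best toy routes. The chemistry here is meaningless; the structure is the point. `mutate` can only draw from `KNOWN_REACTIONS`, and `is_valid` gates survival. Nothing the controller returns can therefore put a reaction outside the trusted set into the population.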
Read the full preprint →
Generative models for high-energy physics are multiplying fast, but they're hard to compare: every paper brings its own dataset, metrics, and preprocessing. Collider-Bench provides a shared suite of datasets, evaluation metrics, and baselines so that new generative approaches for collider physics can be measured on equal footing and progress becomes legible.
View on GitHub →
Pharmacokinetics (how a compound is absorbed, distributed, metabolised, and excreted) largely decides whether a promising molecule can become a usable drug. This work trains a foundation model on large-scale pharmacokinetic data, learning transferable representations of how compounds behave in the body that can be fine-tuned for prediction tasks across drug-discovery pipelines.
Read more →
Full publication list → Google Scholar · INSPIRE-HEP