I am a research scientist at Google DeepMind working on robotic manipulation. I earned my PhD at MIT working with Prof. Alberto Rodriguez.
I develop algorithms and solutions that enable robots to solve new tasks with high accuracy and dexterity.
My research was supported by La Caixa and Facebook fellowships.
My research focuses on developing algorithms for precise robotic generalization:
making robots capable of solving many tasks without compromising their performance or reliability.
By learning general AI models of perception and control,
we can provide robots with the tools to thrive across diverse environments and task requirements.
In my work, I have studied how learned AI models enable precise control,
and how accurate visuo-tactile perception makes it possible to solve complex tasks,
such as grasping, localization, and precise placing, without prior experience.
My goal is to continue developing algorithms that make robots dexterous and versatile at manipulating their environment.
ExoStart: RL + sim2real transfer for hand control.
Precise pick-and-place of objects using SimPLE.
ExoStart: Efficient learning for dexterous manipulation with sensorized exoskeleton demonstrations
Z. Si, J. Chen, E. Karagozler, A. Bronars, J. Hutchinson, T. Lampe, N. Gileadi, T. Howell, S. Saliceti, L. Barczyk, I. Correa, T. Erez, M. Shridhar, M. Martins, K. Bousmalis, N. Heess, F. Nori, M. Bauza
submitted to ICRA 2026 PDF /
website
We present ExoStart, a general and scalable learning framework that leverages the power of human dexterity for robotic hand control. In particular, we obtain high-quality data by collecting direct demonstrations with a low-cost exoskeleton, capturing the rich behaviors that humans can naturally demonstrate.
Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities
Gemini Team
Technical Report PDF /
website
Gemini 2.5 Pro is Google DeepMind's most capable model yet, achieving state-of-the-art performance on frontier coding and reasoning benchmarks. Gemini 2.5 Pro is also a thinking model that excels at multimodal understanding and can now process up to 3 hours of video content.
Gemini Robotics: Bringing AI into the physical world
Gemini Robotics Team
Technical Report PDF /
website
Gemini Robotics is an advanced Vision-Language-Action (VLA) generalist model capable of directly controlling robots. It executes smooth and reactive movements to tackle a wide range of complex manipulation tasks, handling unseen environments and following diverse, open-vocabulary instructions.
Exploiting Policy Idling for Dexterous Manipulation
A. Chen, P. Brakel, A. Bronars, A. Xie, S. Huang, O. Groth, M. Bauza, et al.
IROS, 2025
PDF
We investigate how to leverage the detectability of idling behavior to inform exploration and policy improvement. Our approach, Pause-Induced Perturbations (PIP), applies perturbations at detected idling states, helping the policy escape problematic basins of attraction.
DemoStart: Demonstration-led autocurriculum applied to sim-to-real with multi-fingered robots
M. Bauza, J. Chen, V. Dalibard, N. Gileadi, et al.
ICRA 2025 PDF /
website
DemoStart is an auto-curriculum reinforcement learning method capable of learning complex manipulation behaviors on an arm equipped with a three-fingered robotic hand, from only a sparse reward and a handful of demonstrations in simulation.
SimPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects
M. Bauza, A. Bronars, Y. Hou, I. Taylor, N. Chavan-Dafle, A. Rodriguez
Science Robotics, 2024
PDF /
website
We learn in simulation how to accurately pick and place objects with visuo-tactile perception. Our solution transfers to the real world and successfully handles objects with different shapes without requiring prior experience.
Learning to learn faster from human feedback with language model predictive control
J. Liang, F. Xia, W. Yu, A. Zeng, M. Gonzalez Arenas, M. Attarian, M. Bauza, et al.
RSS, 2024
PDF /
website
Language Model Predictive Control (LMPC) is a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments. LMPC enables fast robot adaptation via in-context learning.
RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation
K. Bousmalis, G. Vezzani, D. Rao, C. Devin, A. Lee, M. Bauza, et al.
TMLR, 2023
PDF /
website
We introduce a self-improving AI agent for robotics, RoboCat, that learns to perform a variety of tasks across different arms, and then self-generates new training data to improve its technique.
FingerSLAM: Closed-loop Unknown Object Localization and Reconstruction from Visuo-tactile Feedback
J. Zhao, M. Bauza, E. Adelson
under review, 2022
We address the problem of using visuo-tactile feedback for 6-DoF localization and 3D reconstruction of unknown in-hand objects.
We learn in simulation how to accurately localize objects with tactile sensing. Our solution transfers to the real world, providing reliable pose distributions from the first touch.
Our technology is used by Magna, ABB, and MERL. Our tactile sensor is GelSlim.
We optimize unsupervised losses for the current input. By optimizing where we act, we bypass generalization gaps and can impose a wide variety of inductive biases.
We learn to map functions to functions by combining graph networks and attention to build computational meshes, and we show that this new framework can solve very diverse problems.
We propose a hybrid dynamics model, simulator-augmented interaction networks (SAIN), combining a physics engine with an object-based neural network for dynamics modeling.
We augment an analytical rigid-body simulator with a neural network that learns to model uncertainty as residuals. Best Paper Award on Cognitive Robotics at IROS 2018.