Strategies for Collaboration, Autonomy, Learning, and Exploration in Robotics Lab
(SCALE Robotics Lab)
Our lab focuses on advancing machine learning and artificial intelligence to improve robot learning, human-robot interaction, and multi-agent collaboration.
- Interactive Robot Learning and Algorithmic Human-Robot Interaction: Developing new computational approaches that help humans interact with robots and teach them new behaviors or correct existing ones.
- Explainable Robotics: Imbuing intelligent robotic systems with decision-making capabilities and behaviors that can be understood, traced, and trusted by humans, especially in critical settings like healthcare, defense, and disaster response.
- Multi-Agent Coordination and Information Sharing: Developing methods that enable teams
of robots and humans to communicate, collaborate, and make decisions together in complex, dynamic
environments.
Openings
I'm looking for technically strong, self-motivated students (at all levels) to join my lab. Prospective PhD students (application deadline December 2025): please apply to Purdue CS and mention my name in your application (you are welcome to email me early at rpaleja {at} purdue.edu). For other roles, please reach out directly and include your CV.
Admitted Purdue PhD students: If you have already been admitted and are looking for an advisor, please email me directly. I can directly advise students in the CS department and can co-advise students in other departments.
Master's/Undergraduate students: If you are interested in working with me, please send me an email with your CV and a brief description of your interests, and include [Prospective Master's Student] or [Prospective Undergraduate Student] in the subject line, as appropriate.
I am looking for students who are passionate about robotics
and AI, and who are eager to learn and contribute to our research projects.
Publications
* denotes equal contribution. Blue - Conference. Orange - Journal. Pink - Workshop/Other.
Generalized Behavior Learning from Diverse Demonstrations
Varshith Sreeramdass, Rohan Paleja, Letian Chen, Sanne van Waveren, Matthew Gombolay
ICLR 2025
International Conference on Learning Representations (ICLR), 2025.
Diverse behavior policies are valuable in domains requiring quick test-time
adaptation or personalized human-robot interaction. Human demonstrations
provide rich information regarding task objectives and factors that govern
individual behavior variations, which can be used to characterize
useful diversity and learn diverse, performant policies. However, we
show that prior work that builds naive representations of demonstration
heterogeneity fails in generating successful novel behaviors that generalize
over behavior factors. We propose Guided Strategy Discovery (GSD), which
introduces a novel diversity formulation based on a learned task-relevance
measure that prioritizes behaviors exploring modeled latent factors. We
empirically validate across three continuous control benchmarks that GSD
outperforms baselines in novel behavior discovery by 21% when generalizing
to in-distribution (interpolation) and out-of-distribution (extrapolation)
factors. Finally, we demonstrate that GSD can generalize striking
behaviors for table tennis in a virtual testbed while leveraging human
demonstrations collected in the real world.
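To make the flavor of this objective concrete, here is a heavily simplified sketch: an imitation term plus a DIAYN-style diversity bonus (the latent factor should be recoverable from behavior), gated by a learned task-relevance weight. All module shapes and the exact combination below are illustrative assumptions, not GSD's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative assumption: tiny linear modules stand in for real networks.
N_FACTORS, STATE, ACT = 4, 8, 2
policy = nn.Linear(STATE + N_FACTORS, ACT)   # latent-conditioned policy
discriminator = nn.Linear(STATE, N_FACTORS)  # infers latent factor from state
relevance = nn.Linear(STATE, 1)              # learned task-relevance measure

def gsd_style_loss(states, expert_actions, z):
    # imitation: match expert actions conditioned on latent factor z
    z_onehot = F.one_hot(z, N_FACTORS).float()
    imitation = F.mse_loss(policy(torch.cat([states, z_onehot], -1)),
                           expert_actions)
    # diversity: z should be recoverable from visited states (DIAYN-style)
    diversity = F.cross_entropy(discriminator(states), z)
    # gate diversity by task relevance so only useful variation is rewarded
    gate = torch.sigmoid(relevance(states)).mean()
    return imitation + gate * diversity

loss = gsd_style_loss(torch.randn(16, STATE), torch.randn(16, ACT),
                      torch.randint(0, N_FACTORS, (16,)))
```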
Heterogeneous Policy Networks for Composite Robot Team Communication and Coordination
Esmaeil Seraj*, Rohan Paleja*, Luis Pimentel, Kin Man Lee, Zheyuan Wang, Daniel Martin, Matthew Sklar, John Zhang, Zahi Kakish, Matthew Gombolay
T-RO 2024
IEEE Transactions on Robotics, Volume 40, pages 3833-3849, 2024
High-performing human–human teams learn intelligent and efficient
communication and coordination strategies to maximize their joint utility.
These teams implicitly understand the different roles of heterogeneous team
members and adapt their communication protocols accordingly. Multiagent
reinforcement learning (MARL) has attempted to develop computational methods
for synthesizing such joint coordination–communication strategies, but
emulating heterogeneous communication patterns across agents with different
state, action, and observation spaces has remained a challenge. Without
properly modeling agent heterogeneity, as in prior MARL work that leverages
homogeneous graph networks, communication becomes less helpful and can even
deteriorate the team’s performance. In the past, we proposed heterogeneous
policy networks (HetNet) to learn efficient and diverse communication models
for coordinating cooperative heterogeneous teams. In this extended work, we
scale HetNet to support larger heterogeneous robot teams. Building on
heterogeneous graph-attention networks, we show that HetNet not only
facilitates learning heterogeneous collaborative policies, but also enables
end-to-end training for learning highly efficient binarized messaging. Our
empirical evaluation shows that HetNet sets a new state-of-the-art in
learning coordination and communication strategies for heterogeneous
multiagent teams, achieving a 5.84% to 707.65% performance improvement
over the next-best baseline across multiple domains while simultaneously
achieving a 200× reduction in the required communication bandwidth.
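The binarized-messaging idea can be illustrated with a straight-through estimator (STE), the standard trick for training end-to-end through a discretization step. This is a generic sketch under that assumption, not HetNet's actual heterogeneous graph-attention architecture; all dimensions are made up.

```python
import torch
import torch.nn as nn

class BinarizedMessenger(nn.Module):
    """Minimal sketch of end-to-end-trainable binary messaging.

    sign() produces {-1, +1} messages; the straight-through estimator
    passes gradients around the non-differentiable step so the encoder
    can still be trained end-to-end.
    """

    def __init__(self, obs_dim: int, msg_bits: int):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, msg_bits)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        logits = self.encoder(obs)
        hard = torch.sign(logits)  # binary message, but zero gradient
        # STE: forward pass uses `hard`, backward uses gradient of `logits`
        return logits + (hard - logits).detach()

# usage: a 4-agent team, each broadcasting a 16-bit message
messenger = BinarizedMessenger(obs_dim=32, msg_bits=16)
msgs = messenger(torch.randn(4, 32))  # shape (4, 16), entries in {-1, +1}
```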
STL: Still Tricky Logic (for System Validation, Even When Showing Your Work)
Isabelle Hurley, Rohan Paleja, Ashley Suh, Jaime D. Peña, Ho Chit Siu
NeurIPS 2024
Conference on Neural Information Processing Systems (NeurIPS), 2024.
As learned control policies become increasingly common in autonomous
systems, there is an increasing need to ensure that they are interpretable and
can be checked by human stakeholders. Formal specifications have been
proposed as ways to produce human-interpretable policies for autonomous
systems that can still be learned from examples. Previous work showed that
despite claims of interpretability, humans are unable to use formal
specifications presented in a variety of ways to validate even simple robot
behaviors. This work uses active learning, a standard pedagogical method, to
attempt to improve humans’ ability to validate policies in signal temporal
logic (STL). Results show that overall validation accuracy is not high, at
65% ± 15% (mean ± standard deviation), and that the three conditions of
no active learning, active learning, and active learning with feedback do not significantly
differ from each other. Our results suggest that the utility of formal
specifications for human interpretability is still unsupported but point to
other avenues of development which may enable improvements in system
validation.
Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems
Rohan Paleja, Michael Munje, Kimberlee Chang, Reed Jensen, Matthew Gombolay
NeurIPS 2024
Conference on Neural Information Processing Systems (NeurIPS), 2024.
Collaborative robots and machine learning-based virtual agents are
increasingly entering the human workspace with the aim of increasing
productivity and enhancing safety. Despite this, we show in a ubiquitous
experimental domain, Overcooked-AI, that state-of-the-art techniques for
human-machine teaming (HMT), which rely on imitation or reinforcement
learning, are brittle and result in a machine agent that aims to decouple
the machine and human’s actions to act independently rather than in a
synergistic fashion. To remedy this deficiency, we develop HMT approaches
that enable iterative, mixed-initiative team development allowing end-users
to interactively reprogram interpretable AI teammates. Our 50-subject study
provides several findings that we summarize into guidelines. While all
approaches underperform a simple collaborative heuristic (a critical,
negative result for learning-based methods), we find that white-box
approaches supported by interactive modification can lead to significant
team development, outperforming white-box approaches alone, and that
black-box approaches are easier to train and result in better HMT
performance, highlighting a tradeoff between explainability and interactivity
versus ease of training. Together, these findings present three important
future research directions: 1) Improving the ability to generate
collaborative agents with white-box models, 2) Better learning methods to
facilitate collaboration rather than individualized coordination, and 3)
Mixed-initiative interfaces that enable users, who may vary in ability, to
improve collaboration.
The Effect of Robot Skill Level and Communication in Rapid, Proximate Human-Robot Collaboration
Kin Man Lee*, Arjun Krishna*, Zulfiqar Zaidi, Rohan Paleja, Letian Chen, Erin Hedlund-Botti, Mariah Schrum, Matthew Gombolay
HRI 2023
ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2023
As high-speed, agile robots become more commonplace, these robots will have
the potential to better aid and collaborate with humans. However, due to the
increased agility and functionality of these robots, close collaboration
with humans can create safety concerns that alter team dynamics and degrade
task performance. In this work, we aim to enable the deployment of safe and
trustworthy agile robots that operate in proximity with humans. We do so by
1) Proposing a novel human-robot doubles table tennis scenario to serve as a
testbed for studying agile, proximate human-robot collaboration and 2)
Conducting a user-study to understand how attributes of the robot (e.g.,
robot competency or capacity to communicate) impact team dynamics, perceived
safety, and perceived trust, and how these latent factors affect human-robot
collaboration (HRC) performance. We find that robot competency significantly
increases perceived trust (p < .001), extending skill-to-trust assessments
in prior studies to agile, proximate HRC. Interestingly, we also find that
when the robot vocalizes its intention to perform a task, team performance
(p = .037) and the perceived safety of the system (p = .009) significantly
decrease.
Athletic Mobile Manipulator System for Robotic Wheelchair Tennis
Zulfiqar Zaidi*, Daniel Martin*, Nathaniel Belles, Viacheslav Zakharov, Arjun Krishna, Kin Man Lee, Peter Wagstaff, Sumedh Naik, Matthew Sklar, Sugju Choi, Yoshiki Kakehi, Ruturaj Patil, Divya Mallemadugula, Florian Pesce, Peter Wilson, Wendell Hom, Matan Diamond, Bryan Zhao, Nina Moorman, Rohan Paleja, Letian Chen, Esmaeil Seraj, Matthew Gombolay
IEEE RA-L 2023
IEEE Robotics and Automation Letters, Volume 8, Issue 4, pages 2245-2252, 2023
Athletics are a quintessential and universal expression of humanity. From
French monks who in the 12th century invented jeu de paume, the precursor to
modern lawn tennis, back to the K’iche’ people who played the Maya Ballgame
as a form of religious expression over three thousand years ago, humans have
sought to train their minds and bodies to excel in sporting contests.
Advances in robotics are opening up the possibility of robots in sports.
Yet, key challenges remain, as most prior works in robotics for sports are
limited to pristine sensing environments, do not require significant force
generation, or are on miniaturized scales unsuited for joint human-robot
play. In this paper, we propose the first open-source, autonomous robot for
playing regulation wheelchair tennis. We demonstrate the performance of our
full-stack system in executing ground strokes and evaluate each of the
system’s hardware and software components. The goal of this paper is to (1)
inspire more research in human-scale robot athletics and (2) establish the
first baseline for a reproducible wheelchair tennis robot for regulation
singles play. Our paper contributes to the science of systems design and
poses a set of key challenges for the robotics community to address in
striving towards robots that can match human capabilities in sports.
Learning Models of Adversarial Agent Behavior under Partial Observability
Sean Ye, Manisha Natarajan, Zixuan Wu, Rohan Paleja, Letian Chen, Matthew Gombolay
IROS 2023
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
The need for opponent modeling and tracking arises in several real-world
scenarios, such as professional sports, video game design, and
drug-trafficking interdiction. In this work, we present graPh neurAl Network
aDvErsarial MOdeliNg wIth mUtual inforMation (PANDEMONIUM) for modeling the
behavior of an adversarial opponent agent. PANDEMONIUM is a novel graph neural network
(GNN) based approach that uses mutual information maximization as an
auxiliary objective to predict the current and future states of an
adversarial opponent with partial observability. To evaluate PANDEMONIUM, we
design two large-scale, pursuit-evasion domains inspired by real-world
scenarios, where a team of heterogeneous agents is tasked with tracking and
interdicting a single adversarial agent, and the adversarial agent must
evade detection while achieving its own objectives. With the mutual
information formulation, PANDEMONIUM outperforms all baselines in both
domains, achieving 31.68% higher log-likelihood on average for future
adversarial state predictions.
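As a rough illustration of the auxiliary objective (ignoring the paper's graph neural network and partial-observability machinery), one can pair a future-state prediction loss with an InfoNCE-style term, which lower-bounds mutual information between the learned embedding and the future adversary state. The architecture, dimensions, and 0.1 weighting below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdversaryPredictor(nn.Module):
    """Sketch: predict the adversary's future state, with an InfoNCE
    auxiliary term tying the embedding to that future state."""

    def __init__(self, obs_dim: int, state_dim: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, state_dim)  # future-state prediction
        self.proj = nn.Linear(state_dim, hidden)  # projection for MI scores

    def loss(self, team_obs: torch.Tensor, future_state: torch.Tensor):
        z = self.embed(team_obs)                         # (B, hidden)
        pred_loss = F.mse_loss(self.head(z), future_state)
        # InfoNCE: each embedding should score highest with its own future
        scores = z @ self.proj(future_state).t()         # (B, B)
        targets = torch.arange(len(z))
        mi_loss = F.cross_entropy(scores, targets)
        return pred_loss + 0.1 * mi_loss                 # 0.1: assumed weight

model = AdversaryPredictor(obs_dim=12, state_dim=4)
loss = model.loss(torch.randn(32, 12), torch.randn(32, 4))
```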
Adversarial Search and Tracking with Multiagent Reinforcement Learning in Sparsely Observable Environments
Zixuan Wu, Sean Ye, Manisha Natarajan, Letian Chen, Rohan Paleja, Matthew Gombolay
MRS 2023
International Symposium on Multi-Robot and Multi-Agent Systems (MRS), 2023
We study a search and tracking (S&T) problem where a team of dynamic search
agents must collaborate to track an adversarial, evasive agent. The
heterogeneous search team may only have access to a limited number of past
adversary trajectories within a large search space. This problem is
challenging for both model-based searching and reinforcement learning (RL)
methods, since the adversary exhibits reactionary and deceptive evasive
behaviors in a large space, leading to sparse detections for the search
agents. To address this challenge, we propose a novel Multi-Agent RL (MARL)
framework that leverages the estimated adversary location from our learnable
filtering model. We show that our MARL architecture can outperform all
baselines and achieves a 46% increase in detection rate.
Mutual Understanding in Human-Machine Teaming
Rohan Paleja*, Matthew Gombolay
AAAI DC 2022
Association for the Advancement of Artificial Intelligence Conference (AAAI) Doctoral Consortium, 2022
Collaborative robots (i.e., “cobots”) and machine learning-based virtual
agents are increasingly entering the human workspace with the aim of
increasing productivity, enhancing safety, and improving the quality of our
lives. These agents will interact with a wide variety of people in dynamic
and novel contexts, increasing the prevalence of human-machine teams in
healthcare, manufacturing, and search-and-rescue. In this research,
we enhance the mutual understanding within a human-machine team by enabling
cobots to understand heterogeneous teammates via person-specific embeddings,
identifying contexts in which xAI methods can help improve team mental model
alignment, and enabling cobots to effectively communicate information that
supports high-performance human-machine teaming.
Learning Efficient Diverse Communication for Cooperative Heterogeneous Teaming
Esmaeil Seraj*, Zheyuan Wang*, Rohan Paleja*, Daniel Martin, Matthew Sklar, Anirudh Patel, Matthew Gombolay
AAMAS 2022
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2022
Scaling Multi-Agent Reinforcement Learning via State Upsampling
Luis Pimentel*, Rohan Paleja*, Zheyuan Wang, Esmaeil Seraj, James Pagan, Matthew Gombolay
RSS W. 2022
RSS 2022 Workshop on Scaling Robot Learning (RSS22-SRL)
Learning Interpretable, High-Performing Policies for Autonomous Driving
Rohan Paleja*, Yaru Niu*, Andrew Silva, Chace Ritchie, Sugju Choi, Matthew Gombolay
RSS 2022
Robotics: Science and Systems (RSS), 2022
Utilizing Human Feedback for Primitive Optimization in Wheelchair Tennis
Arjun Krishna, Zulfiqar Zaidi, Letian Chen, Rohan Paleja, Esmaeil Seraj, Matthew Gombolay
CoRL W. 2022
CoRL 2022 Learning for Agile Robotics Workshop
Agile robotics presents a difficult challenge: robots moving at high speeds
require precise, low-latency sensing and control. Creating agile
motion that accomplishes the task at hand while being safe to execute is a
key requirement for agile robots to gain human trust. This requires
designing new approaches that are flexible and maintain knowledge over world
constraints. In this paper, we consider the problem of building a flexible
and adaptive controller for a challenging agile mobile manipulation task of
hitting ground strokes on a wheelchair tennis robot. We propose and evaluate
an extension to prior work on learning striking behaviors with a
probabilistic movement primitive (ProMP) framework by (1) demonstrating the
safe execution of learned primitives on an agile mobile manipulator setup,
and (2) proposing an online primitive refinement procedure that utilizes
evaluative feedback from humans on the executed trajectories.
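One plausible way to fold scalar human ratings into a ProMP's weight distribution is a reward-weighted update in the spirit of PoWER-style episodic updates: softmax the ratings into importance weights and move the mean toward highly rated samples. This sketch is an assumed simplification, not the paper's exact refinement procedure.

```python
import numpy as np

def refine_promp_mean(mu, sampled_ws, ratings, temperature=1.0):
    """Sketch: reward-weighted update of a ProMP weight-vector mean.

    mu          -- current mean of the primitive's weight distribution
    sampled_ws  -- (N, D) weight vectors whose rollouts a human rated
    ratings     -- (N,) scalar evaluative feedback, higher is better
    """
    ratings = np.asarray(ratings, dtype=float)
    # softmax over ratings -> per-sample importance weights
    w = np.exp((ratings - ratings.max()) / temperature)
    w /= w.sum()
    return w @ np.asarray(sampled_ws)  # convex combination of samples

# usage: nudge the primitive toward the trajectories the human preferred
mu = np.zeros(8)
samples = mu + 0.1 * np.random.randn(5, 8)
new_mu = refine_promp_mean(mu, samples, ratings=[2, 5, 1, 4, 3])
```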
Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations
Letian Chen*, Sravan Jayanthi*, Rohan Paleja, Daniel Martin, Viacheslav Zakharov, Matthew Gombolay
CoRL 2022
Conference on Robot Learning (CoRL), 2022
Learning from Demonstration (LfD) approaches empower end-users to teach
robots novel tasks via demonstrations of the desired behaviors,
democratizing access to robotics. However, current LfD frameworks are not
capable of fast adaptation to heterogeneous human demonstrations, nor of
large-scale deployment in ubiquitous robotics applications. In this paper,
we propose a novel LfD framework, Fast Lifelong Adaptive Inverse
Reinforcement learning (FLAIR). Our approach (1) leverages learned
strategies to construct policy mixtures for fast adaptation to new
demonstrations, allowing for quick end-user personalization, (2) distills
common knowledge across demonstrations, achieving accurate task inference;
and (3) expands its model only when needed in lifelong deployments,
maintaining a concise set of prototypical strategies that can approximate
all behaviors via policy mixtures. We empirically validate that FLAIR
achieves adaptability (i.e., the robot adapts to heterogeneous,
user-specific task preferences), efficiency (i.e., the robot achieves
sample-efficient adaptation), and scalability (i.e., the model grows
sublinearly with the number of demonstrations while maintaining high
performance). FLAIR surpasses benchmarks across three control tasks, with an
average 57% improvement in policy returns and, on average, 78% fewer episodes
required for demonstration modeling using policy mixtures. Finally, we
demonstrate the success of FLAIR in a table tennis task and find users rate
FLAIR as having higher task (p < .05) and personalization (p < .05)
performance.
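The policy-mixture step can be sketched as a small maximum-likelihood problem: given how much probability each prototypical strategy policy assigns to the demonstrated actions, find simplex weights for the mixture. Everything here (shapes, the softmax/L-BFGS parameterization) is an assumption; FLAIR additionally decides when to expand the model with a new prototype in lifelong deployment.

```python
import numpy as np
from scipy.optimize import minimize

def fit_mixture_weights(demo_probs):
    """Sketch: fit mixture weights over prototypical strategy policies.

    demo_probs -- (T, K) array: demo_probs[t, k] is the probability that
                  strategy k's policy assigns to the demonstrated action at
                  step t. Returns simplex weights maximizing the demo's
                  log-likelihood under the mixture.
    """
    T, K = demo_probs.shape

    def nll(logits):
        w = np.exp(logits - logits.max())
        w /= w.sum()  # softmax -> point on the simplex
        return -np.log(demo_probs @ w + 1e-12).sum()

    res = minimize(nll, x0=np.zeros(K), method="L-BFGS-B")
    w = np.exp(res.x - res.x.max())
    return w / w.sum()

# usage: a new demo compared against two known strategies
probs = np.column_stack([np.full(50, 0.8), np.full(50, 0.2)])
print(fit_mixture_weights(probs))  # ~[1.0, 0.0]: strategy 0 fits best
```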
Effects of Social Factors and Team Dynamics on Adoption of Collaborative Robot Autonomy
Mariah Schrum*, Glen Neville*, Michael Johnson*, Nina Moorman, Rohan Paleja, Karen Feigh, Matthew Gombolay
HRI 2021
ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2021
As automation becomes more prevalent, the fear of job loss due to automation
increases. Workers may not be amenable to working with a robotic co-worker
due to a negative perception of the technology. The attitudes of workers
towards automation are influenced by a variety of complex and multi-faceted
factors such as intention to use, perceived usefulness and other external
variables. In an analog manufacturing environment, we explore how these
various factors influence an individual’s willingness to work with a robot
over a human co-worker in a collaborative Lego building task. We
specifically explore how this willingness is affected by: 1) the level of
social rapport established between the individual and his or her human
co-worker, 2) the anthropomorphic qualities of the robot, and 3) factors
including trust, fluency and personality traits. Our results show that a
participant’s willingness to work with automation decreased due to lower
perceived team fluency (p=0.045), rapport established between a participant
and their co-worker (p=0.003), the gender of the participant being male
(p=0.041), and a higher inherent trust in people (p=0.018).
Multi-Agent Graph-Attention Communication and Teaming
Yaru Niu*, Rohan Paleja*, Matthew Gombolay
AAMAS 2021
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021
Best Workshop Paper Award Winner at ICCV MAIR2 Workshop
High-performing teams learn effective communication strategies to
judiciously share information and reduce the cost of communication overhead.
Within multi-agent reinforcement learning, synthesizing effective policies
requires reasoning about when to communicate, whom to communicate with, and
how to process messages. We propose a novel multi-agent reinforcement
learning algorithm, Multi-Agent Graph-attentIon Communication (MAGIC), with
a graph-attention communication protocol in which we learn 1) a Scheduler to
help with the problems of when to communicate and whom to address messages
to, and 2) a Message Processor using Graph Attention Networks (GATs) with
dynamic graphs to deal with communication signals. The Scheduler consists of
a graph attention encoder and a differentiable attention mechanism, which
outputs dynamic, differentiable graphs to the Message Processor, which
enables the Scheduler and Message Processor to be trained end-to-end. We
evaluate our approach on a variety of cooperative tasks, including Google
Research Football. Our method outperforms baselines across all domains,
achieving an approximately 10% increase in reward in the most challenging
domain. We also show MAGIC communicates 23.2% more efficiently than the
average baseline, is robust to stochasticity, and scales to larger state-action
spaces. Finally, we demonstrate MAGIC on a physical, multi-robot testbed.
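A minimal sketch of the two components named above: a Scheduler that produces a dense, differentiable communication graph via dot-product attention, and a Message Processor that aggregates encoded messages over that graph. The single message round, the dimensions, and the use of plain softmax attention (rather than the paper's GAT-based encoder and scheduling mechanism) are assumptions.

```python
import torch
import torch.nn as nn

class MagicStyleComm(nn.Module):
    """Sketch: Scheduler decides (softly) who talks to whom; Message
    Processor aggregates messages over that graph. Trained end-to-end
    because the graph is differentiable."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)    # scheduler: query projection
        self.k = nn.Linear(dim, dim)    # scheduler: key projection
        self.msg = nn.Linear(dim, dim)  # message encoder
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) hidden states for N agents
        # Scheduler: dense, differentiable adjacency via scaled dot-product
        logits = self.q(h) @ self.k(h).t() / h.size(1) ** 0.5
        adj = torch.softmax(logits, dim=-1)           # (N, N) soft graph
        # Message Processor: attention-weighted aggregation of messages
        agg = adj @ self.msg(h)                       # (N, dim)
        return self.out(torch.cat([h, agg], dim=-1))  # updated hidden states

comm = MagicStyleComm(dim=32)
updated = comm(torch.randn(5, 32))  # 5 agents exchange one message round
```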
Towards Sample-efficient Apprenticeship Learning from Suboptimal Demonstration
Letian Chen, Rohan Paleja, Matthew Gombolay
AI-HRI 2021
AAAI Artificial Intelligence for Human-Robot Interaction (AI-HRI) Fall Symposium, 2021
Learning from Demonstration (LfD) seeks to democratize robotics by enabling
non-roboticist end-users to teach robots to perform novel tasks by providing
demonstrations. However, as demonstrators are typically non-experts, modern
LfD techniques are unable to produce policies much better than the
suboptimal demonstration. A previously-proposed framework, SSRR, has shown
success in learning from suboptimal demonstration but relies on
noise-injected trajectories to infer an idealized reward function. A random
approach such as noise injection for generating trajectories has two key
drawbacks: 1) performance degradation may be arbitrary, depending on whether
the noise is applied to vital states, and 2) noise-injected trajectories may
exhibit only limited suboptimality and therefore fail to represent its full
scope. We present Systematic
Self-Supervised Reward Regression, S3RR, to investigate systematic
alternatives for trajectory degradation.
The Utility of Explainable AI in Ad Hoc Human-Machine Teaming
Rohan Paleja, Muyleng Ghuy, Nadun Ranawaka Arachchige, Reed Jensen, Matthew Gombolay
NeurIPS 2021
Conference on Neural Information Processing Systems (NeurIPS), 2021
Recent advances in machine learning have led to growing interest in
Explainable AI (xAI) to enable humans to gain insight into the
decision-making of machine learning models. Despite this recent interest,
the utility of xAI techniques has not yet been characterized in
human-machine teaming. Importantly, xAI offers the promise of enhancing team
situational awareness (SA) and shared mental model development, which are
the key characteristics of effective human-machine teams. Rapidly developing
such mental models is especially critical in ad hoc human-machine teaming,
where agents do not have a priori knowledge of others’ decision-making
strategies. In this paper, we present two novel human-subject experiments
quantifying the benefits of deploying xAI techniques within a human-machine
teaming scenario. First, we show that xAI techniques can support SA
(p < 0.05). Second, we examine how different SA levels induced via a
collaborative AI policy abstraction affect ad hoc human-machine teaming
performance. Importantly, we find that the benefits of xAI are not
universal, as there is a strong dependence on the composition of the
human-machine team. Novices benefit from xAI providing increased SA
(p < 0.05) but are susceptible to cognitive overhead (p < 0.05). On the
other hand, expert performance degrades with the addition of xAI-based
support (p < 0.05), indicating that the cost of paying attention to the xAI
outweighs the benefits obtained from being provided additional information
to enhance SA. Our results demonstrate that researchers must deliberately
design and deploy the right xAI techniques in the right scenario by
carefully considering human-machine team composition and how the xAI method
augments SA.
Using Machine Learning to Predict Perfusionists' Critical Decision-Making during Cardiac Surgery
Roger Dias, Marco Zenati, Geoff Rance, Rithy Srey, David Arney, Letian Chen, Rohan Paleja, Lauren Kennedy-Metz, Matthew Gombolay
CMBBE 2021
Computer Methods in Biomechanics and Biomedical Engineering, 2021
The cardiac surgery operating room is a high-risk and complex environment in
which multiple experts work as a team to provide safe and excellent care to
patients. During the cardiopulmonary bypass phase of cardiac surgery,
critical decisions need to be made and the perfusionists play a crucial role
in assessing available information and taking a certain course of action. In
this paper, we report the findings of a simulation-based study using machine
learning to build predictive models of perfusionists’ decision-making during
critical situations in the operating room (OR). Performing 30-fold
cross-validation across 30 random seeds, our machine learning approach was
able to achieve an accuracy of 78.2% (95% confidence interval: 77.8% to
78.6%) in predicting perfusionists’ actions, having access to only 148
simulations. The findings from this study may inform future development of
computerised clinical decision support tools to be embedded into the OR,
improving patient safety and surgical outcomes.
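The evaluation protocol (30-fold cross-validation repeated over 30 random seeds, with a 95% confidence interval over per-seed accuracies) can be sketched as follows. The classifier, features, and labels here are placeholders, since the paper's inputs come from simulated cardiopulmonary-bypass scenarios.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data: 148 simulations (feature count and labels assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(148, 10))
y = rng.integers(0, 2, size=148)

seed_means = []
for seed in range(30):
    cv = StratifiedKFold(n_splits=30, shuffle=True, random_state=seed)
    scores = cross_val_score(RandomForestClassifier(random_state=seed),
                             X, y, cv=cv)
    seed_means.append(scores.mean())

# Normal-approximation 95% CI over the per-seed mean accuracies
mean = np.mean(seed_means)
half = 1.96 * np.std(seed_means, ddof=1) / np.sqrt(len(seed_means))
print(f"accuracy: {mean:.3f} (95% CI: {mean - half:.3f} to {mean + half:.3f})")
```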
Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation
Letian Chen, Rohan Paleja, Muyleng Ghuy, Matthew Gombolay
HRI 2020
ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2020.
Reinforcement learning (RL) has achieved tremendous success as a general
framework for learning how to make decisions. However, this success relies
on the interactive hand-tuning of a reward function by RL experts. On the
other hand, inverse reinforcement learning (IRL) seeks to learn a reward
function from readily-obtained human demonstrations. Yet, IRL suffers from
two major limitations: 1) reward ambiguity: there are an infinite number of
possible reward functions that could explain an expert's demonstration; and
2) heterogeneity: human experts adopt varying strategies and preferences,
which makes learning from multiple demonstrators difficult due to the common
assumption that demonstrators seek to maximize the same reward. In this
work, we propose a method to jointly infer a task goal and humans' strategic
preferences via network distillation. This approach enables us to distill a
robust task reward (addressing reward ambiguity) and to model each
strategy's objective (handling heterogeneity). We demonstrate that our
algorithm can better recover task and strategy rewards and imitate the
strategies in two simulated tasks and a real-world table tennis task.
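A minimal sketch of the decomposition this suggests: each demonstrator's reward is a shared task reward plus a per-strategy reward, with a penalty keeping the strategy terms small so that shared structure is distilled into the task network. The network sizes and the squared penalty are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn as nn

class DistilledReward(nn.Module):
    """Sketch: demonstrator i's reward = shared task reward + personal
    strategy reward. Penalizing the strategy terms pushes common structure
    into the task network (the distillation step)."""

    def __init__(self, state_dim: int, n_demonstrators: int):
        super().__init__()
        self.task = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))
        self.strategies = nn.ModuleList(
            nn.Linear(state_dim, 1) for _ in range(n_demonstrators))

    def forward(self, s: torch.Tensor, i: int) -> torch.Tensor:
        return self.task(s) + self.strategies[i](s)

    def distill_penalty(self, s: torch.Tensor) -> torch.Tensor:
        # keep strategy rewards small so the task reward stays robust
        return sum((m(s) ** 2).mean() for m in self.strategies)

model = DistilledReward(state_dim=6, n_demonstrators=3)
r = model(torch.randn(10, 6), i=1) - 0.0  # reward for demonstrator 1
```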
Interpretable and Personalized Apprenticeship Scheduling: Learning Interpretable Scheduling Policies from Heterogeneous User Demonstrations
Rohan Paleja, Andrew Silva, Letian Chen, Matthew Gombolay
NeurIPS 2020
Conference on Neural Information Processing Systems (NeurIPS), 2020.
Resource scheduling and coordination is an NP-hard optimization problem
requiring the efficient allocation of agents to a set of tasks with upper-
and lower-bound temporal and resource constraints. Due to the large-scale and dynamic nature
of resource coordination in hospitals and factories, human domain experts
manually plan and adjust schedules on the fly. To perform this job, domain
experts leverage heterogeneous strategies and rules-of-thumb honed over
years of apprenticeship. What is critically needed is the ability to extract
this domain knowledge in a heterogeneous and interpretable apprenticeship
learning framework to scale beyond the power of a single human expert, a
necessity in safety-critical domains. We propose a personalized and
interpretable apprenticeship scheduling algorithm that infers an
interpretable representation of all human task demonstrators by extracting
decision-making criteria specified by an inferred, personalized embedding
without constraining the number of decision-making strategies. We achieve
near-perfect LfD accuracy in synthetic domains and 88.22% accuracy on a
real-world planning domain, outperforming baselines. Further, a user study
shows that our methodology produces models that are both interpretable and
highly usable (p < 0.05).
Learning from Suboptimal Demonstration via Self-Supervised Reward Regression
Letian Chen, Rohan Paleja, Matthew Gombolay
CoRL 2020
Conference on Robot Learning (CoRL), 2020
Best Paper Award Finalist
Learning from Demonstration (LfD) seeks to democratize robotics by enabling
non-roboticist end-users to teach robots to perform a task by providing a
human demonstration. However, modern LfD techniques, e.g. inverse
reinforcement learning (IRL), assume users provide at least stochastically
optimal demonstrations. This assumption fails to hold in most real-world
scenarios. Recent attempts to learn from suboptimal demonstration leverage
pairwise rankings following the Luce-Shepard rule. However, we show
these approaches make incorrect assumptions and thus suffer from brittle,
degraded performance. We overcome these limitations in developing a novel
approach that bootstraps off suboptimal demonstrations to synthesize
optimality-parameterized data to train an idealized reward function. We
empirically validate that we learn an idealized reward function with ~0.95
correlation with ground-truth reward versus ~0.75 for prior work. We can
then train policies achieving ~200% improvement over the suboptimal
demonstration and ~90% improvement over prior work. We present a physical
demonstration of teaching a robot a topspin strike in table tennis that
achieves 32% faster returns and 40% more topspin than user demonstration.
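The data-synthesis step, which bootstraps off a suboptimal demonstration, can be sketched as rolling out a cloned policy with increasing amounts of injected action noise, yielding trajectories whose relative optimality is parameterized by the noise level; a reward model is then regressed against that ordering. The Gym-style env interface and the epsilon-noise scheme below are assumptions.

```python
import numpy as np

def synthesize_degraded_dataset(policy, env, noise_levels, episodes=5):
    """Sketch of optimality-parameterized data synthesis: roll out a policy
    (e.g., one cloned from the suboptimal demonstration) with increasing
    action noise. Lower noise -> presumed-higher-performance trajectories,
    giving a ranking to regress a reward model against. `policy(obs)` and
    a Gym-style `env` API are assumptions."""
    dataset = []  # (noise_level, trajectory) pairs
    for eps in noise_levels:
        for _ in range(episodes):
            obs, traj, done = env.reset(), [], False
            while not done:
                action = policy(obs)
                if np.random.rand() < eps:  # inject noise into the action
                    action = env.action_space.sample()
                obs, reward, done, _ = env.step(action)
                traj.append((obs, action))
            dataset.append((eps, traj))
    return dataset
```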
Heterogeneous Learning from Demonstration
Rohan Paleja, Matthew Gombolay
HRI W. 2019
International Conference on Human-Robot Interaction (HRI) Pioneers Workshop, 2019
The development of human-robot systems able to leverage the strengths of
both humans and their robotic counterparts has been greatly sought after
because of the foreseen, broad-ranging impact across industry and research.
We believe the true potential of these systems cannot be reached unless the
robot is able to act with a high level of autonomy, reducing the burden of
manual tasking or teleoperation. To achieve this level of autonomy, robots
must be able to work fluidly with their human partners, inferring their needs
without explicit commands. This inference requires the robot to be able to
detect and classify the heterogeneity of its partners. We propose a
framework for learning from heterogeneous demonstration based upon Bayesian
inference and evaluate a suite of approaches on a real-world dataset of
gameplay from StarCraft II. This evaluation provides evidence that our
Bayesian approach can outperform conventional methods by up to 12.8%.