Publications

* denotes equal contribution and joint lead authorship.
Blue - Conference Papers.
Red - Workshop and Doctoral Consortia Papers.
Orange - Journal Papers.


2023

  1. Adversarial Search and Tracking with Multiagent Reinforcement Learning in Sparsely Observable Environments

    International Symposium on Multi-Robot and Multi-Agent Systems (MRS), 2023

    We study a search and tracking (S&T) problem in which a team of dynamic search agents must collaborate to track an adversarial, evasive agent. The heterogeneous search team may only have access to a limited number of past adversary trajectories within a large search space. This problem is challenging for both model-based search and reinforcement learning (RL) methods, since the adversary exhibits reactionary and deceptive evasive behaviors in a large space, leading to sparse detections for the search agents. To address this challenge, we propose a novel Multi-Agent RL (MARL) framework that leverages the estimated adversary location from our learnable filtering model. We show that our MARL architecture outperforms all baselines, achieving a 46% increase in detection rate.
  2. Learning Models of Adversarial Agent Behavior under Partial Observability

    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

    The need for opponent modeling and tracking arises in several real-world scenarios, such as professional sports, video game design, and drug-trafficking interdiction. In this work, we present graPh neurAl Network aDvErsarial MOdeliNg wIth mUtual inforMation (PANDEMONIUM) for modeling the behavior of an adversarial opponent agent. PANDEMONIUM is a novel graph neural network (GNN) based approach that uses mutual information maximization as an auxiliary objective to predict the current and future states of an adversarial opponent under partial observability. To evaluate PANDEMONIUM, we design two large-scale pursuit-evasion domains inspired by real-world scenarios, in which a team of heterogeneous agents is tasked with tracking and interdicting a single adversarial agent, and the adversarial agent must evade detection while achieving its own objectives. With the mutual information formulation, PANDEMONIUM outperforms all baselines in both domains, achieving 31.68% higher log-likelihood on average for future adversarial state predictions. (A toy sketch of the mutual-information objective appears after this year's list.)
  3. Athletic Mobile Manipulator System for Robotic Wheelchair Tennis
    Zulfiqar Zaidi, Daniel Martin*, Nathaniel Belles, Viacheslav Zakharov, Arjun Krishna, Kin Man Lee, Peter Wagstaff, Sumedh Naik, Matthew Sklar, Sugju Choi, Yoshiki Kakehi, Ruturaj Patil, Divya Mallemadugula, Florian Pesce, Peter Wilson, Wendell Hom, Matan Diamond, Bryan Zhao, Nina Moorman, Rohan Paleja, Letian Chen, Esmaeil Seraj, and Matthew Gombolay.

    IEEE Robotics and Automation Letters, Volume 8, Issue 4, pages 2245-2252, 2023

    Athletics are a quintessential and universal expression of humanity. From the K'iche' people, who played the Maya Ballgame as a form of religious expression over three thousand years ago, to the French monks who in the 12th century invented jeu de paume, the precursor to modern lawn tennis, humans have sought to train their minds and bodies to excel in sporting contests. Advances in robotics are opening up the possibility of robots in sports. Yet key challenges remain, as most prior work in robotics for sports is limited to pristine sensing environments, does not require significant force generation, or is on miniaturized scales unsuited for joint human-robot play. In this paper, we propose the first open-source, autonomous robot for playing regulation wheelchair tennis. We demonstrate the performance of our full-stack system in executing ground strokes and evaluate each of the system's hardware and software components. The goal of this paper is to (1) inspire more research in human-scale robot athletics and (2) establish the first baseline for a reproducible wheelchair tennis robot for regulation singles play. Our paper contributes to the science of systems design and poses a set of key challenges for the robotics community to address in striving towards robots that can match human capabilities in sports.
  4. The Effect of Robot Skill Level and Communication in Rapid, Proximate Human-Robot Collaboration

    ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2023

    As high-speed, agile robots become more commonplace, these robots will have the potential to better aid and collaborate with humans. However, due to the increased agility and functionality of these robots, close collaboration with humans can create safety concerns that alter team dynamics and degrade task performance. In this work, we aim to enable the deployment of safe and trustworthy agile robots that operate in proximity with humans. We do so by 1) proposing a novel human-robot doubles table tennis scenario to serve as a testbed for studying agile, proximate human-robot collaboration and 2) conducting a user study to understand how attributes of the robot (e.g., robot competency or capacity to communicate) impact team dynamics, perceived safety, and perceived trust, and how these latent factors affect human-robot collaboration (HRC) performance. We find that robot competency significantly increases perceived trust ($p<.001$), extending skill-to-trust assessments in prior studies to agile, proximate HRC. Interestingly, we also find that when the robot vocalizes its intention to perform a task, team performance ($p=.037$) and perceived safety of the system ($p=.009$) significantly decrease.
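
To make the mutual-information idea in entry 2 concrete, here is a minimal PyTorch sketch of an InfoNCE-style auxiliary objective attached to an adversary-state predictor. It is purely illustrative and not the PANDEMONIUM implementation: the paper's GNN encoder is replaced with an MLP, InfoNCE stands in for the paper's mutual-information formulation, and all module and variable names are our own.

```python
# Toy sketch of a mutual-information (MI) auxiliary objective. NOT the authors'
# code: MLP encoder instead of a GNN, InfoNCE as a generic MI lower bound.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdversaryPredictor(nn.Module):
    """Illustrative encoder + prediction head (stand-in for the paper's GNN)."""
    def __init__(self, obs_dim: int, state_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.head = nn.Linear(latent_dim, state_dim)   # predicts adversary state
        self.proj = nn.Linear(state_dim, latent_dim)   # embeds targets for InfoNCE

    def forward(self, obs):
        z = self.encoder(obs)
        return z, self.head(z)

def infonce_loss(z, target_emb):
    """Matched (z_i, target_i) pairs are positives, all other pairs negatives;
    minimizing this cross-entropy maximizes a lower bound on the MI between
    the latent representation and the true adversary state."""
    logits = z @ target_emb.t()           # (B, B) similarity matrix
    labels = torch.arange(z.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

model = AdversaryPredictor(obs_dim=10, state_dim=2)
obs = torch.randn(16, 10)        # searchers' partial observations (batch of 16)
adv_state = torch.randn(16, 2)   # ground-truth adversary positions
z, pred = model(obs)
loss = F.mse_loss(pred, adv_state) + 0.1 * infonce_loss(z, model.proj(adv_state))
loss.backward()
```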

2022

  1. Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations

    Conference on Robot Learning (CoRL), 2022

    Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations, nor of large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization; (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks, with an average 57% improvement in policy returns and on average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task ($p<.05$) and personalization ($p<.05$) performance.
  2. Utilizing Human Feedback for Primitive Optimization in Wheelchair Tennis

    CoRL 2022 Learning for Agile Robotics Workshop

    Agile robotics presents a difficult challenge: robots moving at high speeds require precise, low-latency sensing and control. Creating agile motion that accomplishes the task at hand while being safe to execute is a key requirement for agile robots to gain human trust. This requires designing new approaches that are flexible and maintain knowledge of world constraints. In this paper, we consider the problem of building a flexible and adaptive controller for the challenging agile mobile-manipulation task of hitting ground strokes on a wheelchair tennis robot. We propose and evaluate an extension to prior work on learning striking behaviors with a probabilistic movement primitive (ProMP) framework by (1) demonstrating the safe execution of learned primitives on an agile mobile manipulator setup, and (2) proposing an online primitive-refinement procedure that utilizes evaluative human feedback on the executed trajectories.
  3. Learning Interpretable, High-Performing Policies for Autonomous Driving

    Robotics: Science and Systems (RSS), 2022

    Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, these policies lack interpretability, limiting deployability in the safety-critical and legally-regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure that allows direct optimization in a sparse, decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs learn interpretable policy representations that match or outperform baselines by up to 33% in AD scenarios while achieving a 300x-600x reduction in the number of policy parameters relative to deep learning baselines. Furthermore, we demonstrate the interpretability and utility of our ICCTs through a 14-car physical robot demonstration. (A simplified sketch of a differentiable decision node appears after this year's list.)
  4. Scaling Multi-Agent Reinforcement Learning via State Upsampling

    RSS 2022 Workshop on Scaling Robot Learning (RSS22-SRL)

    We consider the problem of scaling Multi-Agent Reinforcement Learning (MARL) algorithms toward larger environments and team sizes. While it is possible to learn a MARL-synthesized policy on these larger problems from scratch, training is difficult because the joint state-action space is much larger, so policy learning requires a large amount of experience (and associated training time) to reach a target performance. In this paper, we propose a transfer learning method that accelerates training in such high-dimensional tasks of increased complexity. Our method upsamples an agent’s state representation in a smaller, less challenging source task in order to pre-train a target policy for a larger, more challenging target task. By transferring the policy after pre-training and continuing MARL in the target domain, the information learned within the source task enables higher performance within the target task in significantly less time than training from scratch. As such, our method enables MARL to scale to larger coordination problems. Furthermore, because our method only changes the state representation of agents across tasks, it is agnostic to the policy’s architecture and can be deployed across different MARL algorithms. We provide results showing that a policy trained under our method achieves up to a 7.88$\times$ performance improvement given the same amount of training time, compared to a policy trained from scratch. Moreover, our method enables learning in difficult target-task settings where training from scratch fails. (A toy illustration of observation upsampling appears after this year's list.)
  5. Learning Efficient Diverse Communication for Cooperative Heterogeneous Teaming

    International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2022

    High-performing teams learn intelligent and efficient communication and coordination strategies to maximize their joint utility. These teams implicitly understand the different roles of heterogeneous team members and adapt their communication protocols accordingly. Multi-Agent Reinforcement Learning (MARL) seeks to develop computational methods for synthesizing such coordination strategies, but formulating models for heterogeneous teams with different state, action, and observation spaces has remained an open problem. Without properly modeling agent heterogeneity, as in prior MARL work that leverages homogeneous graph networks, communication becomes less helpful and can even degrade cooperativity and team performance. We propose Heterogeneous Policy Networks (HetNet) to learn efficient and diverse communication models for coordinating cooperative heterogeneous teams. Building on heterogeneous graph-attention networks, we show that HetNet not only facilitates learning heterogeneous collaborative policies per agent class but also enables end-to-end training for learning highly efficient binarized messaging.
  6. Mutual Understanding in Human-Machine Teaming
    Rohan Paleja, and Matthew Gombolay.

    Association for the Advancement of Artificial Intelligence Conference (AAAI) Doctoral Consortium, 2022

    Collaborative robots (i.e., “cobots”) and machine learning-based virtual agents are increasingly entering the human workspace with the aim of increasing productivity, enhancing safety, and improving the quality of our lives. These agents will interact with a wide variety of people in dynamic and novel contexts, increasing the prevalence of human-machine teams in healthcare, manufacturing, and search-and-rescue. In this research, we enhance mutual understanding within a human-machine team by enabling cobots to understand heterogeneous teammates via person-specific embeddings, identifying contexts in which xAI methods can help improve team mental model alignment, and enabling cobots to effectively communicate information that supports high-performance human-machine teaming.
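
As a companion to entry 3, the sketch below shows the core trick behind a differentiable decision node: soft, sigmoid-based routing during gradient-based training that can be collapsed into a crisp if/else rule at deployment. This is a simplified, hypothetical single node with constant leaves, not the ICCT implementation, which uses full trees and sparse linear leaf controllers.

```python
# Minimal sketch of one differentiable, sparsifiable decision node.
# Hypothetical simplification, not the authors' code.
import torch
import torch.nn as nn

class SoftDecisionNode(nn.Module):
    """Routes input left/right with a sigmoid gate over one learned feature."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.feature_logits = nn.Parameter(torch.zeros(in_dim))  # which feature to split on
        self.threshold = nn.Parameter(torch.zeros(1))
        self.leaf_left = nn.Parameter(torch.zeros(1))    # constant leaf actions (toy)
        self.leaf_right = nn.Parameter(torch.zeros(1))

    def forward(self, x, crisp: bool = False):
        if crisp:
            # Deployment: pick the argmax feature and threshold it -> readable rule.
            w = torch.zeros_like(self.feature_logits)
            w[self.feature_logits.argmax()] = 1.0
            gate = ((x @ w - self.threshold) > 0).float()   # hard if/else routing
        else:
            # Training: soft feature selection and sigmoid routing, fully differentiable.
            w = torch.softmax(self.feature_logits, dim=0)
            gate = torch.sigmoid(x @ w - self.threshold)
        return gate * self.leaf_right + (1 - gate) * self.leaf_left

node = SoftDecisionNode(in_dim=4)
x = torch.randn(8, 4)        # e.g., ego speed, headway, lane offset, heading
action = node(x)             # trainable end-to-end with a policy-gradient method
rule = node(x, crisp=True)   # reads as: "if feature_j > t then a_right else a_left"
```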
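And for entry 4, a toy illustration of the core transfer mechanism: nearest-neighbor upsampling of a small source-task grid observation to the resolution the larger target task expects, so that a pre-trained policy can keep its input interface. The function name and grid contents are hypothetical.

```python
# Toy observation upsampling for source-to-target policy transfer.
import numpy as np

def upsample_observation(obs: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upsampling of a 2-D grid observation by an integer factor."""
    return np.kron(obs, np.ones((factor, factor), dtype=obs.dtype))

source_obs = np.array([[0, 1],
                       [2, 0]])          # 2x2 source-task grid (agent ids, say)
target_obs = upsample_observation(source_obs, factor=2)   # 4x4, target-task size
print(target_obs)
# [[0 0 1 1]
#  [0 0 1 1]
#  [2 2 0 0]
#  [2 2 0 0]]
```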

2021

  1. Using Machine Learning to Predict Perfusionists’ Critical Decision-Making during Cardiac Surgery
    Roger Dias, Marco Zenati, Geoff Rance, Rithy Srey, David Arney, Letian Chen, Rohan Paleja, Lauren Kennedy-Metz, and Matthew Gombolay.

    Computer Methods in Biomechanics and Biomedical Engineering, 2021

    The cardiac surgery operating room is a high-risk and complex environment in which multiple experts work as a team to provide safe and excellent care to patients. During the cardiopulmonary bypass phase of cardiac surgery, critical decisions need to be made, and the perfusionists play a crucial role in assessing available information and taking a certain course of action. In this paper, we report the findings of a simulation-based study using machine learning to build predictive models of perfusionists’ decision-making during critical situations in the operating room (OR). Performing 30-fold cross-validation across 30 random seeds, our machine learning approach achieved an accuracy of 78.2% (95% confidence interval: 77.8% to 78.6%) in predicting perfusionists’ actions, despite having access to only 148 simulations. The findings from this study may inform the future development of computerised clinical decision support tools to be embedded into the OR, improving patient safety and surgical outcomes.
  2. The Utility of Explainable AI in Ad Hoc Human-Machine Teaming

    Conference on Neural Information Processing Systems (NeurIPS), 2021

    Recent advances in machine learning have led to growing interest in Explainable AI (xAI) to enable humans to gain insight into the decision-making of machine learning models. Despite this recent interest, the utility of xAI techniques has not yet been characterized in human-machine teaming. Importantly, xAI offers the promise of enhancing team situational awareness (SA) and shared mental model development, which are the key characteristics of effective human-machine teams. Rapidly developing such mental models is especially critical in ad hoc human-machine teaming, where agents do not have a priori knowledge of others’ decision-making strategies. In this paper, we present two novel human-subject experiments quantifying the benefits of deploying xAI techniques within a human-machine teaming scenario. First, we show that xAI techniques can support SA ($p<0.05$). Second, we examine how different SA levels induced via a collaborative AI policy abstraction affect ad hoc human-machine teaming performance. Importantly, we find that the benefits of xAI are not universal, as there is a strong dependence on the composition of the human-machine team. Novices benefit from xAI providing increased SA ($p<0.05$) but are susceptible to cognitive overhead ($p<0.05$). On the other hand, expert performance degrades with the addition of xAI-based support ($p<0.05$), indicating that the cost of paying attention to the xAI outweighs the benefits obtained from being provided additional information to enhance SA. Our results demonstrate that researchers must deliberately design and deploy the right xAI techniques in the right scenario by carefully considering human-machine team composition and how the xAI method augments SA.
  3. Towards Sample-efficient Apprenticeship Learning from Suboptimal Demonstration
    Letian Chen, Rohan Paleja, and Matthew Gombolay.

    AAAI Artificial Intelligence for Human-Robot Interaction (AI-HRI) Fall Symposium, 2021

    Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform novel tasks by providing demonstrations. However, as demonstrators are typically non-experts, modern LfD techniques are unable to produce policies much better than the suboptimal demonstration. A previously proposed framework, SSRR, has shown success in learning from suboptimal demonstration but relies on noise-injected trajectories to infer an idealized reward function. A random approach such as noise injection to generate trajectories has two key drawbacks: 1) performance degradation may be arbitrary, depending on whether the noise is applied to vital states, and 2) noise-injected trajectories may have limited suboptimality and therefore fail to represent the whole scope of suboptimality. We present Systematic Self-Supervised Reward Regression (S3RR) to investigate systematic alternatives for trajectory degradation.
  4. Multi-Agent Graph-Attention Communication and Teaming
    Yaru Niu*, Rohan Paleja*, and Matthew Gombolay.

    International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021
    Best Workshop Paper Award Winner at ICCV MAIR2 Workshop

    High-performing teams learn effective communication strategies to judiciously share information and reduce the cost of communication overhead. Within multi-agent reinforcement learning, synthesizing effective policies requires reasoning about when to communicate, whom to communicate with, and how to process messages. We propose a novel multi-agent reinforcement learning algorithm, Multi-Agent Graph-attentIon Communication (MAGIC), with a graph-attention communication protocol in which we learn 1) a Scheduler to help with the problems of when to communicate and whom to address messages to, and 2) a Message Processor using Graph Attention Networks (GATs) with dynamic graphs to process communication signals. The Scheduler consists of a graph attention encoder and a differentiable attention mechanism, which outputs dynamic, differentiable graphs to the Message Processor, enabling the Scheduler and Message Processor to be trained end-to-end. We evaluate our approach on a variety of cooperative tasks, including Google Research Football. Our method outperforms baselines across all domains, achieving an $\approx 10\%$ increase in reward in the most challenging domain. We also show MAGIC communicates $23.2\%$ more efficiently than the average baseline, is robust to stochasticity, and scales to larger state-action spaces. Finally, we demonstrate MAGIC on a physical, multi-robot testbed. (A schematic sketch of one graph-attention communication round appears after this year's list.)
  5. Effects of Social Factors and Team Dynamics on Adoption of Collaborative Robot Autonomy
    Mariah Schrum*, Glen Neville*, Michael Johnson*, Nina Moorman, Rohan Paleja, Karen Feigh, and Matthew Gombolay.

    ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2021

    As automation becomes more prevalent, the fear of job loss due to automation increases. Workers may not be amenable to working with a robotic co-worker due to a negative perception of the technology. The attitudes of workers towards automation are influenced by a variety of complex and multi-faceted factors, such as intention to use, perceived usefulness, and other external variables. In an analog manufacturing environment, we explore how these various factors influence an individual’s willingness to work with a robot over a human co-worker in a collaborative Lego building task. We specifically explore how this willingness is affected by: 1) the level of social rapport established between the individual and his or her human co-worker, 2) the anthropomorphic qualities of the robot, and 3) factors including trust, fluency, and personality traits. Our results show that a participant’s willingness to work with automation decreased with lower perceived team fluency (p=0.045), stronger rapport established with their human co-worker (p=0.003), the participant being male (p=0.041), and higher inherent trust in people (p=0.018).
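
To illustrate the communication mechanism in entry 4, here is a schematic single attention round in which a learned, row-normalized attention matrix plays the role of MAGIC's dynamic, differentiable communication graph. It is a deliberately simplified sketch (one dense soft-attention layer, no separate Scheduler/Message Processor, invented names), not the authors' code.

```python
# One soft graph-attention communication round among n agents (schematic).
import torch
import torch.nn as nn

class CommLayer(nn.Module):
    def __init__(self, hid: int):
        super().__init__()
        self.q = nn.Linear(hid, hid)   # "who is asking"
        self.k = nn.Linear(hid, hid)   # "who has relevant information"
        self.v = nn.Linear(hid, hid)   # message content

    def forward(self, h):
        # h: (n_agents, hid) encoded observations. Entry (i, j) of the attention
        # matrix is how much agent i listens to agent j this step, i.e., a
        # dynamic, differentiable communication graph.
        attn = torch.softmax(self.q(h) @ self.k(h).t() / h.size(1) ** 0.5, dim=-1)
        messages = attn @ self.v(h)    # aggregate incoming messages per agent
        return h + messages, attn      # residual update plus the learned "graph"

layer = CommLayer(hid=16)
h = torch.randn(5, 16)                 # five agents' hidden states
h_new, comm_graph = layer(h)
print(comm_graph.shape)                # torch.Size([5, 5])
```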

2020

  1. Learning from Suboptimal Demonstration via Self-Supervised Reward Regression
    Letian Chen, Rohan Paleja, and Matthew Gombolay.

    Conference on Robot Learning (CoRL), 2021
    Best Paper Award Finalist

    Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform a task by providing a human demonstration. However, modern LfD techniques, e.g., inverse reinforcement learning (IRL), assume users provide at least stochastically optimal demonstrations. This assumption fails to hold in most real-world scenarios. Recent attempts to learn from suboptimal demonstration leverage pairwise rankings and the Luce-Shepard rule. However, we show these approaches make incorrect assumptions and thus suffer from brittle, degraded performance. We overcome these limitations by developing a novel approach that bootstraps off suboptimal demonstrations to synthesize optimality-parameterized data for training an idealized reward function. We empirically validate that we learn an idealized reward function with ~0.95 correlation with the ground-truth reward, versus ~0.75 for prior work. We can then train policies achieving ~200% improvement over the suboptimal demonstration and ~90% improvement over prior work. We present a physical demonstration of teaching a robot a topspin strike in table tennis that achieves 32% faster returns and 40% more topspin than the user demonstration. (A toy sketch of the underlying degrade-and-regress idea appears after this year's list.)
  2. Interpretable and Personalized Apprenticeship Scheduling: Learning Interpretable Scheduling Policies from Heterogeneous User Demonstrations

    Conference on Neural Information Processing Systems (NeurIPS), 2020

    Resource scheduling and coordination is an NP-hard optimization problem requiring an efficient allocation of agents to a set of tasks with upper- and lower-bound temporal and resource constraints. Due to the large-scale and dynamic nature of resource coordination in hospitals and factories, human domain experts manually plan and adjust schedules on the fly. To perform this job, domain experts leverage heterogeneous strategies and rules-of-thumb honed over years of apprenticeship. What is critically needed is the ability to extract this domain knowledge in a heterogeneous and interpretable apprenticeship learning framework to scale beyond the power of a single human expert, a necessity in safety-critical domains. We propose a personalized and interpretable apprenticeship scheduling algorithm that infers an interpretable representation of all human task demonstrators by extracting decision-making criteria specified by an inferred, personalized embedding, without constraining the number of decision-making strategies. We achieve near-perfect LfD accuracy in synthetic domains and 88.22% accuracy on a real-world planning domain, outperforming baselines. Further, a user study shows that our methodology produces both interpretable and highly usable models (p < 0.05).
  3. Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation

    ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2020

    Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. In contrast, inverse reinforcement learning (IRL) seeks to learn a reward function from readily obtained human demonstrations. Yet, IRL suffers from two major limitations: 1) reward ambiguity: there are an infinite number of possible reward functions that could explain an expert’s demonstration, and 2) heterogeneity: human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult due to the common assumption that demonstrators seek to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans’ strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy’s objective (handling heterogeneity). We demonstrate that our algorithm can better recover task and strategy rewards and imitate the strategies in two simulated tasks and a real-world table tennis task.
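
The central mechanism in entry 1 (degrade a policy with increasing noise, then regress performance against the degradation level) can be caricatured in a few lines. This is a toy stand-in with a fake environment and a log-linear fit; the actual SSRR method clones the demonstrator, rolls out noise-injected policies, and fits a sigmoidal noise-performance curve to train a reward network.

```python
# Toy degrade-and-regress sketch: performance vs. injected noise.
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(noise_level: float) -> float:
    """Stand-in environment: higher action noise yields lower (noisy) return."""
    clean_return = 100.0
    return clean_return * np.exp(-3.0 * noise_level) + rng.normal(0, 2.0)

noise_levels = np.linspace(0.0, 1.0, 20)
returns = np.array([rollout_return(n) for n in noise_levels])

# Fit the idealized noise-to-performance curve (here: log-linear least squares;
# the paper fits a sigmoid and uses the curve to supervise a reward network).
coeffs = np.polyfit(noise_levels, np.log(np.clip(returns, 1e-3, None)), deg=1)
print(f"estimated decay rate: {coeffs[0]:.2f} (true: -3.00)")
```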

2019

  1. Heterogeneous Learning from Demonstration
    Rohan Paleja, and Matthew Gombolay.

    International Conference on Human-Robot Interaction (HRI) Pioneers Workshop

    The development of human-robot systems able to leverage the strengths of both humans and their robotic counterparts has been greatly sought after because of the foreseen, broad-ranging impact across industry and research. We believe the true potential of these systems cannot be reached unless the robot is able to act with a high level of autonomy, reducing the burden of manual tasking or teleoperation. To achieve this level of autonomy, robots must be able to work fluidly with their human partners, inferring their needs without explicit commands. This inference requires the robot to be able to detect and classify the heterogeneity of its partners. We propose a framework for learning from heterogeneous demonstration based upon Bayesian inference and evaluate a suite of approaches on a real-world dataset of gameplay from StarCraft II. This evaluation provides evidence that our Bayesian approach can outperform conventional methods by up to 12.8%. (A minimal sketch of a Bayesian update over latent demonstrator types follows.)
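
A minimal sketch of the kind of Bayesian update this framework relies on: maintain a posterior over latent demonstrator types and refine it action by action. The two types and their action likelihoods are invented for illustration; the paper's framework operates over learned models and StarCraft II gameplay.

```python
# Bayesian inference over latent demonstrator types (illustrative only).
import numpy as np

# P(action | type): rows are demonstrator types, columns are 3 discrete actions.
likelihood = np.array([[0.7, 0.2, 0.1],    # type 0: e.g., "aggressive"
                       [0.1, 0.3, 0.6]])   # type 1: e.g., "defensive"
posterior = np.array([0.5, 0.5])           # uniform prior over types

for action in [0, 0, 2, 0]:                # observed demonstration actions
    posterior = posterior * likelihood[:, action]   # Bayes rule: prior x likelihood
    posterior = posterior / posterior.sum()         # normalize
    print(f"after action {action}: P(types) = {posterior.round(3)}")
```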