Strategies for Collaboration, Autonomy, Learning, and Exploration in Robotics Lab
(SCALE Robotics Lab)
Our lab focuses on advancing machine learning and artificial intelligence to improve robot learning, human-robot interaction, and multi-agent collaboration.
- Interactive Robot Learning and Algorithmic Human-Robot Interaction: Developing new computational approaches that help humans interact with robots and teach them new behaviors or correct existing ones.
- Explainable Robotics: Imbuing intelligent robotic systems with decision-making capabilities and behaviors that can be understood, traced, and trusted by humans, especially in critical settings like healthcare, defense, and disaster response.
- Multi-Agent Coordination and Information Sharing: Developing methods that enable teams
of robots and humans to communicate, collaborate, and make decisions together in complex, dynamic
environments.
Openings
I'm looking for technically strong, self-motivated students (at all levels) to join my lab. Prospective PhD students (application deadline December 2025): please apply to Purdue CS and mention my name in your application (you are welcome to email me early at rpaleja {at} purdue.edu). For other roles, please reach out directly and include your CV.
Admitted Purdue PhD students: If you have already been admitted and are looking for an advisor, please email me directly. I can directly advise students in the CS department and can co-advise students in other departments.
Master's/Undergraduate students: If you are interested in working with me, please send me an email with your CV and a brief description of your interests, and include [Prospective Master's Student] or [Prospective Undergraduate Student] in the subject line, as appropriate.
I am looking for students who are passionate about robotics
and AI, and who are eager to learn and contribute to our research projects.
Publications
* denotes equal contribution. Blue - Conference. Orange - Journal. Pink - Workshop/Other.
Generalized Behavior Learning from Diverse Demonstrations
Varshith Sreeramdass, Rohan Paleja, Letian Chen, Sanne van Waveren, Matthew Gombolay
ICLR 2025
International Conference on Learning Representations (ICLR), 2025.
Diverse behavior policies are valuable in domains requiring quick test-time
adaptation or personalized human-robot interaction. Human demonstrations
provide rich information regarding task objectives and factors that govern
individual behavior variations, which can be used to characterize
useful diversity and learn diverse, performant policies. However, we
show that prior work that builds naive representations of demonstration
heterogeneity fails in generating successful novel behaviors that generalize
over behavior factors. We propose Guided Strategy Discovery (GSD), which
introduces a novel diversity formulation based on a learned task-relevance
measure that prioritizes behaviors exploring modeled latent factors. We
empirically validate across three continuous control benchmarks that GSD
outperforms baselines in novel behavior discovery by 21% when generalizing
to in-distribution (interpolation) and out-of-distribution (extrapolation)
factors. Finally, we demonstrate that GSD can generalize striking
behaviors for table tennis in a virtual testbed while leveraging human
demonstrations collected in the real world.
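To make the flavor of this objective concrete, here is a heavily simplified sketch: an imitation term plus a DIAYN-style diversity bonus (the latent factor should be recoverable from behavior), gated by a learned task-relevance weight. All module shapes and the exact combination below are illustrative assumptions, not GSD's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative assumption: tiny linear modules stand in for real networks.
N_FACTORS, STATE, ACT = 4, 8, 2
policy = nn.Linear(STATE + N_FACTORS, ACT)   # latent-conditioned policy
discriminator = nn.Linear(STATE, N_FACTORS)  # infers latent factor from state
relevance = nn.Linear(STATE, 1)              # learned task-relevance measure

def gsd_style_loss(states, expert_actions, z):
    # imitation: match expert actions conditioned on latent factor z
    z_onehot = F.one_hot(z, N_FACTORS).float()
    imitation = F.mse_loss(policy(torch.cat([states, z_onehot], -1)),
                           expert_actions)
    # diversity: z should be recoverable from visited states (DIAYN-style)
    diversity = F.cross_entropy(discriminator(states), z)
    # gate diversity by task relevance so only useful variation is rewarded
    gate = torch.sigmoid(relevance(states)).mean()
    return imitation + gate * diversity

loss = gsd_style_loss(torch.randn(16, STATE), torch.randn(16, ACT),
                      torch.randint(0, N_FACTORS, (16,)))
```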
Heterogeneous Policy Networks for Composite Robot Team Communication and Coordination
Esmaeil Seraj*, Rohan Paleja*, Luis Pimentel, Kin Man Lee, Zheyuan Wang, Daniel Martin, Matthew Sklar, John Zhang, Zahi Kakish, Matthew Gombolay
T-RO 2024
IEEE Transactions on Robotics, Volume 40, pages 3833-3849, 2024
High-performing human–human teams learn intelligent and efficient
communication and coordination strategies to maximize their joint utility.
These teams implicitly understand the different roles of heterogeneous team
members and adapt their communication protocols accordingly. Multiagent
reinforcement learning (MARL) has attempted to develop computational methods
for synthesizing such joint coordination–communication strategies, but
emulating heterogeneous communication patterns across agents with different
state, action, and observation spaces has remained a challenge. Without
properly modeling agent heterogeneity, as in prior MARL work that leverages
homogeneous graph networks, communication becomes less helpful and can even
deteriorate the team’s performance. In the past, we proposed heterogeneous
policy networks (HetNet) to learn efficient and diverse communication models
for coordinating cooperative heterogeneous teams. In this extended work, we
scale HetNet to support larger heterogeneous robot teams. Building on
heterogeneous graph-attention networks, we show that HetNet not only
facilitates learning heterogeneous collaborative policies, but also enables
end-to-end training for learning highly efficient binarized messaging. Our
empirical evaluation shows that HetNet sets a new state-of-the-art in
learning coordination and communication strategies for heterogeneous
multiagent teams, achieving a 5.84% to 707.65% performance improvement
over the next-best baseline across multiple domains while simultaneously
achieving a 200× reduction in the required communication bandwidth.
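The binarized-messaging idea can be illustrated with a straight-through estimator (STE), the standard trick for training end-to-end through a discretization step. This is a generic sketch under that assumption, not HetNet's actual heterogeneous graph-attention architecture; all dimensions are made up.

```python
import torch
import torch.nn as nn

class BinarizedMessenger(nn.Module):
    """Minimal sketch of end-to-end-trainable binary messaging.

    sign() produces {-1, +1} messages; the straight-through estimator
    passes gradients around the non-differentiable step so the encoder
    can still be trained end-to-end.
    """

    def __init__(self, obs_dim: int, msg_bits: int):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, msg_bits)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        logits = self.encoder(obs)
        hard = torch.sign(logits)  # binary message, but zero gradient
        # STE: forward pass uses `hard`, backward uses gradient of `logits`
        return logits + (hard - logits).detach()

# usage: a 4-agent team, each broadcasting a 16-bit message
messenger = BinarizedMessenger(obs_dim=32, msg_bits=16)
msgs = messenger(torch.randn(4, 32))  # shape (4, 16), entries in {-1, +1}
```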
STL: Still Tricky Logic (for System Validation, Even When Showing Your Work)
Isabelle Hurley, Rohan Paleja, Ashley Suh, Jaime D. Peña, Ho Chit Siu
NeurIPS 2024
Conference on Neural Information Processing Systems (NeurIPS), 2024.
As learned control policies become increasingly common in autonomous
systems, there is an increasing need to ensure that they are interpretable and
can be checked by human stakeholders. Formal specifications have been
proposed as ways to produce human-interpretable policies for autonomous
systems that can still be learned from examples. Previous work showed that
despite claims of interpretability, humans are unable to use formal
specifications presented in a variety of ways to validate even simple robot
behaviors. This work uses active learning, a standard pedagogical method, to
attempt to improve humans’ ability to validate policies in signal temporal
logic (STL). Results show that overall validation accuracy is not high, at
65% ± 15% (mean ± standard deviation), and that the three conditions of
no active learning, active learning, and active learning with feedback do not significantly
differ from each other. Our results suggest that the utility of formal
specifications for human interpretability is still unsupported but point to
other avenues of development which may enable improvements in system
validation.
Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems
Rohan Paleja, Michael Munje, Kimberlee Chang, Reed Jensen, Matthew Gombolay
NeurIPS 2024
Conference on Neural Information Processing Systems (NeurIPS), 2024.
Collaborative robots and machine learning-based virtual agents are
increasingly entering the human workspace with the aim of increasing
productivity and enhancing safety. Despite this, we show in a ubiquitous
experimental domain, Overcooked-AI, that state-of-the-art techniques for
human-machine teaming (HMT), which rely on imitation or reinforcement
learning, are brittle and result in a machine agent that aims to decouple
the machine and human’s actions to act independently rather than in a
synergistic fashion. To remedy this deficiency, we develop HMT approaches
that enable iterative, mixed-initiative team development allowing end-users
to interactively reprogram interpretable AI teammates. Our 50-subject study
provides several findings that we summarize into guidelines. While all
approaches underperform a simple collaborative heuristic (a critical,
negative result for learning-based methods), we find that white-box
approaches supported by interactive modification can lead to significant
team development, outperforming white-box approaches alone, and that
black-box approaches are easier to train and result in better HMT
performance, highlighting a tradeoff between explainability and interactivity
versus ease of training. Together, these findings present three important
future research directions: 1) Improving the ability to generate
collaborative agents with white-box models, 2) Better learning methods to
facilitate collaboration rather than individualized coordination, and 3)
Mixed-initiative interfaces that enable users, who may vary in ability, to
improve collaboration.
The Effect of Robot Skill Level and Communication in Rapid, Proximate Human-Robot Collaboration
Kin Man Lee*, Arjun Krishna*, Zulfiqar Zaidi, Rohan Paleja, Letian Chen, Erin Hedlund-Botti, Mariah Schrum, Matthew Gombolay
HRI 2023
ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2023
As high-speed, agile robots become more commonplace, these robots will have
the potential to better aid and collaborate with humans. However, due to the
increased agility and functionality of these robots, close collaboration
with humans can create safety concerns that alter team dynamics and degrade
task performance. In this work, we aim to enable the deployment of safe and
trustworthy agile robots that operate in proximity with humans. We do so by
1) Proposing a novel human-robot doubles table tennis scenario to serve as a
testbed for studying agile, proximate human-robot collaboration and 2)
Conducting a user-study to understand how attributes of the robot (e.g.,
robot competency or capacity to communicate) impact team dynamics, perceived
safety, and perceived trust, and how these latent factors affect human-robot
collaboration (HRC) performance. We find that robot competency significantly
increases perceived trust (p < .001), extending skill-to-trust assessments
in prior studies to agile, proximate HRC. Interestingly, we also find that
when the robot vocalizes its intention to perform a task, team performance
(p = .037) and the perceived safety of the system (p = .009) significantly
decrease.
Athletic Mobile Manipulator System for Robotic Wheelchair Tennis
Zulfiqar Zaidi*, Daniel Martin*, Nathaniel Belles, Viacheslav Zakharov, Arjun Krishna, Kin Man Lee, Peter Wagstaff, Sumedh Naik, Matthew Sklar, Sugju Choi, Yoshiki Kakehi, Ruturaj Patil, Divya Mallemadugula, Florian Pesce, Peter Wilson, Wendell Hom, Matan Diamond, Bryan Zhao, Nina Moorman, Rohan Paleja, Letian Chen, Esmaeil Seraj, Matthew Gombolay
IEEE RA-L 2023
IEEE Robotics and Automation Letters, Volume 8, Issue 4, pages 2245-2252, 2023
Athletics are a quintessential and universal expression of humanity. From
French monks who in the 12th century invented jeu de paume, the precursor to
modern lawn tennis, back to the K’iche’ people who played the Maya Ballgame
as a form of religious expression over three thousand years ago, humans have
sought to train their minds and bodies to excel in sporting contests.
Advances in robotics are opening up the possibility of robots in sports.
Yet, key challenges remain, as most prior works in robotics for sports are
limited to pristine sensing environments, do not require significant force
generation, or are on miniaturized scales unsuited for joint human-robot
play. In this paper, we propose the first open-source, autonomous robot for
playing regulation wheelchair tennis. We demonstrate the performance of our
full-stack system in executing ground strokes and evaluate each of the
system’s hardware and software components. The goal of this paper is to (1)
inspire more research in human-scale robot athletics and (2) establish the
first baseline for a reproducible wheelchair tennis robot for regulation
singles play. Our paper contributes to the science of systems design and
poses a set of key challenges for the robotics community to address in
striving towards robots that can match human capabilities in sports.
Learning Models of Adversarial Agent Behavior under Partial Observability
Sean Ye, Manisha Natarajan, Zixuan Wu, Rohan Paleja, Letian Chen, Matthew Gombolay
IROS 2023
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
The need for opponent modeling and tracking arises in several real-world
scenarios, such as professional sports, video game design, and
drug-trafficking interdiction. In this work, we present graPh neurAl Network
aDvErsarial MOdeliNg wIth mUtual inforMation (PANDEMONIUM) for modeling the
behavior of an adversarial opponent agent. PANDEMONIUM is a novel graph neural network
(GNN) based approach that uses mutual information maximization as an
auxiliary objective to predict the current and future states of an
adversarial opponent with partial observability. To evaluate PANDEMONIUM, we
design two large-scale, pursuit-evasion domains inspired by real-world
scenarios, where a team of heterogeneous agents is tasked with tracking and
interdicting a single adversarial agent, and the adversarial agent must
evade detection while achieving its own objectives. With the mutual
information formulation, PANDEMONIUM outperforms all baselines in both
domains, achieving 31.68% higher log-likelihood on average for future
adversarial state predictions.
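As a rough illustration of the auxiliary objective (ignoring the paper's graph neural network and partial-observability machinery), one can pair a future-state prediction loss with an InfoNCE-style term, which lower-bounds mutual information between the learned embedding and the future adversary state. The architecture, dimensions, and 0.1 weighting below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdversaryPredictor(nn.Module):
    """Sketch: predict the adversary's future state, with an InfoNCE
    auxiliary term tying the embedding to that future state."""

    def __init__(self, obs_dim: int, state_dim: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, state_dim)  # future-state prediction
        self.proj = nn.Linear(state_dim, hidden)  # projection for MI scores

    def loss(self, team_obs: torch.Tensor, future_state: torch.Tensor):
        z = self.embed(team_obs)                         # (B, hidden)
        pred_loss = F.mse_loss(self.head(z), future_state)
        # InfoNCE: each embedding should score highest with its own future
        scores = z @ self.proj(future_state).t()         # (B, B)
        targets = torch.arange(len(z))
        mi_loss = F.cross_entropy(scores, targets)
        return pred_loss + 0.1 * mi_loss                 # 0.1: assumed weight

model = AdversaryPredictor(obs_dim=12, state_dim=4)
loss = model.loss(torch.randn(32, 12), torch.randn(32, 4))
```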
Adversarial Search and Tracking with Multiagent Reinforcement Learning in Sparsely Observable Environments
Zixuan Wu, Sean Ye, Manisha Natarajan, Letian Chen, Rohan Paleja, Matthew Gombolay
MRS 2023
International Symposium on Multi-Robot and Multi-Agent Systems (MRS), 2023
We study a search and tracking (S&T) problem where a team of dynamic search
agents must collaborate to track an adversarial, evasive agent. The
heterogeneous search team may only have access to a limited number of past
adversary trajectories within a large search space. This problem is
challenging for both model-based searching and reinforcement learning (RL)
methods, since the adversary exhibits reactionary and deceptive evasive
behaviors in a large space, leading to sparse detections for the search
agents. To address this challenge, we propose a novel Multi-Agent RL (MARL)
framework that leverages the estimated adversary location from our learnable
filtering model. We show that our MARL architecture can outperform all
baselines and achieves a 46% increase in detection rate.
Mutual Understanding in Human-Machine Teaming
Rohan Paleja*, Matthew Gombolay
AAAI DC 2022
Association for the Advancement of Artificial Intelligence Conference (AAAI) Doctoral Consortium, 2022
Collaborative robots (i.e., “cobots”) and machine learning-based virtual
agents are increasingly entering the human workspace with the aim of
increasing productivity, enhancing safety, and improving the quality of our
lives. These agents will interact with a wide variety of people in dynamic
and novel contexts, increasing the prevalence of human-machine teams in
healthcare, manufacturing, and search-and-rescue. In this research,
we enhance the mutual understanding within a human-machine team by enabling
cobots to understand heterogeneous teammates via person-specific embeddings,
identifying contexts in which xAI methods can help improve team mental model
alignment, and enabling cobots to effectively communicate information that
supports high-performance human-machine teaming.
Learning Efficient Diverse Communication for Cooperative Heterogeneous Teaming
Esmaeil Seraj*, Zheyuan Wang*, Rohan Paleja*, Daniel Martin, Matthew Sklar, Anirudh Patel, Matthew Gombolay
AAMAS 2022
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2022
Scaling Multi-Agent Reinforcement Learning via State Upsampling
Luis Pimentel*, Rohan Paleja*, Zheyuan Wang, Esmaeil Seraj, James Pagan, Matthew Gombolay
RSS W. 2022
RSS 2022 Workshop on Scaling Robot Learning (RSS22-SRL)
Learning Interpretable, High-Performing Policies for Autonomous Driving
Rohan Paleja*, Yaru Niu*, Andrew Silva, Chace Ritchie, Sugju Choi, Matthew Gombolay
RSS 2022
Robotics: Science and Systems (RSS), 2022
Utilizing Human Feedback for Primitive Optimization in Wheelchair Tennis
Arjun Krishna, Zulfiqar Zaidi, Letian Chen, Rohan Paleja, Esmaeil Seraj, Matthew Gombolay
CoRL W. 2022
CoRL 2022 Learning for Agile Robotics Workshop
Agile robotics presents a difficult challenge: robots moving at high speeds
require precise, low-latency sensing and control. Creating agile
motion that accomplishes the task at hand while being safe to execute is a
key requirement for agile robots to gain human trust. This requires
designing new approaches that are flexible and maintain knowledge over world
constraints. In this paper, we consider the problem of building a flexible
and adaptive controller for a challenging agile mobile manipulation task of
hitting ground strokes on a wheelchair tennis robot. We propose and evaluate
an extension to prior work on learning striking behaviors with a
probabilistic movement primitive (ProMP) framework by (1) demonstrating the
safe execution of learned primitives on an agile mobile manipulator setup,
and (2) proposing an online primitive refinement procedure that utilizes
evaluative feedback from humans on the executed trajectories.
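One plausible way to fold scalar human ratings into a ProMP's weight distribution is a reward-weighted update in the spirit of PoWER-style episodic updates: softmax the ratings into importance weights and move the mean toward highly rated samples. This sketch is an assumed simplification, not the paper's exact refinement procedure.

```python
import numpy as np

def refine_promp_mean(mu, sampled_ws, ratings, temperature=1.0):
    """Sketch: reward-weighted update of a ProMP weight-vector mean.

    mu          -- current mean of the primitive's weight distribution
    sampled_ws  -- (N, D) weight vectors whose rollouts a human rated
    ratings     -- (N,) scalar evaluative feedback, higher is better
    """
    ratings = np.asarray(ratings, dtype=float)
    # softmax over ratings -> per-sample importance weights
    w = np.exp((ratings - ratings.max()) / temperature)
    w /= w.sum()
    return w @ np.asarray(sampled_ws)  # convex combination of samples

# usage: nudge the primitive toward the trajectories the human preferred
mu = np.zeros(8)
samples = mu + 0.1 * np.random.randn(5, 8)
new_mu = refine_promp_mean(mu, samples, ratings=[2, 5, 1, 4, 3])
```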
Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations
Letian Chen*, Sravan Jayanthi*, Rohan Paleja, Daniel Martin, Viacheslav Zakharov, Matthew Gombolay
CoRL 2022
Conference on Robot Learning (CoRL), 2022
Learning from Demonstration (LfD) approaches empower end-users to teach
robots novel tasks via demonstrations of the desired behaviors,
democratizing access to robotics. However, current LfD frameworks are not
capable of fast adaptation to heterogeneous human demonstrations, nor of
large-scale deployment in ubiquitous robotics applications. In this paper,
we propose a novel LfD framework, Fast Lifelong Adaptive Inverse
Reinforcement learning (FLAIR). Our approach (1) leverages learned
strategies to construct policy mixtures for fast adaptation to new
demonstrations, allowing for quick end-user personalization, (2) distills
common knowledge across demonstrations, achieving accurate task inference;
and (3) expands its model only when needed in lifelong deployments,
maintaining a concise set of prototypical strategies that can approximate
all behaviors via policy mixtures. We empirically validate that FLAIR
achieves adaptability (i.e., the robot adapts to heterogeneous,
user-specific task preferences), efficiency (i.e., the robot achieves
sample-efficient adaptation), and scalability (i.e., the model grows
sublinearly with the number of demonstrations while maintaining high
performance). FLAIR surpasses benchmarks across three control tasks, with an
average 57% improvement in policy returns and, on average, 78% fewer episodes
required for demonstration modeling using policy mixtures. Finally, we
demonstrate the success of FLAIR in a table tennis task and find users rate
FLAIR as having higher task (p < .05) and personalization (p < .05)
performance.
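The policy-mixture step can be sketched as a small maximum-likelihood problem: given how much probability each prototypical strategy policy assigns to the demonstrated actions, find simplex weights for the mixture. Everything here (shapes, the softmax/L-BFGS parameterization) is an assumption; FLAIR additionally decides when to expand the model with a new prototype in lifelong deployment.

```python
import numpy as np
from scipy.optimize import minimize

def fit_mixture_weights(demo_probs):
    """Sketch: fit mixture weights over prototypical strategy policies.

    demo_probs -- (T, K) array: demo_probs[t, k] is the probability that
                  strategy k's policy assigns to the demonstrated action at
                  step t. Returns simplex weights maximizing the demo's
                  log-likelihood under the mixture.
    """
    T, K = demo_probs.shape

    def nll(logits):
        w = np.exp(logits - logits.max())
        w /= w.sum()  # softmax -> point on the simplex
        return -np.log(demo_probs @ w + 1e-12).sum()

    res = minimize(nll, x0=np.zeros(K), method="L-BFGS-B")
    w = np.exp(res.x - res.x.max())
    return w / w.sum()

# usage: a new demo compared against two known strategies
probs = np.column_stack([np.full(50, 0.8), np.full(50, 0.2)])
print(fit_mixture_weights(probs))  # ~[1.0, 0.0]: strategy 0 fits best
```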
Effects of Social Factors and Team Dynamics on Adoption of Collaborative Robot Autonomy
Mariah Schrum*, Glen Neville*, Michael Johnson*, Nina Moorman, Rohan Paleja, Karen Feigh, Matthew Gombolay
HRI 2021
ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2021
As automation becomes more prevalent, the fear of job loss due to automation
increases. Workers may not be amenable to working with a robotic co-worker
due to a negative perception of the technology. The attitudes of workers
towards automation are influenced by a variety of complex and multi-faceted
factors such as intention to use, perceived usefulness and other external
variables. In an analog manufacturing environment, we explore how these
various factors influence an individual’s willingness to work with a robot
over a human co-worker in a collaborative Lego building task. We
specifically explore how this willingness is affected by: 1) the level of
social rapport established between the individual and his or her human
co-worker, 2) the anthropomorphic qualities of the robot, and 3) factors
including trust, fluency and personality traits. Our results show that a
participant’s willingness to work with automation decreased due to lower
perceived team fluency (p=0.045), rapport established between a participant
and their co-worker (p=0.003), the gender of the participant being male
(p=0.041), and a higher inherent trust in people (p=0.018).
Multi-Agent Graph-Attention Communication and Teaming
Yaru Niu*, Rohan Paleja*, Matthew Gombolay
AAMAS 2021
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021
Best Workshop Paper Award Winner at ICCV MAIR2 Workshop
High-performing teams learn effective communication strategies to
judiciously share information and reduce the cost of communication overhead.
Within multi-agent reinforcement learning, synthesizing effective policies
requires reasoning about when to communicate, whom to communicate with, and
how to process messages. We propose a novel multi-agent reinforcement
learning algorithm, Multi-Agent Graph-attentIon Communication (MAGIC), with
a graph-attention communication protocol in which we learn 1) a Scheduler to
help with the problems of when to communicate and whom to address messages
to, and 2) a Message Processor using Graph Attention Networks (GATs) with
dynamic graphs to deal with communication signals. The Scheduler consists of
a graph attention encoder and a differentiable attention mechanism, which
outputs dynamic, differentiable graphs to the Message Processor, which
enables the Scheduler and Message Processor to be trained end-to-end. We
evaluate our approach on a variety of cooperative tasks, including Google
Research Football. Our method outperforms baselines across all domains,
achieving an approximately 10% increase in reward in the most challenging
domain. We also show MAGIC communicates 23.2% more efficiently than the
average baseline, is robust to stochasticity, and scales to larger state-action
spaces. Finally, we demonstrate MAGIC on a physical, multi-robot testbed.
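A minimal sketch of the two components named above: a Scheduler that produces a dense, differentiable communication graph via dot-product attention, and a Message Processor that aggregates encoded messages over that graph. The single message round, the dimensions, and the use of plain softmax attention (rather than the paper's GAT-based encoder and scheduling mechanism) are assumptions.

```python
import torch
import torch.nn as nn

class MagicStyleComm(nn.Module):
    """Sketch: Scheduler decides (softly) who talks to whom; Message
    Processor aggregates messages over that graph. Trained end-to-end
    because the graph is differentiable."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)    # scheduler: query projection
        self.k = nn.Linear(dim, dim)    # scheduler: key projection
        self.msg = nn.Linear(dim, dim)  # message encoder
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) hidden states for N agents
        # Scheduler: dense, differentiable adjacency via scaled dot-product
        logits = self.q(h) @ self.k(h).t() / h.size(1) ** 0.5
        adj = torch.softmax(logits, dim=-1)           # (N, N) soft graph
        # Message Processor: attention-weighted aggregation of messages
        agg = adj @ self.msg(h)                       # (N, dim)
        return self.out(torch.cat([h, agg], dim=-1))  # updated hidden states

comm = MagicStyleComm(dim=32)
updated = comm(torch.randn(5, 32))  # 5 agents exchange one message round
```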
Towards Sample-efficient Apprenticeship Learning from Suboptimal Demonstration
Letian Chen, Rohan Paleja, Matthew Gombolay
AI-HRI 2021
AAAI Artificial Intelligence for Human-Robot Interaction (AI-HRI) Fall Symposium, 2021
Learning from Demonstration (LfD) seeks to democratize robotics by enabling
non-roboticist end-users to teach robots to perform novel tasks by providing
demonstrations. However, as demonstrators are typically non-experts, modern
LfD techniques are unable to produce policies much better than the
suboptimal demonstration. A previously-proposed framework, SSRR, has shown
success in learning from suboptimal demonstration but relies on
noise-injected trajectories to infer an idealized reward function. A random
approach such as noise injection for generating trajectories has two key
drawbacks: 1) performance degradation may be arbitrary, depending on whether
the noise is applied to vital states, and 2) noise-injected trajectories may
exhibit only limited suboptimality and therefore fail to represent its full
scope. We present Systematic
Self-Supervised Reward Regression, S3RR, to investigate systematic
alternatives for trajectory degradation.
The Utility of Explainable AI in Ad Hoc Human-Machine Teaming
Rohan Paleja, Muyleng Ghuy, Nadun Ranawaka Arachchige, Reed Jensen, Matthew Gombolay
NeurIPS 2021
Conference on Neural Information Processing Systems (NeurIPS), 2021
Recent advances in machine learning have led to growing interest in
Explainable AI (xAI) to enable humans to gain insight into the
decision-making of machine learning models. Despite this recent interest,
the utility of xAI techniques has not yet been characterized in
human-machine teaming. Importantly, xAI offers the promise of enhancing team
situational awareness (SA) and shared mental model development, which are
the key characteristics of effective human-machine teams. Rapidly developing
such mental models is especially critical in ad hoc human-machine teaming,
where agents do not have a priori knowledge of others’ decision-making
strategies. In this paper, we present two novel human-subject experiments
quantifying the benefits of deploying xAI techniques within a human-machine
teaming scenario. First, we show that xAI techniques can support SA
(p < 0.05). Second, we examine how different SA levels induced via a
collaborative AI policy abstraction affect ad hoc human-machine teaming
performance. Importantly, we find that the benefits of xAI are not
universal, as there is a strong dependence on the composition of the
human-machine team. Novices benefit from xAI providing increased SA
(p < 0.05) but are susceptible to cognitive overhead (p < 0.05). On the
other hand, expert performance degrades with the addition of xAI-based
support (p < 0.05), indicating that the cost of paying attention to the xAI
outweighs the benefits obtained from being provided additional information
to enhance SA. Our results demonstrate that researchers must deliberately
design and deploy the right xAI techniques in the right scenario by
carefully considering human-machine team composition and how the xAI method
augments SA.
Using Machine Learning to Predict Perfusionists' Critical Decision-Making during Cardiac Surgery
Roger Dias, Marco Zenati, Geoff Rance, Rithy Srey, David Arney, Letian Chen, Rohan Paleja, Lauren Kennedy-Metz, Matthew Gombolay
CMBBE 2021
Computer Methods in Biomechanics and Biomedical Engineering, 2021
The cardiac surgery operating room is a high-risk and complex environment in
which multiple experts work as a team to provide safe and excellent care to
patients. During the cardiopulmonary bypass phase of cardiac surgery,
critical decisions need to be made and the perfusionists play a crucial role
in assessing available information and taking a certain course of action. In
this paper, we report the findings of a simulation-based study using machine
learning to build predictive models of perfusionists’ decision-making during
critical situations in the operating room (OR). Performing 30-fold
cross-validation across 30 random seeds, our machine learning approach was
able to achieve an accuracy of 78.2% (95% confidence interval: 77.8% to
78.6%) in predicting perfusionists’ actions, having access to only 148
simulations. The findings from this study may inform future development of
computerised clinical decision support tools to be embedded into the OR,
improving patient safety and surgical outcomes.
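The evaluation protocol (30-fold cross-validation repeated over 30 random seeds, with a 95% confidence interval over per-seed accuracies) can be sketched as follows. The classifier, features, and labels here are placeholders, since the paper's inputs come from simulated cardiopulmonary-bypass scenarios.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data: 148 simulations (feature count and labels assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(148, 10))
y = rng.integers(0, 2, size=148)

seed_means = []
for seed in range(30):
    cv = StratifiedKFold(n_splits=30, shuffle=True, random_state=seed)
    scores = cross_val_score(RandomForestClassifier(random_state=seed),
                             X, y, cv=cv)
    seed_means.append(scores.mean())

# Normal-approximation 95% CI over the per-seed mean accuracies
mean = np.mean(seed_means)
half = 1.96 * np.std(seed_means, ddof=1) / np.sqrt(len(seed_means))
print(f"accuracy: {mean:.3f} (95% CI: {mean - half:.3f} to {mean + half:.3f})")
```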
Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation
Letian Chen, Rohan Paleja, Muyleng Ghuy, Matthew Gombolay
HRI 2020
ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2020.
Reinforcement learning (RL) has achieved tremendous success as a general
framework for learning how to make decisions. However, this success relies
on the interactive hand-tuning of a reward function by RL experts. On the
other hand, inverse reinforcement learning (IRL) seeks to learn a reward
function from readily-obtained human demonstrations. Yet, IRL suffers from
two major limitations: 1) reward ambiguity: there are an infinite number of
possible reward functions that could explain an expert's demonstration; and
2) heterogeneity: human experts adopt varying strategies and preferences,
which makes learning from multiple demonstrators difficult due to the common
assumption that demonstrators seek to maximize the same reward. In this
work, we propose a method to jointly infer a task goal and humans' strategic
preferences via network distillation. This approach enables us to distill a
robust task reward (addressing reward ambiguity) and to model each
strategy's objective (handling heterogeneity). We demonstrate that our
algorithm can better recover task and strategy rewards and imitate the
strategies in two simulated tasks and a real-world table tennis task.
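A minimal sketch of the decomposition this suggests: each demonstrator's reward is a shared task reward plus a per-strategy reward, with a penalty keeping the strategy terms small so that shared structure is distilled into the task network. The network sizes and the squared penalty are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn as nn

class DistilledReward(nn.Module):
    """Sketch: demonstrator i's reward = shared task reward + personal
    strategy reward. Penalizing the strategy terms pushes common structure
    into the task network (the distillation step)."""

    def __init__(self, state_dim: int, n_demonstrators: int):
        super().__init__()
        self.task = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))
        self.strategies = nn.ModuleList(
            nn.Linear(state_dim, 1) for _ in range(n_demonstrators))

    def forward(self, s: torch.Tensor, i: int) -> torch.Tensor:
        return self.task(s) + self.strategies[i](s)

    def distill_penalty(self, s: torch.Tensor) -> torch.Tensor:
        # keep strategy rewards small so the task reward stays robust
        return sum((m(s) ** 2).mean() for m in self.strategies)

model = DistilledReward(state_dim=6, n_demonstrators=3)
r = model(torch.randn(10, 6), i=1) - 0.0  # reward for demonstrator 1
```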
Interpretable and Personalized Apprenticeship Scheduling: Learning Interpretable Scheduling Policies from Heterogeneous User Demonstrations
Rohan Paleja, Andrew Silva, Letian Chen, Matthew Gombolay
NeurIPS 2020
Conference on Neural Information Processing Systems (NeurIPS), 2020.
Resource scheduling and coordination is an NP-hard optimization problem
requiring the efficient allocation of agents to a set of tasks with upper-
and lower-bound temporal and resource constraints. Due to the large-scale and dynamic nature
of resource coordination in hospitals and factories, human domain experts
manually plan and adjust schedules on the fly. To perform this job, domain
experts leverage heterogeneous strategies and rules-of-thumb honed over
years of apprenticeship. What is critically needed is the ability to extract
this domain knowledge in a heterogeneous and interpretable apprenticeship
learning framework to scale beyond the power of a single human expert, a
necessity in safety-critical domains. We propose a personalized and
interpretable apprenticeship scheduling algorithm that infers an
interpretable representation of all human task demonstrators by extracting
decision-making criteria specified by an inferred, personalized embedding
without constraining the number of decision-making strategies. We achieve
near-perfect LfD accuracy in synthetic domains and 88.22% accuracy on a
real-world planning domain, outperforming baselines. Further, a user study
shows that our methodology produces models that are both interpretable and
highly usable (p < 0.05).
Learning from Suboptimal Demonstration via Self-Supervised Reward Regression
Letian Chen, Rohan Paleja, Matthew Gombolay
CoRL 2020
Conference on Robot Learning (CoRL), 2020
Best Paper Award Finalist
Learning from Demonstration (LfD) seeks to democratize robotics by enabling
non-roboticist end-users to teach robots to perform a task by providing a
human demonstration. However, modern LfD techniques, e.g. inverse
reinforcement learning (IRL), assume users provide at least stochastically
optimal demonstrations. This assumption fails to hold in most real-world
scenarios. Recent attempts to learn from suboptimal demonstration leverage
pairwise rankings following the Luce-Shepard rule. However, we show
these approaches make incorrect assumptions and thus suffer from brittle,
degraded performance. We overcome these limitations in developing a novel
approach that bootstraps off suboptimal demonstrations to synthesize
optimality-parameterized data to train an idealized reward function. We
empirically validate that we learn an idealized reward function with ~0.95
correlation with ground-truth reward versus ~0.75 for prior work. We can
then train policies achieving ~200% improvement over the suboptimal
demonstration and ~90% improvement over prior work. We present a physical
demonstration of teaching a robot a topspin strike in table tennis that
achieves 32% faster returns and 40% more topspin than user demonstration.
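The data-synthesis step, which bootstraps off a suboptimal demonstration, can be sketched as rolling out a cloned policy with increasing amounts of injected action noise, yielding trajectories whose relative optimality is parameterized by the noise level; a reward model is then regressed against that ordering. The Gym-style env interface and the epsilon-noise scheme below are assumptions.

```python
import numpy as np

def synthesize_degraded_dataset(policy, env, noise_levels, episodes=5):
    """Sketch of optimality-parameterized data synthesis: roll out a policy
    (e.g., one cloned from the suboptimal demonstration) with increasing
    action noise. Lower noise -> presumed-higher-performance trajectories,
    giving a ranking to regress a reward model against. `policy(obs)` and
    a Gym-style `env` API are assumptions."""
    dataset = []  # (noise_level, trajectory) pairs
    for eps in noise_levels:
        for _ in range(episodes):
            obs, traj, done = env.reset(), [], False
            while not done:
                action = policy(obs)
                if np.random.rand() < eps:  # inject noise into the action
                    action = env.action_space.sample()
                obs, reward, done, _ = env.step(action)
                traj.append((obs, action))
            dataset.append((eps, traj))
    return dataset
```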
Heterogeneous Learning from Demonstration
Rohan Paleja, Matthew Gombolay
HRI W. 2019
International Conference on Human-Robot Interaction (HRI) Pioneers Workshop, 2019
The development of human-robot systems able to leverage the strengths of
both humans and their robotic counterparts has been greatly sought after
because of the foreseen, broad-ranging impact across industry and research.
We believe the true potential of these systems cannot be reached unless the
robot is able to act with a high level of autonomy, reducing the burden of
manual tasking or teleoperation. To achieve this level of autonomy, robots
must be able to work fluidly with their human partners, inferring their needs
without explicit commands. This inference requires the robot to be able to
detect and classify the heterogeneity of its partners. We propose a
framework for learning from heterogeneous demonstration based upon Bayesian
inference and evaluate a suite of approaches on a real-world dataset of
gameplay from StarCraft II. This evaluation provides evidence that our
Bayesian approach can outperform conventional methods by up to 12.8%.