The navigation problem for autonomous mobile robots can be considered
as a Partially Observable Markov Decision Process (POMDP). In
particular, the path planning problem can be modelled (under certain
assumptions) as a Markov Decision Process (MDP) and solved
efficiently, e.g. by value iteration (a small sketch follows the list
below). Though POMDPs give us a unified and mathematically elegant
framework for studying robot navigation problems as well as optimal
solution techniques, there are some open problems regarding modelling
aspects:
- Given a policy as a solution strategy for a (PO)MDP, how can this
  policy be executed reliably and efficiently by a reactive execution
  component?
- What is a good (PO)MDP model of a given navigation problem?
  Furthermore, how can the specific characteristics of the execution
  component be taken into account in this model?
- How can the (PO)MDP be made transparent to a high-level planning or
  reasoning system?
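
To make the solution technique mentioned above concrete, the
following is a minimal value-iteration sketch in Python for a small
grid-world path planning MDP. The grid size, rewards, transition
noise, and discount factor are illustrative assumptions, not taken
from the project itself.

    import numpy as np

    # Minimal value-iteration sketch for a grid-world path planning
    # MDP. Grid size, rewards, transition noise, and discount factor
    # are illustrative assumptions.
    ROWS, COLS = 4, 4
    GOAL = (3, 3)
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    GAMMA = 0.95       # discount factor
    NOISE = 0.1        # probability that a random move is executed
    STEP_COST = -1.0   # reward for every non-goal transition
    GOAL_REWARD = 10.0

    def step(state, action):
        """Deterministic successor; the robot stays put at the border."""
        r, c = state[0] + action[0], state[1] + action[1]
        return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state

    def q_value(V, state, action):
        """Expected discounted return of taking `action` in `state`."""
        value = 0.0
        for a in ACTIONS:
            # Intended move with probability 1 - NOISE, otherwise a
            # uniformly random move (which may coincide with it).
            p = NOISE / len(ACTIONS) + (1.0 - NOISE if a == action else 0.0)
            nxt = step(state, a)
            reward = GOAL_REWARD if nxt == GOAL else STEP_COST
            value += p * (reward + GAMMA * V[nxt])
        return value

    V = np.zeros((ROWS, COLS))
    for _ in range(100):                # bounded number of sweeps
        new_V = np.zeros_like(V)
        for r in range(ROWS):
            for c in range(COLS):
                if (r, c) != GOAL:      # the goal is terminal, value 0
                    new_V[r, c] = max(q_value(V, (r, c), a)
                                      for a in ACTIONS)
        delta = np.max(np.abs(new_V - V))
        V = new_V
        if delta < 1e-6:                # converged
            break

    # The greedy policy w.r.t. V is the path planning solution.
    policy = {(r, c): max(ACTIONS, key=lambda a: q_value(V, (r, c), a))
              for r in range(ROWS) for c in range(COLS) if (r, c) != GOAL}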
As we do not expect that general theories of these modelling aspects
can be formulated, we believe that these problems should be tackled
by applying machine learning techniques. Mobile robots should be
equipped with the ability to autonomously learn:
- How to execute an MDP policy efficiently and reliably,
- How to model a given set of possible navigation problems as an MDP
  in an adequate way (see the sketch after this list), and
- How to build a model of MDP planning and MDP policy execution that
  can be used by a (symbolic) high-level planning component.
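
As an illustration of the second learning task, the following sketch
estimates an MDP transition model from observed execution traces by
simple maximum-likelihood counting; the trace format and helper names
are assumptions made for illustration only.

    from collections import defaultdict

    # Sketch: estimating MDP transition probabilities from observed
    # (state, action, next state) triples by relative frequencies.
    counts = defaultdict(lambda: defaultdict(int))

    def record(state, action, next_state):
        """Accumulate one observed transition."""
        counts[(state, action)][next_state] += 1

    def transition_prob(state, action, next_state):
        """Relative-frequency estimate of P(next_state | state, action)."""
        total = sum(counts[(state, action)].values())
        if total == 0:
            return 0.0  # no data for this state-action pair yet
        return counts[(state, action)][next_state] / total

    # Example: three noisy executions of the same commanded move.
    record((0, 0), (0, 1), (0, 1))
    record((0, 0), (0, 1), (0, 1))
    record((0, 0), (0, 1), (1, 0))                  # a slip
    print(transition_prob((0, 0), (0, 1), (0, 1)))  # -> 0.666...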
In this project we explore how reinforcement learning techniques and
transformational learning can be used to build such learning robots.
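
As a hint of what the reinforcement learning side could look like,
here is a tabular Q-learning sketch for the first learning task
(executing navigation actions) in the same illustrative grid world as
above; all parameters and the environment model are again assumptions
made purely for illustration.

    import random

    ROWS, COLS, GOAL = 4, 4, (3, 3)
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    GAMMA, NOISE = 0.95, 0.1
    STEP_COST, GOAL_REWARD = -1.0, 10.0
    ALPHA, EPSILON, EPISODES = 0.1, 0.2, 500  # learning rate, exploration

    def step(state, action):
        r, c = state[0] + action[0], state[1] + action[1]
        return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state

    # One Q-value per state-action pair, initialised to zero.
    Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS)
         for a in ACTIONS}

    def choose_action(state):
        """Epsilon-greedy action selection over the learned Q-values."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    for _ in range(EPISODES):
        state = (0, 0)
        while state != GOAL:
            action = choose_action(state)
            # The executed move sometimes differs from the commanded
            # one, mimicking an unreliable reactive execution component.
            executed = (action if random.random() > NOISE
                        else random.choice(ACTIONS))
            nxt = step(state, executed)
            reward = GOAL_REWARD if nxt == GOAL else STEP_COST
            best_next = (0.0 if nxt == GOAL
                         else max(Q[(nxt, a)] for a in ACTIONS))
            # Standard one-step Q-learning update.
            Q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - Q[(state, action)])
            state = nxt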