The navigation problem for autonomous mobile robots can be considered
as a Partially Observable Markov Decision Process (POMDP). In
particular, the path planning problem can be modelled (under certain
assumptions) as a Markov Decision Process (MDP) and solved
efficiently, e.g. by value iteration (a small sketch follows the list
below). Though POMDPs give us a unified and mathematically elegant
framework for studying robot navigation problems as well as optimal
solution techniques, there are some open problems regarding modelling
aspects:
- Given a policy as a solution strategy for a (PO)MDP, how can this
  policy be executed reliably and efficiently by a reactive execution
  component?
- What is a good (PO)MDP model of a given navigation problem?
  Furthermore, how can the specific characteristics of the execution
  component be taken into account in this model?
- How can the (PO)MDP be made transparent to a high-level planning or
  reasoning system?
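
To make the solution technique mentioned above concrete, the
following is a minimal value-iteration sketch in Python for a small
grid-world path planning MDP. The grid size, rewards, transition
noise, and discount factor are illustrative assumptions, not taken
from the project itself.

    import numpy as np

    # Minimal value-iteration sketch for a grid-world path planning
    # MDP. Grid size, rewards, transition noise, and discount factor
    # are illustrative assumptions.
    ROWS, COLS = 4, 4
    GOAL = (3, 3)
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    GAMMA = 0.95       # discount factor
    NOISE = 0.1        # probability that a random move is executed
    STEP_COST = -1.0   # reward for every non-goal transition
    GOAL_REWARD = 10.0

    def step(state, action):
        """Deterministic successor; the robot stays put at the border."""
        r, c = state[0] + action[0], state[1] + action[1]
        return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state

    def q_value(V, state, action):
        """Expected discounted return of taking `action` in `state`."""
        value = 0.0
        for a in ACTIONS:
            # Intended move with probability 1 - NOISE, otherwise a
            # uniformly random move (which may coincide with it).
            p = NOISE / len(ACTIONS) + (1.0 - NOISE if a == action else 0.0)
            nxt = step(state, a)
            reward = GOAL_REWARD if nxt == GOAL else STEP_COST
            value += p * (reward + GAMMA * V[nxt])
        return value

    V = np.zeros((ROWS, COLS))
    for _ in range(100):                # bounded number of sweeps
        new_V = np.zeros_like(V)
        for r in range(ROWS):
            for c in range(COLS):
                if (r, c) != GOAL:      # the goal is terminal, value 0
                    new_V[r, c] = max(q_value(V, (r, c), a)
                                      for a in ACTIONS)
        delta = np.max(np.abs(new_V - V))
        V = new_V
        if delta < 1e-6:                # converged
            break

    # The greedy policy w.r.t. V is the path planning solution.
    policy = {(r, c): max(ACTIONS, key=lambda a: q_value(V, (r, c), a))
              for r in range(ROWS) for c in range(COLS) if (r, c) != GOAL}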
As we do not expect that general theories of these modelling aspects
can be formulated, we believe that these problems should be tackled
by applying machine learning techniques. Mobile robots should be
equipped with the ability to autonomously learn:
- How to execute an MDP policy efficiently and reliably,
- How to model a given set of possible navigation problems as an MDP
  in an adequate way (see the sketch after this list), and
- How to build a model of MDP planning and MDP policy execution that
  can be used by a (symbolic) high-level planning component.
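
As an illustration of the second learning task, the following sketch
estimates an MDP transition model from observed execution traces by
simple maximum-likelihood counting; the trace format and helper names
are assumptions made for illustration only.

    from collections import defaultdict

    # Sketch: estimating MDP transition probabilities from observed
    # (state, action, next state) triples by relative frequencies.
    counts = defaultdict(lambda: defaultdict(int))

    def record(state, action, next_state):
        """Accumulate one observed transition."""
        counts[(state, action)][next_state] += 1

    def transition_prob(state, action, next_state):
        """Relative-frequency estimate of P(next_state | state, action)."""
        total = sum(counts[(state, action)].values())
        if total == 0:
            return 0.0  # no data for this state-action pair yet
        return counts[(state, action)][next_state] / total

    # Example: three noisy executions of the same commanded move.
    record((0, 0), (0, 1), (0, 1))
    record((0, 0), (0, 1), (0, 1))
    record((0, 0), (0, 1), (1, 0))                  # a slip
    print(transition_prob((0, 0), (0, 1), (0, 1)))  # -> 0.666...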
In this project we explore how reinforcement learning techniques and
transformational learning can be used to build such learning robots.
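
As a hint of what the reinforcement learning side could look like,
here is a tabular Q-learning sketch for the first learning task
(executing navigation actions) in the same illustrative grid world as
above; all parameters and the environment model are again assumptions
made purely for illustration.

    import random

    ROWS, COLS, GOAL = 4, 4, (3, 3)
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    GAMMA, NOISE = 0.95, 0.1
    STEP_COST, GOAL_REWARD = -1.0, 10.0
    ALPHA, EPSILON, EPISODES = 0.1, 0.2, 500  # learning rate, exploration

    def step(state, action):
        r, c = state[0] + action[0], state[1] + action[1]
        return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state

    # One Q-value per state-action pair, initialised to zero.
    Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS)
         for a in ACTIONS}

    def choose_action(state):
        """Epsilon-greedy action selection over the learned Q-values."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    for _ in range(EPISODES):
        state = (0, 0)
        while state != GOAL:
            action = choose_action(state)
            # The executed move sometimes differs from the commanded
            # one, mimicking an unreliable reactive execution component.
            executed = (action if random.random() > NOISE
                        else random.choice(ACTIONS))
            nxt = step(state, executed)
            reward = GOAL_REWARD if nxt == GOAL else STEP_COST
            best_next = (0.0 if nxt == GOAL
                         else max(Q[(nxt, a)] for a in ACTIONS))
            # Standard one-step Q-learning update.
            Q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - Q[(state, action)])
            state = nxt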