Biped Locomotion in Sloped Environments - Combining Reinforcement Learning and Dynamical Systems

In this work, reinforcement learning techniques are implemented and compared to address biped locomotion optimization. Central Pattern Generators (CPGs) and Dynamic Movement Primitives (DMPs) were combined to easily produce complex joint trajectories for a simulated DARwIn-OP humanoid robot.
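To make the trajectory-generation idea concrete, below is a minimal sketch of a single discrete DMP whose forcing term is a weighted sum of radial basis functions; the learned weights shape the joint trajectory. The function name `dmp_rollout` and all gain values are illustrative assumptions, not the parameters used in this project.

```python
import numpy as np

def dmp_rollout(weights, y0, goal, tau=1.0, dt=0.01,
                alpha=25.0, beta=6.25, alpha_x=8.0):
    """Integrate one discrete DMP for one joint (illustrative parameters).

    weights: RBF weights of the forcing term (what RL would optimize).
    y0, goal: start and target joint angle; tau: movement duration.
    """
    n_basis = len(weights)
    # Basis centres spaced in the exponentially decaying phase variable x.
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    h = n_basis / c  # basis widths
    y, dy, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(tau / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        # Forcing term: normalized RBF mix, gated by phase and scaled by amplitude.
        f = (psi @ weights) / (psi.sum() + 1e-10) * x * (goal - y0)
        # Critically damped spring-damper pulled toward the goal, plus forcing.
        ddy = alpha * (beta * (goal - y) - dy) + f
        dy += ddy * dt / tau
        y += dy * dt / tau
        x += -alpha_x * x * dt / tau  # canonical system: phase decays to 0
        traj.append(y)
    return np.array(traj)
```

With zero weights the forcing term vanishes and the trajectory simply converges smoothly to the goal; non-zero weights bend it into arbitrary shapes, which is what makes DMPs convenient to adapt via RL.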

Two reinforcement learning algorithms, Policy Learning by Weighting Exploration with the Returns (PoWER) and Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA), were implemented on the simulated DARwIn-OP to find the DMP weights that maximize frontal velocity in different situations. The tested environments demand adaptation from the controller in order to walk successfully on the different slopes. Elitism was introduced into PI2-CMA to improve the convergence of the algorithm.
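The update rule underlying PoWER can be sketched as follows: perturb the current parameters, roll out the policy, and take a reward-weighted average of the elite perturbations (rewards must be non-negative, e.g. obtained via an exponential transform). The helper name `power_update` and all hyperparameters below are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def power_update(theta, sigma, rollout_fn, n_rollouts=10, n_best=5, rng=None):
    """One PoWER iteration (simplified, scalar exploration variance).

    theta: current parameter vector (e.g. DMP weights).
    rollout_fn: maps parameters to a non-negative return.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Explore with Gaussian perturbations around the current parameters.
    eps = rng.normal(0.0, sigma, size=(n_rollouts, len(theta)))
    returns = np.array([rollout_fn(theta + e) for e in eps])
    # Keep only the best rollouts (importance sampling over elites).
    best = np.argsort(returns)[-n_best:]
    w = returns[best]
    # Reward-weighted average of the elite perturbations.
    return theta + (w @ eps[best]) / (w.sum() + 1e-10)
```

The same elite-selection idea is what elitism adds to PI2-CMA: discarding low-return rollouts concentrates the parameter update on the perturbations that actually improved the gait.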


Results show that these approaches enabled easy adaptation of DARwIn-OP to new situations. They are very promising and demonstrate the flexibility of the framework in generating and adapting trajectories for locomotion.

Project status: 
Past or closed project
Videos:
- flat_cpgs.mp4 (3.62 MB)
- flat_power.mp4 (1.92 MB)
- up_cpgs.mp4 (1.77 MB)
- up_power.mp4 (2.17 MB)
- down_cpgs.mp4 (2.21 MB)
- down_power.mp4 (2.55 MB)
- bump_cpgs.mp4 (1.9 MB)
- bump_power.mp4 (2.27 MB)