Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into underexplored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: Perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, using exteroceptive perception robustly for locomotion has remained a grand challenge in robotics. Snow, vegetation, and water visually appear as obstacles on which the robot cannot step or are missing altogether due to high reflectance. In addition, depth perception can degrade due to difficult lighting, dust, fog, reflective or transparent surfaces, sensor occlusion, and more. For this reason, the most robust and general solutions to legged locomotion to date rely solely on proprioception. This severely limits locomotion speed because the robot has to physically feel out the terrain before adapting its gait accordingly. Here, we present a robust and general solution to integrating exteroceptive and proprioceptive perception for legged locomotion. We leverage an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end to end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed. The controller was tested in a variety of challenging natural and urban environments over multiple seasons and completed an hour-long hike in the Alps in the time recommended for human hikers.
The key component is a recurrent encoder that combines proprioception and exteroception into an integrated belief state. The encoder is trained in simulation to capture ground-truth information about the terrain given exteroceptive observations that may be incomplete, biased, and noisy. The belief state encoder is trained end-to-end to integrate proprioceptive and exteroceptive data without resorting to heuristics. It learns to take advantage of the foresight afforded by exteroception to plan footholds and accelerate locomotion when exteroception is reliable, and can seamlessly fall back to robust proprioceptive locomotion when needed.
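For concreteness, the sketch below shows one way such a gated recurrent belief encoder could be structured in PyTorch. The layer sizes, the choice of a GRU, and the exact form of the attention gate are illustrative assumptions rather than the actual architecture; the point is that a learned gate modulates how much of the encoded exteroception enters the belief state.

```python
import torch
import torch.nn as nn

class BeliefEncoder(nn.Module):
    """Recurrent encoder fusing proprioception and exteroception into a belief
    state. An attention gate (values in [0, 1]) computed from the recurrent
    state scales the encoded exteroception, so unreliable terrain measurements
    can be suppressed. Dimensions and gating form are illustrative assumptions.
    """

    def __init__(self, proprio_dim=48, extero_dim=208,
                 hidden_dim=256, belief_dim=128):
        super().__init__()
        self.extero_encoder = nn.Sequential(
            nn.Linear(extero_dim, 128), nn.LeakyReLU(), nn.Linear(128, 64))
        self.rnn = nn.GRU(proprio_dim + 64, hidden_dim, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(hidden_dim, 64), nn.Sigmoid())
        self.belief_head = nn.Linear(hidden_dim + 64, belief_dim)

    def forward(self, proprio, extero, hidden=None):
        # proprio: (B, T, proprio_dim), extero: (B, T, extero_dim)
        e = self.extero_encoder(extero)
        rnn_out, hidden = self.rnn(torch.cat([proprio, e], dim=-1), hidden)
        alpha = self.gate(rnn_out)              # how much to trust exteroception
        belief = self.belief_head(torch.cat([rnn_out, alpha * e], dim=-1))
        return belief, alpha, hidden
```

When exteroceptive input is degraded, the gate can drive `alpha` toward zero so the belief state is dominated by proprioceptive history; when the terrain measurements are reliable, the gate lets them through and the policy can plan ahead.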
The controller is trained via privileged learning. A teacher policy is first trained via Reinforcement Learning (RL) with full access to privileged information in the form of the ground-truth state of the environment. This privileged training enables the teacher policy to discover the optimal behavior given perfect knowledge of the terrain. We then train a student policy that only has access to information that is available in the field on the physical robot. The student policy is built around our belief state encoder and trained via imitation learning. The student policy learns to predict the teacher's optimal action given only partial and noisy observations of the environment.
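The second stage can be summarized by a single imitation update: the teacher labels each simulated timestep with the action it would take given the privileged state, and the student is regressed onto those labels from its onboard observations. The sketch below shows only the behavior-cloning term (the full objective, including the latent and reconstruction terms, is sketched further down); the module interfaces and tensor names are assumptions for illustration, not the actual training code.

```python
import torch
import torch.nn.functional as F

def student_imitation_step(student, teacher, batch, optimizer):
    """One behavior-cloning update in the privileged-learning setup.

    The teacher acts on the privileged ground-truth state recorded in
    simulation; the student sees only the proprioceptive and noisy
    exteroceptive observations of the same timesteps. Interfaces are
    illustrative assumptions.
    """
    with torch.no_grad():
        target_action = teacher(batch["privileged_obs"])   # teacher label

    predicted_action = student(batch["proprio"], batch["noisy_extero"])

    loss = F.mse_loss(predicted_action, target_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```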
Two different types of measurement noise are applied when sampling the heights:
To evaluate the effect of the attention gate, we compare the performance of the belief encoder with and without the gate. Two student policies are trained using different belief encoders: RNN-Attn, whose belief encoder includes the attention gate, and RNN, whose belief encoder does not.
In each figure, the purple curve corresponds to the RNN-Attn policy and the yellow curve to the RNN policy.
Performance: Both RNN-Attn and RNN improve rapidly early in training, with the mean episode length plateauing at around 1000 steps after roughly 100 iterations.
Mean Episode Length
Mean Rewards
Terrain Levels
Losses: The attention gate markedly improves the decoder's reconstruction of the privileged exteroceptive information by selectively regulating how much exteroceptive input passes into the belief state. When the exteroceptive measurements are heavily corrupted by environmental noise, the decoder relies less on them, which improves robustness. A sketch of the corresponding loss terms follows the plots below.
Behavior Cloning Loss
Latent Loss
Exteroception Reconstruction Loss
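The three curves above correspond to the three terms of the student's training objective. Below is a minimal sketch of how such a combined loss could be assembled; the relative weights, the use of MSE for every term, and the interpretation of the latent loss as matching the teacher's internal latent state are illustrative assumptions.

```python
import torch.nn.functional as F

def student_loss(pred_action, teacher_action,
                 student_belief, teacher_latent,
                 reconstructed_extero, privileged_extero,
                 w_latent=0.5, w_recon=0.5):
    """Combined imitation objective (weights are illustrative assumptions).

    - Behavior cloning loss: match the teacher's action.
    - Latent loss: match the teacher's (privileged) latent representation.
    - Exteroception reconstruction loss: decode the belief state back to the
      privileged, noise-free exteroceptive information.
    """
    bc_loss = F.mse_loss(pred_action, teacher_action)
    latent_loss = F.mse_loss(student_belief, teacher_latent)
    recon_loss = F.mse_loss(reconstructed_extero, privileged_extero)
    total = bc_loss + w_latent * latent_loss + w_recon * recon_loss
    return total, {"bc": bc_loss.item(),
                   "latent": latent_loss.item(),
                   "recon": recon_loss.item()}
```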
To assess the effectiveness of the attention mechanism in reconstructing privileged exteroceptive information, we conduct a simulation experiment on a quadruped robot. Multiple videos are recorded for analysis; in each, the dots underneath the robot represent either:
Privileged Extero-info
Raw Extero-info (noisy)
The two videos below show the exteroceptive information estimated by the RNN and RNN-Attn policies, respectively; a sketch of how this reconstruction is produced follows the videos.
Reconstructed Extero-info by RNN
Reconstructed Extero-info by RNN-Attn
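For reference, the reconstructed dots can be produced by decoding the belief state back into height samples at inference time. The sketch below assumes the hypothetical `BeliefEncoder` interface from the earlier sketch and an external `decoder` module; both are illustrative, not the actual visualization code.

```python
import torch

@torch.no_grad()
def reconstruct_extero_for_visualization(belief_encoder, decoder,
                                         proprio, noisy_extero, hidden=None):
    """Decode the belief state into estimated terrain heights for plotting.

    Returns the reconstructed height samples and the attention values, so the
    estimate can be compared against the privileged (noise-free) samples shown
    in the videos. Interfaces are illustrative assumptions.
    """
    belief, attention, hidden = belief_encoder(proprio, noisy_extero, hidden)
    reconstructed_heights = decoder(belief)   # estimate of the true terrain
    return reconstructed_heights, attention, hidden
```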
Another set of experiments is conducted with a lower noise level.
Raw Extero-info (with a lower noise level)
Reconstructed Extero-info