Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into underexplored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: Perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, using exteroceptive perception robustly for locomotion has remained a grand challenge in robotics. Snow, vegetation, and water visually appear as obstacles on which the robot cannot step or are missing altogether due to high reflectance. In addition, depth perception can degrade due to difficult lighting, dust, fog, reflective or transparent surfaces, sensor occlusion, and more. For this reason, the most robust and general solutions to legged locomotion to date rely solely on proprioception. This severely limits locomotion speed because the robot has to physically feel out the terrain before adapting its gait accordingly. Here, we present a robust and general solution to integrating exteroceptive and proprioceptive perception for legged locomotion. We leverage an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end to end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed. The controller was tested in a variety of challenging natural and urban environments over multiple seasons and completed an hour-long hike in the Alps in the time recommended for human hikers.
The key component is a recurrent encoder that combines proprioception and exteroception into an integrated belief state. The encoder is trained in simulation to capture ground-truth information about the terrain given exteroceptive observations that may be incomplete, biased, and noisy. The belief state encoder is trained end-to-end to integrate proprioceptive and exteroceptive data without resorting to heuristics. It learns to take advantage of the foresight afforded by exteroception to plan footholds and accelerate locomotion when exteroception is reliable, and can seamlessly fall back to robust proprioceptive locomotion when needed.
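For concreteness, the sketch below shows one way such a gated recurrent belief encoder could be structured in PyTorch. The layer sizes, the choice of a GRU, and the exact form of the attention gate are illustrative assumptions rather than the actual architecture; the point is that a learned gate modulates how much of the encoded exteroception enters the belief state.

```python
import torch
import torch.nn as nn

class BeliefEncoder(nn.Module):
    """Recurrent encoder fusing proprioception and exteroception into a belief
    state. An attention gate (values in [0, 1]) computed from the recurrent
    state scales the encoded exteroception, so unreliable terrain measurements
    can be suppressed. Dimensions and gating form are illustrative assumptions.
    """

    def __init__(self, proprio_dim=48, extero_dim=208,
                 hidden_dim=256, belief_dim=128):
        super().__init__()
        self.extero_encoder = nn.Sequential(
            nn.Linear(extero_dim, 128), nn.LeakyReLU(), nn.Linear(128, 64))
        self.rnn = nn.GRU(proprio_dim + 64, hidden_dim, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(hidden_dim, 64), nn.Sigmoid())
        self.belief_head = nn.Linear(hidden_dim + 64, belief_dim)

    def forward(self, proprio, extero, hidden=None):
        # proprio: (B, T, proprio_dim), extero: (B, T, extero_dim)
        e = self.extero_encoder(extero)
        rnn_out, hidden = self.rnn(torch.cat([proprio, e], dim=-1), hidden)
        alpha = self.gate(rnn_out)              # how much to trust exteroception
        belief = self.belief_head(torch.cat([rnn_out, alpha * e], dim=-1))
        return belief, alpha, hidden
```

When exteroceptive input is degraded, the gate can drive `alpha` toward zero so the belief state is dominated by proprioceptive history; when the terrain measurements are reliable, the gate lets them through and the policy can plan ahead.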
The controller is trained via privileged learning. A teacher policy is first trained via Reinforcement Learning (RL) with full access to privileged information in the form of the ground-truth state of the environment. This privileged training enables the teacher policy to discover the optimal behavior given perfect knowledge of the terrain. We then train a student policy that only has access to information that is available in the field on the physical robot. The student policy is built around our belief state encoder and trained via imitation learning. The student policy learns to predict the teacher's optimal action given only partial and noisy observations of the environment.
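The second stage can be summarized by a single imitation update: the teacher labels each simulated timestep with the action it would take given the privileged state, and the student is regressed onto those labels from its onboard observations. The sketch below shows only the behavior-cloning term (the full objective, including the latent and reconstruction terms, is sketched further down); the module interfaces and tensor names are assumptions for illustration, not the actual training code.

```python
import torch
import torch.nn.functional as F

def student_imitation_step(student, teacher, batch, optimizer):
    """One behavior-cloning update in the privileged-learning setup.

    The teacher acts on the privileged ground-truth state recorded in
    simulation; the student sees only the proprioceptive and noisy
    exteroceptive observations of the same timesteps. Interfaces are
    illustrative assumptions.
    """
    with torch.no_grad():
        target_action = teacher(batch["privileged_obs"])   # teacher label

    predicted_action = student(batch["proprio"], batch["noisy_extero"])

    loss = F.mse_loss(predicted_action, target_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```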
Two different types of measurement noise are applied when sampling the heights:
To evaluate the effect of the attention gate, we compare the performance of the belief encoder with and without the gate. Two student policies are trained using different belief encoders: RNN-Attn, whose belief encoder includes the attention gate, and RNN, whose belief encoder does not.
In each figure, the purple curve corresponds to the RNN-Attn policy and the yellow curve to the RNN policy.
Performance: Both RNN-Attn and RNN improve rapidly early in training, with the mean episode length plateauing at around 1000 steps after roughly 100 iterations.
Mean Episode Length
Mean Rewards
Terrain Levels
Losses: The attention gate markedly improves the decoder's reconstruction of the privileged exteroceptive information by selectively regulating how much exteroceptive input passes into the belief state. When the exteroceptive measurements are heavily corrupted by environmental noise, the decoder relies less on them, which improves robustness. A sketch of the corresponding loss terms follows the plots below.
Behavior Cloning Loss
Latent Loss
Exteroception Reconstruction Loss
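The three curves above correspond to the three terms of the student's training objective. Below is a minimal sketch of how such a combined loss could be assembled; the relative weights, the use of MSE for every term, and the interpretation of the latent loss as matching the teacher's internal latent state are illustrative assumptions.

```python
import torch.nn.functional as F

def student_loss(pred_action, teacher_action,
                 student_belief, teacher_latent,
                 reconstructed_extero, privileged_extero,
                 w_latent=0.5, w_recon=0.5):
    """Combined imitation objective (weights are illustrative assumptions).

    - Behavior cloning loss: match the teacher's action.
    - Latent loss: match the teacher's (privileged) latent representation.
    - Exteroception reconstruction loss: decode the belief state back to the
      privileged, noise-free exteroceptive information.
    """
    bc_loss = F.mse_loss(pred_action, teacher_action)
    latent_loss = F.mse_loss(student_belief, teacher_latent)
    recon_loss = F.mse_loss(reconstructed_extero, privileged_extero)
    total = bc_loss + w_latent * latent_loss + w_recon * recon_loss
    return total, {"bc": bc_loss.item(),
                   "latent": latent_loss.item(),
                   "recon": recon_loss.item()}
```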
To assess the effectiveness of the attention mechanism in reconstructing privileged exteroceptive information, we conduct a simulation experiment on a quadruped robot. Multiple videos are recorded for analysis; in each, the dots underneath the robot represent either:
Privileged Extero-info
Raw Extero-info (noisy)
The two videos below show the exteroceptive information estimated by the RNN and RNN-Attn policies, respectively; a sketch of how this reconstruction is produced follows the videos.
Reconstructed Extero-info by RNN
Reconstructed Extero-info by RNN-Attn
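For reference, the reconstructed dots can be produced by decoding the belief state back into height samples at inference time. The sketch below assumes the hypothetical `BeliefEncoder` interface from the earlier sketch and an external `decoder` module; both are illustrative, not the actual visualization code.

```python
import torch

@torch.no_grad()
def reconstruct_extero_for_visualization(belief_encoder, decoder,
                                         proprio, noisy_extero, hidden=None):
    """Decode the belief state into estimated terrain heights for plotting.

    Returns the reconstructed height samples and the attention values, so the
    estimate can be compared against the privileged (noise-free) samples shown
    in the videos. Interfaces are illustrative assumptions.
    """
    belief, attention, hidden = belief_encoder(proprio, noisy_extero, hidden)
    reconstructed_heights = decoder(belief)   # estimate of the true terrain
    return reconstructed_heights, attention, hidden
```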
Another set of experiments is conducted with a lower noise level.
Raw Extero-info (with a lower noise level)
Reconstructed Extero-info