PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations

CoRL 2024
¹Technical University of Munich, ²Technical University of Darmstadt, ³UC Berkeley

We train a generalist policy that controls dexterous robot hands to play any song,
using demonstration videos of human pianists from the internet.

Abstract

In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations.

The internet is a promising source of large-scale demonstrations for training robot agents. For piano playing in particular, YouTube is full of videos of professional pianists playing a wide variety of songs.

In our work, we leverage these demonstrations to train a generalist piano-playing agent capable of playing arbitrary songs. Our framework is divided into three parts: a data preparation phase that extracts informative features from the YouTube videos, a policy learning phase that trains song-specific expert policies from the demonstrations, and a policy distillation phase that distills these experts into a single generalist agent. We explore different policy designs to represent the agent and evaluate how the amount of training data influences the agent's ability to generalize to novel songs not present in the dataset.

From YouTube To Mujoco

We aim to learn piano-playing policies by imitating YouTube videos. Given a YouTube video of a human playing a song, we extract the human fingertip motion and the piano state, then use this information to train a policy with reinforcement learning that imitates the human fingertip motion while ensuring that the resulting piano state matches the one in the video.
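The reward used for this imitation is not spelled out here. As a minimal sketch, assume it is a weighted sum of a fingertip-tracking term (an exponential of the negative squared distance between robot and human fingertips) and a key-state term (the fraction of keys whose pressed/released state matches the video). The function names, the equal weights, and the `scale` parameter are hypothetical illustrations, not PianoMime's actual reward.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fingertip_tracking_reward(robot_tips: Sequence[Vec3],
                              human_tips: Sequence[Vec3],
                              scale: float = 0.01) -> float:
    """Mean of exp(-d^2 / scale) over fingertips; equals 1.0 when every tip matches.

    `scale` is a hypothetical tolerance (squared distance units).
    """
    assert len(robot_tips) == len(human_tips) and robot_tips
    total = 0.0
    for r, h in zip(robot_tips, human_tips):
        d2 = sum((a - b) ** 2 for a, b in zip(r, h))  # squared Euclidean distance
        total += math.exp(-d2 / scale)
    return total / len(robot_tips)


def key_state_reward(pressed: Sequence[bool], target: Sequence[bool]) -> float:
    """Fraction of keys whose pressed/released state matches the video's piano state."""
    assert len(pressed) == len(target) and target
    return sum(p == t for p, t in zip(pressed, target)) / len(target)


def imitation_reward(robot_tips, human_tips, pressed, target,
                     w_tips: float = 0.5, w_keys: float = 0.5) -> float:
    """Hypothetical combined reward: track the human fingertips AND reproduce the key states."""
    return (w_tips * fingertip_tracking_reward(robot_tips, human_tips)
            + w_keys * key_state_reward(pressed, target))
```

Weighting both terms lets the policy trade off following the human hand motion against actually producing the right notes; a real system would tune these weights rather than fix them at 0.5.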

Performance Visual Comparison

To illustrate the robot's performance, we present a set of videos showing the robot playing under a policy learned with our method (PianoMime) and under a policy trained with the baseline (RoboPianist).

While the PianoMime framework exploits YouTube videos to learn the robot's behavior from human movements, RoboPianist relies on fingering strategies that assign specific fingers to specific keys.

Left: RoboPianist (Baseline). Right: PianoMime (Ours)

Generalist Policy via Distillation

Given a set of YouTube videos, we learn multiple single-song piano-playing experts via reinforcement learning. We then apply Behavioral Cloning to distill these experts into a single piano-playing policy that can play any song.
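The distillation step can be illustrated with a deliberately tiny sketch: pool (observation, action) pairs generated by several song-specific experts and fit one policy to all of them by minimizing the mean squared action error, which is plain Behavioral Cloning. A linear policy trained with SGD is an assumption chosen for brevity. The actual generalist policy is a learned neural model, and `distill_linear_policy` and its parameters are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[List[float], List[float]]  # (observation, expert action)


def distill_linear_policy(expert_datasets: List[List[Sample]],
                          lr: float = 0.05, epochs: int = 200,
                          seed: int = 0) -> Callable[[List[float]], List[float]]:
    """Behavioral Cloning sketch: fit a linear policy a = W o + b to pooled expert data."""
    # Pool the demonstrations of every song-specific expert into one dataset.
    data = [s for ds in expert_datasets for s in ds]
    obs_dim, act_dim = len(data[0][0]), len(data[0][1])
    rng = random.Random(seed)
    W = [[0.0] * obs_dim for _ in range(act_dim)]
    b = [0.0] * act_dim

    def predict(obs: List[float]) -> List[float]:
        return [sum(W[i][j] * obs[j] for j in range(obs_dim)) + b[i]
                for i in range(act_dim)]

    for _ in range(epochs):
        rng.shuffle(data)
        for obs, act in data:
            pred = predict(obs)
            for i in range(act_dim):
                err = pred[i] - act[i]  # gradient of 0.5 * err^2 w.r.t. pred[i]
                for j in range(obs_dim):
                    W[i][j] -= lr * err * obs[j]
                b[i] -= lr * err
    return predict
```

Because every expert's data goes into one regression target, the student must find a single mapping that reproduces all experts at once, which is the point of distillation.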

Results of the generalist policy on training dataset.

We tested the policy on novel songs that were not in the training dataset. On these completely unseen songs, the robot achieved an average F1 score of 0.57.
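For reference, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Here it is computed on which piano keys are pressed over time. The sketch below shows one plausible way to compute it. It pools true positives, false positives, and false negatives over all timesteps and keys. This aggregation is an assumption, and the paper's evaluation may instead average per-timestep scores. The function name is hypothetical.

```python
from typing import Sequence


def key_press_f1(pred: Sequence[Sequence[bool]],
                 target: Sequence[Sequence[bool]]) -> float:
    """F1 of predicted vs. target key states, pooled over timesteps and keys (assumed)."""
    assert len(pred) == len(target)
    tp = fp = fn = 0
    for p_t, g_t in zip(pred, target):
        assert len(p_t) == len(g_t)
        for p, g in zip(p_t, g_t):
            tp += p and g          # key pressed when it should be
            fp += p and not g      # key pressed when it should not be
            fn += (not p) and g    # key missed
    if tp == 0:
        return 0.0  # no correct presses: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with two timesteps of three keys, `pred = [[True, False, False], [True, True, False]]` and `target = [[True, False, False], [False, True, True]]` give two correct presses, one spurious press, and one missed key. Precision and recall are therefore both 2/3, and so is F1.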

Results of the generalist policy on test dataset (unseen songs).