Diffusion Policy#
Installation#
cd roboverse_learn/algorithms/diffusion_policy
pip install -e .
cd ../../../
pip install pandas wandb
Training Procedure#
The main script for training is train.sh, which automates both data preparation and training.
Usage of train.sh#
bash roboverse_learn/algorithms/diffusion_policy/train.sh <task_name> <robot> <expert_data_num> <level> <seed> <gpu_id> <DEBUG> <num_epochs> <obs_space> <act_space> [<delta_ee>]
| Argument | Description |
|---|---|
| `task_name` | Name of the task |
| `robot` | Robot type used for training |
| `expert_data_num` | Number of expert demonstrations that were collected |
| `level` | Randomization level of the demonstrations |
| `seed` | Random seed for reproducibility |
| `gpu_id` | ID of the GPU to use |
| `DEBUG` | Debug mode toggle (`True`/`False`) |
| `num_epochs` | Number of training epochs |
| `obs_space` | Observation space (`joint_pos` or `ee`) |
| `act_space` | Action space (`joint_pos` or `ee`) |
| `delta_ee` | Optional: delta control for the end effector (`0`: absolute, `1`: delta) |
Example:
bash roboverse_learn/algorithms/diffusion_policy/train.sh CloseBox franka 100 0 42 0 False 200 ee ee 0
This script runs in two parts:
1. **Data Preparation**: `data2zarr_dp.py` converts the collected metadata into Zarr format for efficient data loading. It automatically parses the arguments and points to the correct `metadata_dir` (the location of data collected by the `collect_demo` script) to convert the demonstration data.
2. **Training**: `diffusion_policy/train.py` uses the generated Zarr data, which is stored in the `data_policy/` directory, to train the diffusion policy model.
We combine these two parts to keep the action-space and observation-space data processing consistent, but they can also be run independently if desired, as sketched below.
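To make the split concrete, here is a minimal sketch of running the two parts by hand. The `data2zarr_dp.py` flag names are assumptions taken from the "Key Arguments" table in the next section, and the training entry point and override syntax are assumptions as well; `train.sh` remains the authoritative reference for the exact commands.

```bash
# Part 1 (data preparation): convert collected demos into a Zarr dataset.
# Flag names are assumptions; see the "Key Arguments" table in the next section.
python roboverse_learn/algorithms/diffusion_policy/data2zarr_dp.py \
  --task_name <task_name>_<robot> \
  --expert_data_num <expert_data_num> \
  --metadata_dir <metadata_dir> \
  --observation_space <obs_space> \
  --action_space <act_space>

# Part 2 (training): consume the Zarr written to data_policy/.
# Entry-point path and override syntax are assumptions; inspect train.sh for the real call.
python roboverse_learn/algorithms/diffusion_policy/diffusion_policy/train.py \
  --config-name=robot_dp.yaml \
  task.dataset.zarr_path=data_policy/<task_name>_<robot>_<expert_data_num>_<custom_name>.zarr
```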
Understanding data2zarr_dp.py#
The data2zarr_dp.py script converts demonstration data into Zarr format for efficient data loading. While train.sh handles this automatically, you may want to run this step separately for custom data preprocessing.
python roboverse_learn/algorithms/diffusion_policy/data2zarr_dp.py [arguments]
Key Arguments:
| Argument | Description |
|---|---|
| `task_name` | Name of the task (e.g., StackCube_franka) |
| `expert_data_num` | Number of episodes to process |
| `metadata_dir` | Path to the demonstration metadata |
| `downsample_ratio` | Downsample ratio for demonstration data |
| `custom_name` | Custom name for the output Zarr file |
| `observation_space` | Observation space to use (`joint_pos` or `ee`) |
| `action_space` | Action space to use (`joint_pos` or `ee`) |
| `delta_ee` | (optional) Delta control mode for end effector (`0`: absolute, `1`: delta) |
| `joint_pos_padding` | (optional) If > 0, pad joint positions to this length when using the `joint_pos` observation/action space |
The processed data is saved to `data_policy/[task_name]_[expert_data_num]_[custom_name].zarr` and is ready for training. The script also saves a `metadata.json` containing some of the above parameters so that downstream policy training knows how the data was processed.
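As a concrete illustration, the standalone data-preparation step corresponding to the earlier `train.sh` example might look like the following. The flag names mirror the table above and are assumptions, and `demo_ee_abs` is a hypothetical custom name; check the script's argument definitions (or `train.sh`) for the authoritative names.

```bash
# Hypothetical standalone invocation (flag names assumed from the table above):
# converts 100 CloseBox demos to Zarr using end-effector obs/actions with absolute control.
python roboverse_learn/algorithms/diffusion_policy/data2zarr_dp.py \
  --task_name CloseBox_franka \
  --expert_data_num 100 \
  --metadata_dir <metadata_dir> \
  --custom_name demo_ee_abs \
  --observation_space ee \
  --action_space ee \
  --delta_ee 0
# Expected output: data_policy/CloseBox_franka_100_demo_ee_abs.zarr (plus metadata.json)
```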
Important Parameter Overrides:
- `horizon`, `n_obs_steps`, and `n_action_steps` are set directly in `train.sh` and override the YAML configurations.
- All other parameters (e.g., batch size, number of epochs) can be adjusted manually in the YAML file: `roboverse_learn/algorithms/diffusion_policy/diffusion_policy/config/robot_dp.yaml`
- If you alter the observation and action spaces, verify the corresponding shapes in: `roboverse_learn/algorithms/diffusion_policy/diffusion_policy/config/task/default_task.yaml`
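If you are unsure where a given value is actually set, a quick sanity check is to search both files; this is just a plain text search, and the YAML key names in the second command are assumptions.

```bash
# Locate the hard-coded horizon/n_obs_steps/n_action_steps overrides in train.sh ...
grep -nE "horizon|n_obs_steps|n_action_steps" \
  roboverse_learn/algorithms/diffusion_policy/train.sh
# ... and compare against the defaults in the YAML config (key names assumed).
grep -nE "batch_size|num_epochs" \
  roboverse_learn/algorithms/diffusion_policy/diffusion_policy/config/robot_dp.yaml
```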
Switching between Joint Position and End Effector Control#
- **Joint Position Control**: Set both `obs_space` and `act_space` to `joint_pos`.
- **End Effector Control**: Set both `obs_space` and `act_space` to `ee`. You may use `delta_ee=1` for delta mode or `delta_ee=0` for absolute positioning.
Adjust the relevant configuration parameters in `roboverse_learn/algorithms/diffusion_policy/diffusion_policy/config/robot_dp.yaml`.
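For example, reusing the values from the earlier example, the two control modes correspond to the following `train.sh` invocations:

```bash
# Joint-position control: observations and actions are joint positions.
bash roboverse_learn/algorithms/diffusion_policy/train.sh CloseBox franka 100 0 42 0 False 200 joint_pos joint_pos

# End-effector control with absolute targets; set the final argument to 1 for delta mode.
bash roboverse_learn/algorithms/diffusion_policy/train.sh CloseBox franka 100 0 42 0 False 200 ee ee 0
```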
Evaluation#
To deploy and evaluate the trained policy:
python roboverse_learn/eval.py --task CloseBox --algo diffusion_policy --num_envs <num_envs> --checkpoint_path <absolute_checkpoint_path>
Ensure that <absolute_checkpoint_path> points to your trained model checkpoint. For <num_envs>, up to roughly 50 parallel environments work on an RTX-class GPU.