Integration guide
From dataset delivery to first training run. This guide covers LeRobot-based VLA training and custom pipelines.
Quick start
MotionLedger datasets are delivered in native LeRobot v2.0 format. These are the only steps you need to start training.
Download and extract
We deliver via S3, GCS, or direct download. Extract to your data directory.
# Download from your delivery bucket
aws s3 sync s3://motionledger-delivery/your_org/franka_v1 ./data/motionledger_franka_v1

# Or extract from archive
tar -xzf motionledger_franka_v1.tar.gz -C ./data/
Set the dataset path
Point LeRobot to your local datasets directory. This tells the loader where to find your data.
# Set the local datasets root
export HF_LEROBOT_HOME=$(pwd)/data

# Verify the dataset is found
ls $HF_LEROBOT_HOME/motionledger_franka_v1/meta/info.json
Create training config
Add a config for your dataset. We include a template config with every delivery.
# Dataset configuration for MotionLedger Franka data
dataset:
  repo_id: "motionledger_franka_v1"  # Matches folder name in HF_LEROBOT_HOME
  split: "train"

  # Model expects these image keys (we deliver with correct naming)
  image_keys:
    - "observation.images.base_0_rgb"
    - "observation.images.wrist_0_rgb"

  # State/action configuration
  state_key: "observation.state"
  action_key: "action"

  # Delta action transform (applied at training time)
  # Gripper (last dim) stays absolute, joints are delta
  delta_indices: [0, 1, 2, 3, 4, 5, 6]  # indices to convert to delta

  # Normalization (uses pre-computed stats from dataset)
  normalize_state: true
  normalize_action: true
  normalization_mode: "bounds"  # Uses q01/q99 for robust scaling

# Training params
batch_size: 32
learning_rate: 1e-4
num_epochs: 100
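Before launching a long run, it can be worth checking that the keys in this config actually exist in the delivered dataset's `meta/info.json`. A minimal stdlib sketch; the `info` dict below is a toy stand-in so the snippet runs on its own (in practice, load the real file as shown in the comment):

```python
import json

def missing_keys(info: dict, expected: list) -> list:
    """Return configured feature keys absent from the dataset's info.json."""
    return [k for k in expected if k not in info["features"]]

# In practice, load the delivered metadata:
#   with open("data/motionledger_franka_v1/meta/info.json") as f:
#       info = json.load(f)
info = {"features": {"observation.state": {}, "action": {},
                     "observation.images.base_0_rgb": {},
                     "observation.images.wrist_0_rgb": {}}}  # toy stand-in

expected = [
    "observation.state",
    "action",
    "observation.images.base_0_rgb",
    "observation.images.wrist_0_rgb",
]
print(missing_keys(info, expected))  # [] -> config and dataset agree
```

An empty list means every configured key is present; anything else should be fixed in the config before training.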
Start training
Run the training script. Normalization stats are loaded automatically from the dataset.
# LeRobot training
python lerobot/scripts/train.py \
  policy=diffusion \
  dataset.repo_id=motionledger_franka_v1 \
  training.exp_name=franka_finetune_v1

# Or with your custom training script
python train.py --config configs/motionledger_franka.yaml
Verify the dataset
Before training, verify the dataset loads correctly and inspect a few episodes.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
import numpy as np
# Load dataset
dataset = LeRobotDataset(
    repo_id="motionledger_franka_v1",
    split="train"
)
# Basic info
print(f"Total episodes: {dataset.num_episodes}")
print(f"Total frames: {len(dataset)}")
print(f"FPS: {dataset.fps}")
# Check features
print(f"\nFeatures: {list(dataset.features.keys())}")
# Sample an episode
sample = dataset[0]
print(f"\nSample keys: {list(sample.keys())}")
print(f"State shape: {sample['observation.state'].shape}")
print(f"Action shape: {sample['action'].shape}")
# Verify normalization stats exist
print(f"\nNorm stats available: {list(dataset.stats.keys())}")
print(f"State mean: {dataset.stats['observation.state']['mean'][:4]}...")
print(f"Action std: {dataset.stats['action']['std'][:4]}...")
# Check image dimensions
for key in sample.keys():
    if 'images' in key:
        print(f"{key} shape: {sample[key].shape}")

Expected output
Total episodes: 812
Total frames: 487200
FPS: 50
Features: ['observation.state', 'action', 'observation.images.base_0_rgb',
'observation.images.wrist_0_rgb', 'timestamp', 'frame_index',
'episode_index', 'task_index']
Sample keys: ['observation.state', 'action', 'observation.images.base_0_rgb', ...]
State shape: torch.Size([600, 8])
Action shape: torch.Size([600, 8])
Norm stats available: ['observation.state', 'action']
State mean: [-0.0012, 0.2847, -0.0034, -1.8721]...
Action std: [0.0234, 0.0312, 0.0289, 0.0198]...
observation.images.base_0_rgb shape: torch.Size([600, 480, 640, 3])
observation.images.wrist_0_rgb shape: torch.Size([600, 480, 640, 3])

VLA model fine-tuning
Here's a complete configuration for fine-tuning vision-language-action models on MotionLedger data. This works with diffusion policies, action-chunking transformers, and other LeRobot-compatible architectures.
"""
VLA fine-tuning config for MotionLedger dataset.
This config is included with your dataset delivery.
"""
from dataclasses import dataclass
@dataclass
class MotionLedgerConfig:
    # Dataset
    dataset_name: str = "motionledger_franka_v1"
    dataset_type: str = "lerobot"

    # Robot configuration
    robot_type: str = "franka"
    action_dim: int = 8  # 7 joints + 1 gripper
    state_dim: int = 8   # Same as action

    # Image inputs (must match dataset feature names exactly)
    image_keys: tuple = (
        "observation.images.base_0_rgb",
        "observation.images.wrist_0_rgb",
    )
    image_size: tuple = (224, 224)  # Resized for model input

    # Action space
    action_key: str = "action"
    state_key: str = "observation.state"

    # Delta action configuration
    # Most VLA models expect delta actions for joints, absolute for gripper
    use_delta_actions: bool = True
    delta_action_mask: tuple = (True, True, True, True, True, True, True, False)
    # [j0, j1, j2, j3, j4, j5, j6, grip]

    # Normalization
    # Uses pre-computed stats from dataset
    normalize_actions: bool = True
    normalize_states: bool = True
    normalization_type: str = "bounds"  # q01/q99 robust scaling

    # Action chunking (for transformer-based policies)
    action_horizon: int = 16     # Predict 16 future actions
    prediction_horizon: int = 8  # But only execute 8

    # Training
    batch_size: int = 32
    learning_rate: float = 1e-4
    weight_decay: float = 0.01
    warmup_steps: int = 1000
    max_steps: int = 100000
    gradient_clip: float = 1.0
    mixed_precision: bool = True

    # Checkpointing
    save_every_n_steps: int = 5000
    eval_every_n_steps: int = 1000

    # Language conditioning
    use_language: bool = True
    language_key: str = "task"  # Maps to tasks.jsonl

Training command
# Set dataset path
export HF_LEROBOT_HOME=/path/to/data

# LeRobot diffusion policy
python lerobot/scripts/train.py \
  policy=diffusion \
  dataset.repo_id=motionledger_franka_v1 \
  training.exp_name=franka_finetune_v1

# With multi-GPU
torchrun --nproc_per_node=4 lerobot/scripts/train.py \
  policy=diffusion \
  dataset.repo_id=motionledger_franka_v1 \
  training.exp_name=franka_finetune_v1
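If you drive training from your own Python entry point instead of the CLI, the dataclass config can be instantiated once and overridden per run with `dataclasses.replace`. A short sketch; the trimmed `MotionLedgerConfig` here is a stand-in for the full delivered config so the snippet runs on its own:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class MotionLedgerConfig:
    # Trimmed stand-in for the full delivered config
    dataset_name: str = "motionledger_franka_v1"
    batch_size: int = 32
    learning_rate: float = 1e-4

base = MotionLedgerConfig()
# Override hyperparameters per run without mutating the base config
sweep = replace(base, batch_size=64, learning_rate=3e-5)
print(sweep.batch_size, sweep.learning_rate)  # 64 3e-05
print(base.batch_size)  # 32 (base is unchanged)
```

Freezing the dataclass makes accidental in-place edits an error, which keeps sweep configs reproducible.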
Custom training pipelines
If you're using a custom training loop or a different model architecture, here's how to load the data directly.
import torch
from torch.utils.data import DataLoader
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
# Load dataset
dataset = LeRobotDataset(
    repo_id="motionledger_franka_v1",
    split="train"
)
# Access normalization statistics
stats = dataset.stats
state_mean = torch.tensor(stats['observation.state']['mean'])
state_std = torch.tensor(stats['observation.state']['std'])
action_q01 = torch.tensor(stats['action']['q01'])
action_q99 = torch.tensor(stats['action']['q99'])
# Define transforms
def normalize_state(state):
    return (state - state_mean) / state_std

def normalize_action_bounds(action):
    # Robust normalization using percentiles
    return 2 * (action - action_q01) / (action_q99 - action_q01) - 1

def compute_delta_action(action, state, delta_mask):
    """Convert absolute action to delta (relative to state)."""
    delta = action.clone()
    delta[..., delta_mask] = action[..., delta_mask] - state[..., delta_mask]
    return delta
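At deployment time these transforms run in reverse: the policy's normalized delta output is mapped back to original units and then integrated onto the current state before being sent to the robot. A hedged sketch of the inverses; the q01/q99 values here are hypothetical placeholders, and in practice you would pull them from `dataset.stats['action']`:

```python
import torch

# Hypothetical placeholder bounds; use dataset.stats['action'] in practice
action_q01 = torch.tensor([-0.05] * 7 + [0.0])
action_q99 = torch.tensor([0.05] * 7 + [1.0])
delta_mask = torch.tensor([True] * 7 + [False])  # joints delta, gripper absolute

def unnormalize_action_bounds(norm_action):
    # Inverse of normalize_action_bounds: [-1, 1] -> original units
    return (norm_action + 1) / 2 * (action_q99 - action_q01) + action_q01

def apply_delta_action(delta, state):
    # Inverse of compute_delta_action: add the delta back onto the state
    absolute = delta.clone()
    absolute[..., delta_mask] = delta[..., delta_mask] + state[..., delta_mask]
    return absolute

# Sanity check: a zero normalized action maps to the midpoint of the bounds
mid = unnormalize_action_bounds(torch.zeros(8))
assert torch.allclose(mid, (action_q01 + action_q99) / 2)
```

Getting this inversion wrong is a common source of "trains fine, fails on the robot" bugs, so it's worth unit-testing the round trip against the forward transforms above.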
# Create DataLoader with transforms
def collate_fn(batch):
    states = torch.stack([normalize_state(b['observation.state']) for b in batch])
    actions = torch.stack([b['action'] for b in batch])

    # Convert to delta
    delta_mask = [True, True, True, True, True, True, True, False]  # Joints only
    delta_actions = compute_delta_action(actions, states, delta_mask)

    # Normalize
    delta_actions = normalize_action_bounds(delta_actions)

    # Process images
    images = {}
    for key in ['observation.images.base_0_rgb', 'observation.images.wrist_0_rgb']:
        if key in batch[0]:
            imgs = torch.stack([b[key] for b in batch])
            imgs = imgs.float() / 255.0  # Normalize to [0, 1]
            images[key] = imgs

    return {
        'state': states,
        'action': delta_actions,
        'images': images,
        'task': [b.get('task', '') for b in batch],
    }
loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,
    collate_fn=collate_fn,
)
# Training loop
for batch in loader:
    state = batch['state']    # [B, T, 8]
    action = batch['action']  # [B, T, 8]
    images = batch['images']  # dict of [B, T, H, W, 3]
    task = batch['task']      # list of str

    # Your model forward pass here
    # ...

Troubleshooting
Common issues and their solutions.
Dataset not found
FileNotFoundError: Could not find dataset motionledger_franka_v1
# Ensure HF_LEROBOT_HOME points to the parent directory
export HF_LEROBOT_HOME=/path/to/data

# Directory structure should be:
# /path/to/data/
# └── motionledger_franka_v1/
#     ├── meta/
#     ├── data/
#     └── videos/
Image key mismatch
KeyError: 'observation.images.cam_top'
Your model config uses different image key names than the dataset. Check the actual keys:
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
dataset = LeRobotDataset("motionledger_franka_v1")
print([k for k in dataset.features.keys() if 'images' in k])
# ['observation.images.base_0_rgb', 'observation.images.wrist_0_rgb']
# Update your config to use these exact names

Normalization stats missing
KeyError: 'observation.state' not in stats
Verify the stats file exists and is accessible:
# Check stats file
cat data/motionledger_franka_v1/meta/stats.json

# If missing, contact us—every delivery should include this file
# As a workaround, compute manually:
uv run scripts/compute_norm_stats.py \
  --repo-id motionledger_franka_v1 \
  --output-dir data/motionledger_franka_v1/meta/
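If you do need the workaround, the statistics themselves are straightforward to compute. A minimal sketch of the per-dimension stats used by `bounds` normalization; `compute_stats` and the toy array below are illustrative, not the delivered script:

```python
import numpy as np

def compute_stats(values):
    """Per-dimension stats over all frames; values has shape [N, D]."""
    arr = np.asarray(values, dtype=np.float64)
    return {
        "mean": arr.mean(axis=0).tolist(),
        "std": arr.std(axis=0).tolist(),
        "q01": np.quantile(arr, 0.01, axis=0).tolist(),  # robust lower bound
        "q99": np.quantile(arr, 0.99, axis=0).tolist(),  # robust upper bound
        "min": arr.min(axis=0).tolist(),
        "max": arr.max(axis=0).tolist(),
    }

# Toy example; in practice stack observation.state / action across every frame
rng = np.random.default_rng(0)
stats = {"action": compute_stats(rng.normal(size=(1000, 8)))}
print(sorted(stats["action"].keys()))
```

The q01/q99 percentiles are what makes `bounds` scaling robust to the occasional outlier frame that would distort a plain min/max range.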
Action dimension mismatch
RuntimeError: shape mismatch, expected [14] got [8]
The model was trained for a different robot (e.g., bimanual vs single-arm). Check the action dimensions:
# Check dataset action dimensions
import json
with open('data/motionledger_franka_v1/meta/info.json') as f:
    info = json.load(f)
print(info['features']['action']['shape'])  # [8]

# Update your model config to match
config.action_dim = 8

Video decoding slow
Training is bottlenecked by video loading.
Use more DataLoader workers and enable video caching:
# Increase workers
loader = DataLoader(dataset, num_workers=8, prefetch_factor=4)

# Or decode videos to frames offline (one-time cost)
python scripts/decode_videos.py \
  --input data/motionledger_franka_v1 \
  --output data/motionledger_franka_v1_frames
What's included in every delivery
We don't just deliver data—we deliver training-ready packages.
Ready to start training?
Send us your spec and we'll deliver data you can train on the same day.