Sample dataset

This is what a dataset drop looks like. We deliver in native LeRobot v2.0 format with pre-computed normalization statistics—ready for VLA fine-tuning without any conversion.

LeRobot v2.0 format

We deliver in the exact structure that modern VLA models and LeRobot expect. No conversion scripts. No format wrangling. Point your training config at the dataset and run.

motionledger_franka_v1/
├── meta/
│   ├── info.json                    # Dataset metadata, features schema
│   ├── episodes.jsonl               # Episode boundaries and lengths
│   ├── tasks.jsonl                  # Task descriptions and indices
│   └── stats.json                   # Pre-computed normalization statistics
│
├── data/
│   ├── chunk-000/
│   │   ├── episode_000000.parquet   # State, action, timestamps per episode
│   │   ├── episode_000001.parquet
│   │   └── ...
│   └── chunk-001/
│       └── ...
│
├── videos/
│   ├── chunk-000/
│   │   ├── observation.images.base_0_rgb/
│   │   │   ├── episode_000000.mp4       # Third-person camera
│   │   │   ├── episode_000001.mp4
│   │   │   └── ...
│   │   ├── observation.images.left_wrist_0_rgb/
│   │   │   └── ...                      # Left wrist camera
│   │   └── observation.images.right_wrist_0_rgb/
│   │       └── ...                      # Right wrist camera (if bimanual)
│   └── chunk-001/
│       └── ...
│
└── assets/
    └── motionledger_franka_v1/
        └── norm_stats.json          # Pre-computed normalization stats
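LeRobot v2.0 finds files through path templates and a chunk size stored in meta/info.json. It does not scan directories. The sketch below assumes LeRobot v2.0's default `data_path` and `video_path` templates and a `chunks_size` of 1000 episodes per chunk. Read the actual values from your delivery's info.json rather than hard-coding them.

```python
# Default LeRobot v2.0 path templates (assumed here; read them from meta/info.json in practice).
DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
VIDEO_PATH = "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
CHUNKS_SIZE = 1000  # episodes per chunk directory


def episode_files(episode_index: int, video_keys: list[str]) -> dict[str, str]:
    """Return the parquet path and one MP4 path per camera for a single episode."""
    chunk = episode_index // CHUNKS_SIZE
    files = {"data": DATA_PATH.format(episode_chunk=chunk, episode_index=episode_index)}
    for key in video_keys:
        files[key] = VIDEO_PATH.format(
            episode_chunk=chunk, video_key=key, episode_index=episode_index
        )
    return files


paths = episode_files(1, ["observation.images.base_0_rgb"])
print(paths["data"])  # data/chunk-000/episode_000001.parquet
```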

Why this matters: The LeRobot v2.0 format is what modern VLA architectures consume natively. Custom formats require conversion code that introduces bugs and wastes engineering time. We've seen teams spend weeks debugging format mismatches.

Dataset metadata

The meta/info.json file defines the schema. LeRobot reads this to understand your observation and action spaces.

meta/info.json
{
  "codebase_version": "v2.0",
  "robot_type": "franka",
  "total_episodes": 812,
  "total_frames": 487200,
  "fps": 50,
  "features": {
    "observation.state": {
      "dtype": "float32",
      "shape": [8],
      "names": [
        "joint_0", "joint_1", "joint_2", "joint_3",
        "joint_4", "joint_5", "joint_6", "gripper"
      ]
    },
    "action": {
      "dtype": "float32",
      "shape": [8],
      "names": [
        "joint_0", "joint_1", "joint_2", "joint_3",
        "joint_4", "joint_5", "joint_6", "gripper"
      ]
    },
    "observation.images.base_0_rgb": {
      "dtype": "video",
      "shape": [480, 640, 3],
      "video_info": {"video.fps": 50.0, "video.codec": "h264"}
    },
    "observation.images.left_wrist_0_rgb": {
      "dtype": "video",
      "shape": [480, 640, 3],
      "video_info": {"video.fps": 50.0, "video.codec": "h264"}
    }
  },
  "splits": {"train": "0:750", "val": "750:812"}
}
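The `splits` values are half-open `start:stop` ranges over episode indices. `"0:750"` covers episodes 0 through 749, and `"750:812"` covers the remaining 62. The stdlib sketch below parses a trimmed copy of the sample schema (cameras omitted). It derives the split ranges and the average episode duration. The helper name is illustrative.

```python
import json

# Trimmed copy of the sample meta/info.json (camera features omitted).
INFO = json.loads("""
{
  "total_episodes": 812,
  "total_frames": 487200,
  "fps": 50,
  "features": {"observation.state": {"dtype": "float32", "shape": [8]},
               "action": {"dtype": "float32", "shape": [8]}},
  "splits": {"train": "0:750", "val": "750:812"}
}
""")


def parse_split(spec: str) -> range:
    """Turn a 'start:stop' split string into a half-open range of episode indices."""
    start, stop = (int(part) for part in spec.split(":"))
    return range(start, stop)


splits = {name: parse_split(spec) for name, spec in INFO["splits"].items()}
# The splits should partition the dataset exactly.
assert sum(len(r) for r in splits.values()) == INFO["total_episodes"]

frames_per_episode = INFO["total_frames"] / INFO["total_episodes"]  # 600.0
seconds_per_episode = frames_per_episode / INFO["fps"]              # 12.0
print(len(splits["train"]), len(splits["val"]), seconds_per_episode)  # 750 62 12.0
```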

Normalization statistics

Most VLA training pipelines normalize states and actions with dataset-level statistics. Without pre-computed statistics, you would have to run a script such as openpi's compute_norm_stats.py yourself, which can take hours on large datasets.

We include these in every delivery. Your team loads the dataset and starts training immediately.

assets/motionledger_franka_v1/norm_stats.json
{
  "observation.state": {
    "mean": [-0.0012, 0.2847, -0.0034, -1.8721, 0.0089, 2.1043, 0.7821, 0.42],
    "std":  [0.1823, 0.2156, 0.1934, 0.3127, 0.1567, 0.2891, 0.1423, 0.31],
    "q01":  [-0.4521, -0.1823, -0.4912, -2.5123, -0.3821, 1.4521, 0.2312, 0.0],
    "q99":  [0.4498, 0.7521, 0.4834, -1.2341, 0.4012, 2.7621, 1.3412, 1.0],
    "min":  [-0.5123, -0.2341, -0.5621, -2.6234, -0.4523, 1.3234, 0.1823, 0.0],
    "max":  [0.5234, 0.8123, 0.5512, -1.1234, 0.4823, 2.8512, 1.4123, 1.0]
  },
  "action": {
    "mean": [0.0001, 0.0003, -0.0002, 0.0001, 0.0002, -0.0001, 0.0001, 0.48],
    "std":  [0.0234, 0.0312, 0.0289, 0.0198, 0.0267, 0.0234, 0.0178, 0.35],
    "q01":  [-0.0612, -0.0823, -0.0756, -0.0521, -0.0698, -0.0612, -0.0467, 0.0],
    "q99":  [0.0598, 0.0812, 0.0734, 0.0512, 0.0687, 0.0598, 0.0456, 1.0],
    "min":  [-0.0823, -0.1012, -0.0934, -0.0712, -0.0887, -0.0823, -0.0612, 0.0],
    "max":  [0.0812, 0.0998, 0.0912, 0.0698, 0.0876, 0.0812, 0.0598, 1.0]
  }
}
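You may need to regenerate these statistics, for example after filtering out episodes. Each field is a per-dimension reduction over every frame in the dataset. The minimal stdlib sketch below assumes frames are already loaded as equal-length lists of floats. It uses linear-interpolation percentiles, so its q01/q99 can differ slightly from tools that approximate quantiles with histograms.

```python
import math


def percentile(sorted_vals: list[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 1]) of an already-sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def compute_norm_stats(frames: list[list[float]]) -> dict[str, list[float]]:
    """Per-dimension mean/std/q01/q99/min/max over all frames (population std)."""
    stats: dict[str, list[float]] = {k: [] for k in ("mean", "std", "q01", "q99", "min", "max")}
    for column in zip(*frames):  # one dimension at a time
        vals = sorted(column)
        n = len(vals)
        mean = sum(vals) / n
        stats["mean"].append(mean)
        stats["std"].append(math.sqrt(sum((v - mean) ** 2 for v in vals) / n))
        stats["q01"].append(percentile(vals, 0.01))
        stats["q99"].append(percentile(vals, 0.99))
        stats["min"].append(vals[0])
        stats["max"].append(vals[-1])
    return stats


# Toy example: 101 frames of a 2-D state.
frames = [[i / 100, 1.0 - i / 100] for i in range(101)]
stats = compute_norm_stats(frames)
print(stats["q01"], stats["q99"])
```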

What each statistic is for

Field	Used by	Purpose
`mean`, `std`	Normalize transform	Z-score normalization: `(x - mean) / std`
`q01`, `q99`	NormalizeBounds transform	Robust scaling using 1st/99th percentiles (outlier-resistant)
`min`, `max`	MinMax transform	Scale to [0, 1] or [-1, 1] range
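The three transforms in the table can be written out as follows. This is a stdlib sketch that works on one vector at a time. The quantile variant maps [q01, q99] onto [-1, 1]. Values outside the percentile band land slightly outside that range, and whether to clip them is a choice for your config. The small epsilon is our own guard against a zero spread.

```python
EPS = 1e-6  # assumed guard against zero spread (e.g., a gripper that never moves)


def zscore(x, stats):
    """Z-score normalization: (x - mean) / std."""
    return [(v - m) / (s + EPS) for v, m, s in zip(x, stats["mean"], stats["std"])]


def quantile_scale(x, stats):
    """Map [q01, q99] onto [-1, 1]; outliers fall outside that band unless clipped."""
    return [
        2.0 * (v - lo) / (hi - lo + EPS) - 1.0
        for v, lo, hi in zip(x, stats["q01"], stats["q99"])
    ]


def minmax(x, stats, symmetric=False):
    """Map [min, max] onto [0, 1], or onto [-1, 1] when symmetric=True."""
    unit = [(v - lo) / (hi - lo + EPS) for v, lo, hi in zip(x, stats["min"], stats["max"])]
    return [2.0 * u - 1.0 for u in unit] if symmetric else unit


# Gripper dimension of observation.state, taken from the norm_stats.json sample.
gripper = {"mean": [0.42], "std": [0.31], "q01": [0.0], "q99": [1.0], "min": [0.0], "max": [1.0]}
print(quantile_scale([1.0], gripper))  # approximately [1.0]
```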

Action space conventions

VLA models expect a specific dimension ordering. Wrong ordering means your model predicts elbow angles when it should predict shoulder angles. We enforce the correct ordering in every delivery.

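Single-arm, 7-DoF (Franka, DROID)

8 dimensions
dim[0]: joint_0 (base)
dim[1]: joint_1
dim[2]: joint_2
dim[3]: joint_3
dim[4]: joint_4
dim[5]: joint_5
dim[6]: joint_6 (last wrist joint, drives the flange)
dim[7]: gripper [0=open, 1=closed]

Six-joint arms such as the UR5e use 7 dimensions instead. Dimensions dim[0] through dim[5] hold joint_0 to joint_5 (shoulder pan, shoulder lift, elbow, wrist 1, wrist 2, wrist 3), and the gripper sits at dim[6].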

Bimanual (ALOHA, Trossen)

14 dimensions (ranges are half-open, as in Python slicing)
dim[0:6]:  left arm joints (6 DoF)
dim[6]:    left gripper [0=open, 1=closed]
dim[7:13]: right arm joints (6 DoF)
dim[13]:   right gripper [0=open, 1=closed]
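Indexing with half-open slices in code avoids the off-by-one errors that inclusive ranges invite. The sketch below splits a flat action vector into named parts and checks its width. The layout dictionaries and `split_action` are hypothetical helpers, not part of LeRobot or any model's API.

```python
# Hypothetical layout constants matching the conventions above (half-open slices).
SINGLE_ARM_7DOF = {"arm": slice(0, 7), "gripper": slice(7, 8)}
BIMANUAL = {
    "left_arm": slice(0, 6),
    "left_gripper": slice(6, 7),
    "right_arm": slice(7, 13),
    "right_gripper": slice(13, 14),
}


def split_action(action: list[float], layout: dict[str, slice]) -> dict[str, list[float]]:
    """Split a flat action vector into named parts after checking its width."""
    expected = max(s.stop for s in layout.values())
    if len(action) != expected:
        raise ValueError(f"expected {expected} dims, got {len(action)}")
    return {name: action[s] for name, s in layout.items()}


parts = split_action([float(i) for i in range(14)], BIMANUAL)
print(parts["left_gripper"], parts["right_gripper"])  # [6.0] [13.0]
```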

Delta vs. absolute actions

Many VLA training recipes use delta actions: the change from the current state, rather than absolute positions. The conversion happens during training, via the DeltaActions transform:

Delta transform
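import numpy as np

def to_delta(action_absolute, current_state, gripper_idx=7):
    # Applied during training: joints become relative to the current state.
    action_absolute = np.asarray(action_absolute, dtype=np.float32)  # [..., D]
    action_delta = action_absolute - np.asarray(current_state, dtype=np.float32)
    # Gripper remains absolute (not delta):
    action_delta[..., gripper_idx] = action_absolute[..., gripper_idx]
    return action_delta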

We deliver absolute actions by default. The delta conversion runs at training time. A per-dimension mask in your training config controls it, marking which dimensions become deltas (arm joints) and which stay absolute (grippers).
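A per-dimension boolean mask generalizes the delta transform to any layout. True marks a joint that becomes relative to the current state, and False keeps the dimension absolute. The mask builder and function names below are our own illustration. The inverse is what you would apply at inference time to turn predicted deltas back into absolute commands.

```python
def delta_mask(*groups: int) -> list[bool]:
    """Hypothetical helper: +n adds n delta dims, -n adds n absolute dims, in order."""
    mask: list[bool] = []
    for n in groups:
        mask.extend([n > 0] * abs(n))
    return mask


def to_delta(action, state, mask):
    """Training-time transform: subtract the current state on masked dims only."""
    return [a - s if m else a for a, s, m in zip(action, state, mask)]


def to_absolute(action, state, mask):
    """Inference-time inverse: add the current state back on masked dims only."""
    return [a + s if m else a for a, s, m in zip(action, state, mask)]


BIMANUAL_MASK = delta_mask(6, -1, 6, -1)  # joints relative, both grippers absolute
SINGLE_ARM_MASK = delta_mask(7, -1)       # 7 joints relative, gripper absolute

state = [0.1] * 14
action = [0.15] * 6 + [1.0] + [0.05] * 6 + [0.0]
delta = to_delta(action, state, BIMANUAL_MASK)
restored = to_absolute(delta, state, BIMANUAL_MASK)
```

For an action chunk, apply the same transform to every row, using the state at the start of the chunk.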

Image key conventions

VLA models expect specific image key names, and custom names like cam_top or wrist_camera require writing remapping code. We use one consistent naming scheme: the base/wrist keys below, which match the camera names π0 models use in openpi. Your model loads the data without modification.

Camera position	Standard key name	Feature path
Third-person / overhead	`base_0_rgb`	`observation.images.base_0_rgb`
Left wrist	`left_wrist_0_rgb`	`observation.images.left_wrist_0_rgb`
Right wrist	`right_wrist_0_rgb`	`observation.images.right_wrist_0_rgb`
Single wrist (non-bimanual)	`wrist_0_rgb`	`observation.images.wrist_0_rgb`
Additional base camera	`base_1_rgb`	`observation.images.base_1_rgb`

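Video format: H.264-encoded MP4 at 30 FPS by default, configurable to match your control frequency. LeRobot requires the two rates to agree because it looks up each video frame by the corresponding state/action timestamp. Resolution is preserved from the source cameras, and all frames are temporally aligned to the state/action timestamps.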
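LeRobot v2.0 decodes a video frame for each row's timestamp. With its default `tolerance_s` of 1e-4 s, it fails whenever the nearest frame is farther away than that, so the video and control rates have to line up. The stdlib sketch below performs that check, assuming video frame k sits exactly at k / fps. It is a simplification of what the decoder does, not LeRobot's code. The example shows why 30 FPS video cannot back a 50 Hz control stream as-is.

```python
def frames_without_video(control_ts: list[float], video_fps: float, tol_s: float = 1e-4) -> list[float]:
    """Return control timestamps with no video frame within tol_s (frame k is at k / video_fps)."""
    missing = []
    for t in control_ts:
        nearest_frame_t = round(t * video_fps) / video_fps
        if abs(t - nearest_frame_t) > tol_s:
            missing.append(t)
    return missing


control_ts = [i / 50 for i in range(600)]  # one 12 s episode at 50 Hz

print(len(frames_without_video(control_ts, video_fps=50)))  # 0
print(len(frames_without_video(control_ts, video_fps=30)))  # 480
```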

Language instructions

High-quality datasets like DROID include multiple paraphrases per episode. This diversity is critical for language grounding—the model learns that "grab the cup" and "pick up the mug" mean the same action.

meta/tasks.jsonl
{"task_index": 0, "task": "Pick up the red block and place it in the bin"}
{"task_index": 1, "task": "Grab the crimson cube and put it in the container"}
{"task_index": 2, "task": "Lift the red object and drop it in the box"}
{"task_index": 3, "task": "Open the drawer"}
{"task_index": 4, "task": "Pull the drawer handle to open it"}
{"task_index": 5, "task": "Grasp the drawer and slide it out"}

During training, your data loader can sample one instruction per episode at random from the available paraphrases. We collect at least 3 paraphrases per unique task by default.
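LeRobot v2.0 lists each episode's task strings in meta/episodes.jsonl, under the keys `episode_index`, `tasks`, and `length`. The sketch below assumes each episode's `tasks` list holds all of its paraphrases. Under that assumption, per-episode sampling takes only a few lines. The helper names and the seeded RNG are illustrative.

```python
import json
import random

# Two lines in the meta/episodes.jsonl format (assumed: each episode lists all its paraphrases).
EPISODES_JSONL = """\
{"episode_index": 0, "tasks": ["Pick up the red block and place it in the bin", "Grab the crimson cube and put it in the container", "Lift the red object and drop it in the box"], "length": 600}
{"episode_index": 1, "tasks": ["Open the drawer", "Pull the drawer handle to open it", "Grasp the drawer and slide it out"], "length": 600}
"""


def load_paraphrases(jsonl_text: str) -> dict[int, list[str]]:
    """Map episode_index -> list of instruction paraphrases."""
    episodes = (json.loads(line) for line in jsonl_text.splitlines() if line.strip())
    return {ep["episode_index"]: ep["tasks"] for ep in episodes}


def sample_instruction(paraphrases: dict[int, list[str]], episode_index: int, rng: random.Random) -> str:
    """Pick one paraphrase at random each time an episode is drawn."""
    return rng.choice(paraphrases[episode_index])


paraphrases = load_paraphrases(EPISODES_JSONL)
rng = random.Random(0)
print(sample_instruction(paraphrases, 1, rng))
```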

Timing synchronization

Anyone can claim "millisecond sync." We show you the actual alignment error over time. This is a simulated 60-second episode demonstrating our timing QA pipeline.

Note: This episode is simulated to demonstrate our timing QA and drift correction pipeline. Real client data available under NDA.

Alignment error before correction

Raw camera timestamps vs host clock. Notice the jump in cam_left at ~23s.

Alignment error after piecewise correction

After detecting jumps and refitting per segment, residual error is within ±5ms (p95), worst-case under 10ms.

Detected timing events

Stream	Event	Time (s)	Magnitude	Action
cam_left	Clock jump	23.0	+12.0ms	Split + refit
cam_front	Normal drift	—	15 ppm	Linear correction
cam_right	Normal drift	—	-10 ppm	Linear correction
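The piecewise correction works in three steps. First, compute each frame's offset (camera time minus host time). Second, split the series wherever consecutive offsets jump by more than a threshold. Third, fit a least-squares line per segment: the slope is the drift, and what remains is the alignment error. For scale, a 15 ppm drift accumulates 15e-6 × 60 s = 0.9 ms over a 60-second episode. The 1 ms jump threshold, the function names, and the synthetic stream are our own assumptions. The stream combines 15 ppm drift with a +12 ms jump at 23 s.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def piecewise_correct(host_s: list[float], cam_s: list[float], jump_threshold_s: float = 1e-3):
    """Split at clock jumps, fit offset vs. host time per segment; return (residuals, slopes)."""
    offsets = [c - h for c, h in zip(cam_s, host_s)]
    # Segment boundaries: start, every index where the offset jumps, end.
    cuts = [0] + [i for i in range(1, len(offsets))
                  if abs(offsets[i] - offsets[i - 1]) > jump_threshold_s] + [len(offsets)]
    residuals, slopes = [], []
    for start, stop in zip(cuts, cuts[1:]):  # assumes every segment has at least two frames
        xs, ys = host_s[start:stop], offsets[start:stop]
        slope, intercept = fit_line(xs, ys)
        slopes.append(slope)
        residuals.extend(y - (slope * x + intercept) for x, y in zip(xs, ys))
    return residuals, slopes


# Synthetic 60 s stream at 30 FPS: 15 ppm drift plus a +12 ms clock jump at t = 23 s.
host = [i / 30 for i in range(1800)]
cam = [t * (1 + 15e-6) + (0.012 if t >= 23.0 else 0.0) for t in host]
residuals, slopes = piecewise_correct(host, cam)
print(len(slopes), [round(s * 1e6, 3) for s in slopes])  # 2 [15.0, 15.0]
```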

Download raw timestamp data

Inspect the actual nanosecond timestamps yourself.

cam_front_frames.csv cam_left_frames.csv cam_right_frames.csv robot_state.csv robot_action.csv

QA report

Every drop includes a QA report. Full transparency about what was rejected and why.

Metric	Value	Threshold	Status
Total episodes collected	847	—	—
Accepted	812	—	—
Rejected	35	—	—
Rejection rate	4.1%	<10%	Pass
Avg frame drop rate	0.3%	<1%	Pass
Max timestamp jitter (p95)	4.8ms	<10ms	Pass
Max timestamp jitter (p99)	8.2ms	<20ms	Pass
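The report's rates and jitter percentiles are simple reductions. The sketch below assumes "jitter" means the absolute per-frame alignment error after correction, which the report does not define, and it uses nearest-rank percentiles. The episode counts and pass/fail thresholds come from the table. The error samples are made up for illustration.

```python
import math


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in (0, 100]) of an unsorted list."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def qa_summary(collected: int, accepted: int, abs_errors_ms: list[float]) -> dict[str, tuple[float, bool]]:
    """Return each metric with a pass flag against the report's thresholds."""
    rejection_pct = 100 * (collected - accepted) / collected
    p95 = nearest_rank(abs_errors_ms, 95)
    p99 = nearest_rank(abs_errors_ms, 99)
    return {
        "rejection_rate_pct": (round(rejection_pct, 1), rejection_pct < 10),
        "jitter_p95_ms": (p95, p95 < 10),
        "jitter_p99_ms": (p99, p99 < 20),
    }


errors = [0.1 * i for i in range(100)]  # illustrative samples: 0.0 .. 9.9 ms
summary = qa_summary(847, 812, errors)
print(summary["rejection_rate_pct"])  # (4.1, True)
```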

Loading the dataset

Our datasets load directly into LeRobot. No custom loader required—use the standard API.

load_dataset.py
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Load from the Hugging Face Hub, or add root="path/to/local/copy" to read a local dataset.
# LeRobot v2.0 has no split argument: select the "train" range from meta/info.json by episode index.
dataset = LeRobotDataset(
    repo_id="motionledger/franka_pick_place_v1",
    episodes=list(range(750)),  # "train": "0:750"
)

# Indexing returns a single frame, not a whole episode
frame = dataset[0]

# Keys in each frame include:
#   'observation.state'               # [8] joint positions + gripper
#   'action'                          # [8] joint commands + gripper
#   'observation.images.base_0_rgb'   # [3, H, W] float in [0, 1], third-person camera
#   'observation.images.wrist_0_rgb'  # [3, H, W] float in [0, 1], wrist camera
#   'timestamp'                       # seconds from episode start
#   'frame_index', 'episode_index', 'index', 'task_index'  # integer indices
#   'task'                            # language instruction string
print(frame.keys())

# Dataset-level statistics from meta/stats.json (mean, std, min, max per feature).
# The dataset does not normalize frames itself; the policy applies these stats.
print(dataset.meta.stats["observation.state"].keys())

Supported platforms

We've collected data and validated our pipeline on these platforms. Custom configurations supported—send us your URDF.

Platform	Embodiment	Action dims	Status
Franka Emika Panda	Single-arm	8 (7 joints + gripper)	Active
ALOHA (Trossen ViperX)	Bimanual	14 (2×6 joints + 2 grippers)	Active
Universal Robots UR5e	Single-arm	7 (6 joints + gripper)	Active
DROID (various)	Single-arm	8 (7 DoF + gripper)	Active
Mobile ALOHA	Mobile bimanual	16 (14 arm + 2 base)	Q1 2025
Custom	Any	Configurable	Contact us

Get a sample pack

Send us your spec and we'll build a sample pack in your exact format. Includes 10-20 episodes you can load and train on immediately.
