
Training robots without robots: Smart glasses capture first-person task demos

Human demonstrations are collected with only black ovens (top). The policy transfers zero-shot to the robot with the same oven (middle) and also generalizes to a new oven instance (bottom). The points are color-coded to show the correspondence. Credit: Liu et al.

Over the past few decades, robots have gradually started making their way into various real-world settings, including some malls, airports and hospitals, as well as a few offices and households.

For robots to be deployed on a larger scale, serving as reliable everyday assistants, they should be able to complete a wide range of common manual tasks and chores, such as cleaning, washing the dishes, cooking and doing the laundry.

Training machine learning algorithms that allow robots to successfully complete these tasks can be challenging, as it often requires extensive annotated data and/or demonstration videos showing humans completing the tasks. Devising more effective methods of collecting training data could thus be highly advantageous, as it could help to further broaden the capabilities of robots.

Researchers at New York University and UC Berkeley recently introduced EgoZero, a new system for collecting ego-centric demonstrations of humans completing specific manual tasks. This system, introduced in a paper posted to the arXiv preprint server, relies on Project Aria glasses, the augmented reality (AR) smart glasses developed by Meta.








“We believe that general-purpose robotics is bottlenecked by a lack of internet-scale data, and that the best way to address this problem would be to collect and learn from first-person human data,” Lerrel Pinto, senior author of the paper, told Tech Xplore.

“The primary objectives of this project were to develop a way to collect accurate action-labeled data for robot training, optimize for the ergonomics of the data collection wearables needed, and transfer human behaviors into robot policies with zero robot data.”

EgoZero, the new system developed by Pinto and his colleagues, relies on Project Aria smart glasses to easily collect demonstrations of humans completing tasks, captured from the point of view of the person wearing the glasses and labeled with actions that a robot could execute.

These demonstrations can then be used to train new manipulation policies, which could in turn allow robots to successfully complete various manual tasks.
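
To make this concrete, below is a rough, hypothetical sketch of what a single point-based demonstration record could look like once everything is expressed as 3D points. The field names and structure are illustrative assumptions, not the exact data format used by EgoZero.

```python
# Hypothetical sketch (not the authors' code) of one action-labeled,
# egocentric demonstration expressed entirely as 3D points in the
# glasses' world frame. Field names are illustrative assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class DemoFrame:
    object_points: np.ndarray   # (N, 3) task-relevant object points, world frame
    action_points: np.ndarray   # (M, 3) hand/fingertip points treated as the "action"
    timestamp_ns: int           # capture time from the glasses' clock

def make_demo(frames: list[DemoFrame]) -> dict:
    """Stack a trajectory of frames into arrays a point-based policy can consume."""
    return {
        "states": np.stack([f.object_points for f in frames]),   # (T, N, 3)
        "actions": np.stack([f.action_points for f in frames]),  # (T, M, 3)
    }
```

In this framing, both the scene and the human hand are reduced to sparse 3D points, which is what allows the same data to later supervise a robot policy rather than being tied to the human's viewpoint or body.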

“Unlike prior works that require multiple calibrated cameras, wrist wearables, or motion capture gloves, EgoZero is unique in that it is able to extract these 3D representations with only smart glasses (Project Aria smart glasses),” explained Ademi Adeniji, student and co-lead author of the paper.

“As a result, robots can learn a new task from as little as 20 minutes of human demonstrations, with no teleoperation.”

Architecture diagram. EgoZero trains policies in a unified state-action space defined as egocentric 3D points. Unlike previous methods, EgoZero localizes object points via triangulation over the camera trajectory, and computes action points via Aria MPS hand pose and a hand estimation model. These points supervise a closed-loop Transformer policy, which is rolled out on unprojected points from an iPhone during inference. Credit: Liu et al.
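
The triangulation step mentioned in the caption can be illustrated with a standard multi-view linear (direct linear transform) routine: given a point tracked in 2D across several frames and the camera pose for each frame, its 3D position can be recovered. The sketch below is a generic textbook version under assumed inputs (per-frame camera-to-world poses and a shared intrinsics matrix), not the paper's implementation.

```python
# Generic multi-view linear triangulation (DLT), shown for illustration only.
# Inputs are assumed: per-frame camera-to-world poses (e.g., from the glasses'
# on-device localization) and a single 3x3 intrinsics matrix K.
import numpy as np

def triangulate_point(pixels, world_from_cam_poses, K):
    """
    pixels: list of (u, v) observations of the same physical point.
    world_from_cam_poses: list of 4x4 camera-to-world transforms, one per frame.
    K: 3x3 camera intrinsics.
    Returns the 3D point in world coordinates (least-squares DLT solution).
    """
    rows = []
    for (u, v), T_wc in zip(pixels, world_from_cam_poses):
        T_cw = np.linalg.inv(T_wc)          # world -> camera
        P = K @ T_cw[:3, :]                 # 3x4 projection matrix for this frame
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)             # homogeneous least squares
    X = vt[-1]
    return X[:3] / X[3]                     # dehomogenize to metric coordinates
```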

To evaluate their proposed system, the researchers used it to collect video demonstrations of simple actions that are commonly completed in a household environment (e.g., opening an oven door) and then used these demonstrations to train a machine learning algorithm.

The machine learning algorithm was then deployed on a Franka Panda, a robotic arm with a gripper attached at its end. Notably, the researchers found that the arm successfully completed most of the tasks they tested it on, even though the policy planning its movements had been trained on only a small amount of human data.
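
In broad terms, turning such demonstrations into a robot policy is supervised imitation learning: the policy is trained to regress the human's action points from the observed state points. The outline below is a generic PyTorch sketch of that recipe; it is not the authors' closed-loop Transformer architecture, and every class name and hyperparameter is a placeholder.

```python
# Generic behavior-cloning sketch for a point-based policy. This is a
# simplified illustration, not the EgoZero architecture; names and
# hyperparameters are placeholders.
import torch
from torch import nn

class PointPolicy(nn.Module):
    """Maps flattened 3D state points to the next set of 3D action points."""
    def __init__(self, n_state_pts: int, n_action_pts: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state_pts * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_action_pts * 3),
        )

    def forward(self, state_points: torch.Tensor) -> torch.Tensor:
        # state_points: (batch, N, 3) -> predicted action points, flattened
        return self.net(state_points.flatten(start_dim=1))

def behavior_clone(policy, demo_loader, epochs: int = 50, lr: float = 1e-3):
    """Supervised regression of human action points from state points."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for states, actions in demo_loader:   # tensors built from demonstrations
            pred = policy(states)
            loss = nn.functional.mse_loss(pred, actions.flatten(start_dim=1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```

At deployment, the same kind of point-based observation would be extracted from the robot's camera and fed to the trained policy in a closed loop, which is the zero-robot-data transfer the researchers describe.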

“EgoZero’s biggest contribution is that it can transfer human behaviors into robot policies with zero robot data, with just a pair of smart glasses,” said Pinto.

“It extends past work (Point Policy) by showing that 3D representations enable efficient robot learning from humans, but completely in-the-wild. We hope this serves as a foundation for future exploration of representations and algorithms to enable human-to-robot learning at scale.”

The code for the data collection system introduced by Pinto and his colleagues was published on GitHub and can be easily accessed by other research teams.

In the future, it could be used to rapidly collect datasets to train robotics algorithms, which could contribute to the further development of robots, ultimately facilitating their deployment in a greater number of households and offices worldwide.

“We now hope to explore the tradeoffs between 2D and 3D representations at a larger scale,” added Vincent Liu, student and co-lead author of the paper.

“EgoZero and past work (Point Policy, P3PO) have only explored single-task 3D policies, so it would be interesting to extend this framework of learning from 3D points in the form of a fine-tuned LLM/VLM, similar to how modern VLA models are trained.”

Written by Ingrid Fadelli, edited by Lisa Lock, and fact-checked and reviewed by Robert Egan.

More information:
Vincent Liu et al, EgoZero: Robot Learning from Smart Glasses, arXiv (2025). DOI: 10.48550/arXiv.2505.20290


