
Cornell researchers have developed an AI-powered process that automatically transforms a short video of a room into an interactive, 3D simulation of the space.
Inside this highly accurate “digital twin,” users can open drawers and cabinets and handle objects on the countertop. The technology can be used to develop more realistic video games and virtually train robots to operate within a specific real-world space—essentially any application that needs a realistic, interactive model of a room.
“Existing techniques, although they allow you to synthesize what the world looks like from different viewpoints, sometimes lack this capability of being immersive, where you can really interact with the scene,” said Wei-Chiu Ma, assistant professor of computer science in the Cornell Ann S. Bowers College of Computing and Information Science, and senior researcher on the project. “Because of the advances in generative AI techniques, we finally have enough tools to make a baby step toward creating digital twins that are now interactable.”
His collaborators include Hongchi Xia, a Ph.D. student in computer science at the University of Illinois Urbana-Champaign. Xia presented their project, “DRAWER: Digital Reconstruction and Articulation With Environment Realism,” on June 15 at the IEEE/CVF Conference on Computer Vision and Pattern Recognition in Nashville, Tennessee.
The process of creating a digital twin of a room using DRAWER starts with just a few minutes of filming.
“Our input is just a video that you casually capture in the kitchen. You don’t need to interact with any cabinet doors or with the objects,” Xia said. “I just hold my iPhone—you don’t need an advanced video device or expensive camera.”
To turn that video into a digital room that is both photorealistic and interactive, the researchers chained together multiple AI models. They combined two methods for rendering digital images: one that produces a visually realistic appearance, and a second that recreates the scene with highly accurate dimensions. They also added a perception module, which determines which parts of the scene are movable and how they should move, such as how a refrigerator door should swing open. Finally, they included a model that fills in the unseen insides of the drawers.
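In very loose terms, that division of labor might look like the Python sketch below. Every class, function and parameter name here is a hypothetical placeholder chosen for illustration, not part of the actual DRAWER implementation:

    from dataclasses import dataclass

    # Hypothetical stand-ins for the stages the article describes:
    # appearance, geometry, articulation perception, and interior in-fill.

    @dataclass
    class ScenePart:
        name: str
        motion: str = "fixed"        # e.g. "hinge" for doors, "slide" for drawers
        interior_filled: bool = False

    @dataclass
    class DigitalTwin:
        appearance: dict             # photorealistic rendering state
        geometry: dict               # metrically accurate reconstruction
        parts: list

    def reconstruct_appearance(frames):
        """Fit a view-synthesis model so the twin looks like the real room."""
        return {"representation": "radiance_field", "frames_used": len(frames)}

    def reconstruct_geometry(frames):
        """Recover the scene's dimensions from the same frames."""
        return {"representation": "mesh", "units": "meters"}

    def detect_articulated_parts(geometry):
        """Perception stage: decide which parts move and how."""
        return [
            ScenePart("refrigerator_door", motion="hinge"),
            ScenePart("top_drawer", motion="slide"),
            ScenePart("kettle", motion="free"),
        ]

    def fill_hidden_interiors(parts):
        """Generative stage: fill in interiors the camera never observed."""
        for part in parts:
            if part.motion in ("hinge", "slide"):  # containers have hidden insides
                part.interior_filled = True
        return parts

    def build_digital_twin(frames):
        appearance = reconstruct_appearance(frames)
        geometry = reconstruct_geometry(frames)
        parts = fill_hidden_interiors(detect_articulated_parts(geometry))
        return DigitalTwin(appearance, geometry, parts)

    twin = build_digital_twin(frames=["frame"] * 300)  # a few minutes of casual video
    for part in twin.parts:
        print(part)

The key design point, in this reading, is that the appearance and geometry stages consume the same frames, so the photorealistic rendering and the physically accurate dimensions describe one aligned scene that the later stages can annotate with motion and hidden interiors.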
However, developing DRAWER wasn’t as simple as just linking up the modules, Xia said. He had to integrate them into a unified framework. Once completed, he used the method to develop recreations of a kitchen, a bathroom and even his office.
The digital twins generated by this approach work seamlessly with the game engines used to create video games, Xia said. The research team demonstrated this by creating a game where the user shoots balls to knock over objects in the kitchen, like the kettle and soap bottle.
The framework can also be used to virtually train robots to operate in real-world environments through a process called real-to-sim-to-real transfer. The researchers virtually trained a robotic arm in the digital twin of the kitchen and then showed that it could successfully put objects away in a drawer in the real world.
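The real-to-sim-to-real pattern can be summarized in a deliberately toy Python sketch. The policy, the simulator and the learning rule below are all illustrative placeholders standing in for the paper's actual training setup:

    import random

    def simulated_rollout(policy, twin, rng):
        """One practice attempt at the put-away task inside the digital twin."""
        return rng.random() < policy["skill"]  # toy stand-in for simulated physics

    def train_in_sim(twin, episodes=5000, seed=0):
        """Sim phase: thousands of cheap, safe trials refine the policy."""
        rng = random.Random(seed)
        policy = {"skill": 0.05}
        for _ in range(episodes):
            if not simulated_rollout(policy, twin, rng):
                policy["skill"] = min(1.0, policy["skill"] + 0.0005)  # learn from failure
        return policy

    def deploy_to_real_arm(policy):
        """Real phase: only the finished policy touches the physical robot."""
        print(f"Deploying policy, simulated success rate ~{policy['skill']:.2f}")

    kitchen_twin = {"containers": ["top_drawer"], "objects": ["kettle", "soap_bottle"]}
    deploy_to_real_arm(train_in_sim(kitchen_twin))

The point of the pattern is that the expensive trial-and-error happens inside the twin, where a failed grasp costs nothing, and the robot arrives at the real kitchen already trained.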
They envision that, in the near future, someone could order a robot and upload a video of their house; the resulting digital twin could then be used to train the robot to function within the space before it’s even out of the box. The simulation is a cheaper, faster and safer way to train a robot, Ma said.
Currently, DRAWER works only with rigid objects, like a kettle, but the researchers eventually plan to include soft, deformable objects, like cloth, as well as objects that can break, like windows.
Additionally, DRAWER currently recreates a single room, but Ma and Xia hope to extend this work to encompass entire buildings. They also envision creating digital twins of outdoor spaces where the technology could be used for designing cities or optimizing agricultural yields.
“Our final goal is to try to build a digital twin of everything in the world,” said Xia, “so there are a lot of things that we can explore in the future.”
Additional authors on the study include colleagues from the University of Washington, including Entong Su, Marius Memmel, Arhan Jain, Raymond Yu, Numfor Mbiziwo-Tiapo, Ali Farhadi (also at the Allen Institute for Artificial Intelligence) and Abhishek Gupta, as well as Shenlong Wang from the University of Illinois Urbana-Champaign.
More information: Paper: DRAWER: Digital Reconstruction and Articulation With Environment Realism