Text-to-video AI blossoms with new metamorphic video capabilities

Overview of the proposed MagicTime approach. Credit: arXiv, DOI: 10.48550/arxiv.2404.05014

While text-to-video artificial intelligence models like OpenAI’s Sora are rapidly metamorphosing in front of our eyes, they have struggled to produce metamorphic videos. Simulating a tree sprouting or a flower blooming is harder for AI systems than generating other types of videos because it requires knowledge of the physical world and because such processes can vary widely.

But now, these models have taken an evolutionary step.

Computer scientists at the University of Rochester, Peking University, the University of California, Santa Cruz, and the National University of Singapore developed a new AI text-to-video model that learns real-world physics knowledge from time-lapse videos. The team outlines its model, MagicTime, in a paper published in IEEE Transactions on Pattern Analysis and Machine Intelligence.

GIF created using MagicTime. Credit: University of Rochester

“Artificial intelligence has been developed to try to understand the real world and to simulate the activities and events that take place,” says Jinfa Huang, a Ph.D. student supervised by Professor Jiebo Luo from Rochester’s Department of Computer Science, both of whom are among the paper’s authors. “MagicTime is a step toward AI that can better simulate the physical, chemical, biological, or social properties of the world around us.”

Previous models typically generated videos with limited motion and little variation. To train AI models to more effectively mimic metamorphic processes, the researchers developed a high-quality dataset of more than 2,000 time-lapse videos with detailed captions.

“dough […] swells and browns in the oven […]” Credit: Shenghai Yuan et al

Currently, the open-source U-Net version of MagicTime generates two-second, 512-by-512-pixel clips (at 8 frames per second), and an accompanying diffusion-transformer architecture extends this to 10-second clips. The model can be used to simulate not only biological metamorphosis but also buildings undergoing construction or bread baking in the oven.
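
To put those figures in perspective, here is a back-of-the-envelope sketch of what a two-second clip at 8 frames per second amounts to. It is an illustration only, not code from the MagicTime project: the helper name `clip_shape`, the three color channels, and the one-byte-per-channel storage are assumptions made for the arithmetic.

```python
def clip_shape(duration_s, fps, height, width, channels=3):
    """Return (frames, height, width, channels) for an uncompressed clip.

    channels=3 assumes RGB frames; the article does not state the format.
    """
    frames = round(duration_s * fps)
    return (frames, height, width, channels)


# Figures quoted in the article: 2 seconds, 8 frames per second, 512 x 512 pixels.
shape = clip_shape(duration_s=2, fps=8, height=512, width=512)

# Raw size, assuming one byte per channel value (8-bit color, an assumption).
raw_bytes = shape[0] * shape[1] * shape[2] * shape[3]

print(shape)      # (16, 512, 512, 3)
print(raw_bytes)  # 12582912
```

In other words, each short clip is just 16 frames, or about 12 MiB of raw pixel data under these assumptions. The article does not give a frame rate for the 10-second clips, so the same calculation is not attempted for them.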

But while the generated videos are visually interesting and the demo can be fun to play with, the researchers view MagicTime as an important step toward more sophisticated models that could give scientists useful tools.

“Our hope is that someday, for example, biologists could use generative video to speed up preliminary exploration of ideas,” says Huang. “While physical experiments remain indispensable for final verification, accurate simulations can shorten iteration cycles and reduce the number of live trials needed.”

More information:
Shenghai Yuan et al, MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators, IEEE Transactions on Pattern Analysis and Machine Intelligence (2025). DOI: 10.1109/TPAMI.2025.3558507. On arXiv: DOI: 10.48550/arxiv.2404.05014

Provided by
University of Rochester


Citation: Text-to-video AI blossoms with new metamorphic video capabilities (2025, May 5)

