
UI-TARS GUI agent model can automate tasks such as finding and booking airline tickets

Overview of UI-TARS, illustrating the architecture of the model and its core capabilities. Credit: arXiv (2025). DOI: 10.48550/arxiv.2501.12326

A team of software engineers, AI specialists and programmers at Tsinghua University, working with TikTok parent company ByteDance, has developed a graphical user interface (GUI) agent model called UI-TARS. The group describes the model in a paper posted to the arXiv preprint server.

Over the past decade, AI applications have flourished. Among the best known are LLM-based chatbots such as ChatGPT, but others have been developed to serve a variety of purposes. One such application is assisting computer users with mundane tasks, such as finding the cheapest airline fare for a flight between two cities and then buying the tickets. Such tasks typically involve time-consuming web browsing.

AI researchers have suggested that such tasks could be automated by smart agents. In this new study, the team in China has done just that by developing UI-TARS, a GUI agent model that can be run locally on a personal computer or via the cloud on other devices.

The model was trained on 50 billion tokens representing characteristics of GUIs, captured via screenshots, such as those found on traditional web pages. Training also involved reflection tuning, in which the model learns from its mistakes and adapts how it approaches different or unfamiliar situations.

When running UI-TARS, a user is presented with two tabs. One shows the “thinking process” that the app goes through as it works on its overall task; the other shows the websites, files or other GUIs that the app is working with. If the app is used to book a flight, for example, a user can see the airline websites being viewed and then switch over to see what the app is doing with them.

At the end of the process, the user is presented with the final web page, which prompts for confirmation of the ticket purchase. In testing their model, the team found that it outperformed other AI models, such as GPT-4o and Gemini-2.0.
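To make the workflow described above more concrete, here is a minimal Python sketch of a GUI agent loop that keeps a reasoning trace separate from its actions and hands the final purchase decision back to the user. It is an illustration only, not the UI-TARS implementation or its API: the names Step, AgentRun, run_agent and scripted_policy are hypothetical, and a scripted stub stands in for the model, which in a real system would choose each step from a screenshot.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One decision by the (hypothetical) agent policy."""
    thought: str  # text that would appear in the "thinking process" tab
    action: str   # e.g. "open", "type", "confirm_purchase"
    target: str   # the page, field or item the action applies to


@dataclass
class AgentRun:
    """Record of a run: reasoning and GUI actions are logged separately."""
    thoughts: List[str] = field(default_factory=list)
    actions: List[Tuple[str, str]] = field(default_factory=list)
    status: str = "running"


def run_agent(
    policy: Callable[[AgentRun], Optional[Step]],
    confirm: Callable[[str], bool],
    max_steps: int = 10,
) -> AgentRun:
    """Ask the policy for steps until it stops or reaches a purchase.

    The purchase is never executed automatically: the user-supplied
    `confirm` callback decides whether it goes ahead.
    """
    run = AgentRun()
    for _ in range(max_steps):
        step = policy(run)
        if step is None:  # policy has nothing more to do
            run.status = "finished"
            return run
        run.thoughts.append(step.thought)
        if step.action == "confirm_purchase":
            if confirm(step.target):
                run.actions.append((step.action, step.target))
                run.status = "purchased"
            else:
                run.status = "declined"
            return run
        run.actions.append((step.action, step.target))
    run.status = "step_limit"
    return run


def scripted_policy(run: AgentRun) -> Optional[Step]:
    """Stand-in for a model: replays a fixed, made-up flight-booking script."""
    script = [
        Step("I need a flight search page.", "open", "airline website"),
        Step("Enter the route to list fares.", "type", "origin and destination"),
        Step("Cheapest fare found; ask the user.", "confirm_purchase", "cheapest fare"),
    ]
    index = len(run.thoughts)  # one thought is logged per completed step
    return script[index] if index < len(script) else None


if __name__ == "__main__":
    result = run_agent(scripted_policy, confirm=lambda item: False)
    print(result.status)
    for thought in result.thoughts:
        print("thought:", thought)
    for action in result.actions:
        print("action:", action)
```

Keeping the thoughts and actions in separate lists mirrors the two-tab view described in the article, and routing the purchase through a callback reflects the final confirmation step being left to the user.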

More information:
Yujia Qin et al, UI-TARS: Pioneering Automated GUI Interaction with Native Agents, arXiv (2025). DOI: 10.48550/arxiv.2501.12326

UI-TARS: github.com/bytedance/UI-TARS

Journal information:
arXiv


© 2025 Science X Network

Citation:
UI-TARS GUI agent model can automate tasks such as finding and booking airline tickets (2025, January 23)

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.

