UI-TARS GUI agent model can automate tasks such as finding and booking airline tickets

Overview of UI-TARS, illustrating the architecture of the model and its core capabilities. Credit: arXiv (2025). DOI: 10.48550/arxiv.2501.12326

A team of software engineers, AI specialists and programmers at Tsinghua University, working with TikTok parent company ByteDance, has developed a graphical user interface (GUI) agent model called UI-TARS. The group describes the model in a paper posted to the arXiv preprint server.

Over the past decade, AI applications have flourished. Some of the best known are LLMs such as ChatGPT, but others are under development to serve a variety of purposes. One such application is assisting computer users with mundane tasks, such as finding the cheapest airline fare for a flight between two cities and then buying tickets for it. Such tasks typically involve time-consuming web browsing.

AI researchers have suggested that such tasks could be automated by smart agents. In this new study, the team in China has done just that by developing UI-TARS, a GUI agent model that can be used locally on a personal computer or via the cloud on other devices.

The model was trained on 50 billion tokens representing characteristics of GUIs, captured via screenshots, such as those found on traditional web pages. Training also involved reflection tuning, in which the model learns from its mistakes and adapts how it approaches different or unfamiliar situations.

When running UI-TARS, a user is presented with two tabs. One shows the "thinking process" that the app goes through as it works on its overall task. The other shows the websites, files or other GUIs that the app is working with. If it were used to book a flight, for example, a user could see the airline websites being viewed and could then switch tabs to see what the app was doing with them.

At the end of the process, the user is presented with the final web page, which prompts for confirmation of the ticket purchase. In testing, the team found that the model outperformed other AI models such as GPT-4o and Gemini-2.0.
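The workflow described above (a visible reasoning trace, actions carried out on a GUI, and a final hand-off to the user before any purchase) can be illustrated with a toy sketch. Everything in it is hypothetical: the class `ToyGuiAgent`, the action names and the plan are invented for illustration and are not part of UI-TARS or its published interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyGuiAgent:
    """Hypothetical stand-in for a GUI agent; this is not the UI-TARS API."""

    # What a "thinking process" tab might display.
    thoughts: List[str] = field(default_factory=list)
    # Actions actually carried out on the GUI.
    actions: List[str] = field(default_factory=list)

    def run(self, plan: List[Tuple[str, str]]) -> str:
        """Walk through (thought, action) pairs, pausing before any purchase."""
        for thought, action in plan:
            self.thoughts.append(thought)
            if action == "confirm_purchase":
                # Hand control back to the user instead of buying.
                return "awaiting_user_confirmation"
            self.actions.append(action)
        return "finished"


# Made-up plan for illustration only.
plan = [
    ("Need fares between the two cities.", "open_airline_site"),
    ("Compare the listed prices.", "select_cheapest_fare"),
    ("Ready to buy; the user must approve.", "confirm_purchase"),
]

agent = ToyGuiAgent()
status = agent.run(plan)
print(status)
print(agent.actions)
```

In this sketch the purchase step is recorded as a thought but never executed, mirroring the article's description of the user being asked to confirm the ticket purchase at the end.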

More information:
Yujia Qin et al, UI-TARS: Pioneering Automated GUI Interaction with Native Agents, arXiv (2025). DOI: 10.48550/arxiv.2501.12326

UI-TARS: github.com/bytedance/UI-TARS

Journal information:
arXiv


© 2025 Science X Network

Citation:
UI-TARS GUI agent model can automate tasks such as finding and booking airline tickets (2025, January 23)
retrieved 23 January 2025
from

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.
