
A new AI model, H-CAST, groups fine details into object-level concepts as attention moves from lower to higher layers, outputting a classification tree (such as bird, eagle, bald eagle) rather than focusing only on fine-grained recognition.
The research was presented at the International Conference on Learning Representations in Singapore and builds upon the team's prior model, CAST, its counterpart for visually grounded single-level classification. The paper is also published on the arXiv preprint server.
While some argue that deep learning can reliably provide fine-grained classification and infer broader categories from it, this tactic only works with clear images.
“Real-world applications involve plenty of imperfect images. If a model only focuses on fine-grained classification, it gives up before it even starts on images that don’t have enough information to support that level of detail,” said Stella Yu, a professor of computer science and engineering at U-M and contributing author of the study.
Hierarchical classification overcomes this issue, providing classification at multiple levels of detail for the same image. However, up to this point, hierarchical models have struggled with inconsistencies that come with treating each level as its own classification task.
For example, when identifying a bird, fine-grained classification often depends on local details like beak shape or feather color, while coarse labels require global features like overall shape. When these two levels are disconnected, it can result in a fine classifier predicting “green parakeet” while the coarse classifier predicts “plant.”
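Concretely, this kind of inconsistency can be caught with a simple taxonomy check: a fine prediction is only consistent if its ancestor in the label hierarchy matches the coarse prediction. The parent map and labels below are hypothetical, illustrative values, not the datasets' actual taxonomy.

```python
# Hypothetical taxonomy: each fine label maps to its parent coarse label.
PARENT = {
    "green parakeet": "bird",
    "bald eagle": "bird",
    "fern": "plant",
}

def is_consistent(fine_pred: str, coarse_pred: str) -> bool:
    """A (fine, coarse) prediction pair is consistent only if the
    fine label's parent in the taxonomy equals the coarse label."""
    return PARENT.get(fine_pred) == coarse_pred

# The failure mode described above: disconnected classifiers disagree.
print(is_consistent("green parakeet", "plant"))  # False -> inconsistent
print(is_consistent("green parakeet", "bird"))   # True  -> consistent
```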
The new model instead focuses all levels on the same object at different levels of detail by aligning fine-to-coarse predictions through intra-image segmentation.
Previous hierarchical models trained from coarse to specific, following the logic of semantic labeling, which flows from general to specific (e.g., bird, hummingbird, green hermit). H-CAST instead trains in the visual space, where recognition begins with fine details like beaks and wings that are then grouped into coarser structures, leading to better alignment and accuracy.
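As a rough sketch of this fine-to-coarse idea, the code below grounds both a fine and a coarse classification head in the same segment features, merging fine segments into coarser groups for the coarse head. This is not H-CAST's actual architecture; the dimensions, class counts, and segment assignments are placeholders.

```python
import torch
import torch.nn as nn

class FineToCoarseHeads(nn.Module):
    """Toy module: the fine head reads features pooled over fine segments,
    while the coarse head reads features pooled over merged segment groups,
    so both levels are grounded in the same image regions."""
    def __init__(self, dim=256, n_fine=200, n_coarse=10):
        super().__init__()
        self.fine_head = nn.Linear(dim, n_fine)
        self.coarse_head = nn.Linear(dim, n_coarse)

    def forward(self, seg_feats, group_ids):
        # seg_feats: (num_segments, dim) features, one per fine segment
        # group_ids: (num_segments,) coarse group index for each fine segment
        fine_logits = self.fine_head(seg_feats.mean(dim=0))
        n_groups = int(group_ids.max()) + 1
        # Merge fine segments into coarser regions by averaging their features.
        group_feats = torch.stack(
            [seg_feats[group_ids == g].mean(dim=0) for g in range(n_groups)]
        )
        coarse_logits = self.coarse_head(group_feats.mean(dim=0))
        return fine_logits, coarse_logits

feats = torch.randn(6, 256)                 # six fine segments (made up)
groups = torch.tensor([0, 0, 1, 1, 1, 2])   # merged into three coarse regions
fine_logits, coarse_logits = FineToCoarseHeads()(feats, groups)
```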
“Most prior work in hierarchical classification focused on semantics alone, but we found that consistent visual grounding across levels can make a huge difference. By encouraging models to ‘see’ the hierarchy in a visually coherent way, we hope this work inspires a shift toward more integrated and interpretable recognition systems,” said Seulki Park, a postdoctoral research fellow of computer science and engineering at the University of Michigan and lead author of the study.
Unlike prior methods, the research team leveraged unsupervised segmentation, typically used to identify structures within a larger image, to support hierarchical classification. They demonstrated that its visual grouping mechanism can be effectively applied to classification without requiring pixel-level labels, and that it helps improve segmentation quality.
To demonstrate the new model's effectiveness, H-CAST was tested on four benchmark datasets and compared against hierarchical models (FGN, HRN, TransHP, Hier-ViT) and baseline models (ViT, CAST, HiE).
“Our model outperformed zero-shot CLIP and state-of-the-art baselines on hierarchical classification benchmarks, achieving both higher accuracy and more consistent predictions,” said Yu.
For instance, on the BREEDS dataset, H-CAST's full-path accuracy was 6% higher than the previous state of the art and 11% higher than the baselines.
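Full-path accuracy counts a prediction as correct only when every level of the hierarchy is right, which is stricter than measuring each level separately. A minimal sketch of the metric (the example paths are illustrative):

```python
def full_path_accuracy(preds, targets):
    """Fraction of samples whose predicted path matches the true path
    at every level of the hierarchy."""
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)

preds   = [("bird", "eagle", "bald eagle"), ("bird", "parrot", "macaw")]
targets = [("bird", "eagle", "bald eagle"), ("bird", "parrot", "green parakeet")]
print(full_path_accuracy(preds, targets))  # 0.5: second path wrong at the fine level
```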
Feature-level nearest neighbor analysis also shows H-CAST retrieves semantically and visually consistent samples across hierarchy levels—unlike prior models that often retrieve visually similar but semantically incorrect samples.
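In spirit, such an analysis amounts to retrieving the closest samples in feature space and then checking their labels at each level of the hierarchy. A minimal cosine-similarity sketch, using random placeholder features rather than real model outputs:

```python
import numpy as np

def nearest_neighbors(query, bank, k=5):
    """Return indices of the k most similar feature vectors by cosine similarity."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q
    return np.argsort(-sims)[:k]

bank = np.random.randn(1000, 256)   # placeholder feature bank
query = np.random.randn(256)        # placeholder query feature
idx = nearest_neighbors(query, bank)
# Checking whether the retrieved samples share labels at every hierarchy level
# is what separates consistent retrieval from mere visual similarity.
```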
This work could potentially be applied to any situation that requires multi-level understanding of images. It could particularly benefit wildlife monitoring, identifying species where possible but falling back on coarser predictions. H-CAST can also help autonomous vehicles interpret imperfect visual input, like occluded pedestrians or distant vehicles, helping the system make safe, approximate decisions at coarser levels of detail.
“Humans naturally fall back on coarser concepts. If I can’t tell if an image is of a Pembroke Corgi, I can still confidently say it’s a dog. But models often fail at that kind of flexible reasoning. We hope to eventually build a system that can adapt its prediction level just like we do,” said Park.
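The adaptive fallback Park describes is future work rather than part of the published model, but it could look like the following sketch: report the finest prediction whose confidence clears a threshold, otherwise back off to a coarser level. The threshold, labels, and probabilities here are hypothetical.

```python
def adaptive_prediction(level_probs, labels_per_level, threshold=0.8):
    """Walk from the finest level to the coarsest and return the first
    prediction whose confidence clears the threshold."""
    for probs, labels in zip(reversed(level_probs), reversed(labels_per_level)):
        conf = max(probs)
        if conf >= threshold:
            return labels[probs.index(conf)], conf
    # Nothing is confident: fall back to the coarsest level's best guess.
    probs, labels = level_probs[0], labels_per_level[0]
    conf = max(probs)
    return labels[probs.index(conf)], conf

# Blurry image: unsure among dog breeds, but confident it's a dog.
coarse_probs, coarse_labels = [0.95, 0.05], ["dog", "cat"]
fine_probs, fine_labels = [0.40, 0.35, 0.25], ["Pembroke Corgi", "Cardigan Corgi", "Shiba Inu"]
label, conf = adaptive_prediction([coarse_probs, fine_probs], [coarse_labels, fine_labels])
print(label, conf)  # "dog", 0.95 -> falls back to the coarse level
```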
H-CAST was trained and tested using ARC High Performance Computing at U-M.
UC Berkeley, MIT and Scaled Foundations also contributed to this research.
More information:
Seulki Park et al., Visually Consistent Hierarchical Image Classification, International Conference on Learning Representations (2025).
Seulki Park et al., Visually Consistent Hierarchical Image Classification, arXiv (2024). DOI: 10.48550/arxiv.2406.11608