multimodal

8 Articles
New metric tracks where multimodal reasoning models go wrong
Tech

(a) Example of outputs from a reasoning model and a non-reasoning model on a perception task. Red highlights indicate visual hallucination. Multimodal reasoning...

Multi-modal AI agent mimics human thinking for long video analysis and reasoning
Tech

Credit: GitHub. While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The...

A novel, multimodal approach to automated speaking skill assessment
Tech

A proposed framework for simultaneously estimating multifaceted English communication skills. Previously developed systems for the automated assessment of speaking proficiency focus on limited...

New multimodal AI tool supports ecological applications
Tech

The TaxaBind framework creates a unified database by distilling information from five different modalities into one binding modality. In TaxaBind’s case, the binding...

Psychology-based tasks assess multi-modal LLM visual cognition limits
Tech

The help or hinder task, one of the tasks used to test the visual cognition of multimodal LLMs. Credit: MIT. Over the past...

A Minecraft-based benchmark to train and test multi-modal multi-agent systems
Tech

More than 30 target objects or resources are used in TeamCraft tasks. Credit: UCLA. Researchers at the University of California, Los Angeles (UCLA)...

Open-source framework goes beyond language to enhance multimodal AI training capabilities
Tech

A couple of oranges seen through the lens of multiple modalities, with each slice showing a different way one might perceive and understand...

Integrated multi-modal sensing and learning system could give robots new capabilities
Tech

Soft robot fingers equipped with tactile sensors grasping an egg. The bottom-right images show the tactile sensing results. Credit: Binghao Huang. To assist...