Surprisingly enough, it seems some AI agents aren’t quite up to scratch on some basic business tests

  • Salesforce research finds single-turn tasks see only 58% success, while multi-turn effectiveness drops to 35%
  • Reasoning models like gemini-2.5-pro tend to outperform lighter models
  • CRMArena-Pro has proven to be a challenging benchmark

Researchers from Salesforce AI Research have introduced a new benchmark, CRMArena-Pro, which uses synthetic enterprise data to assess LLM agent performance in different CRM scenarios.

The researchers found that LLM agents achieved around 58% success on tasks that can be completed in a single step, while success on tasks requiring multiple interactions dropped to just 35%, barely more than one in three.

