Surprisingly enough, it seems some AI agents aren’t quite up to scratch on some basic business tests

  • Salesforce research finds single-turn tasks see only 58% success, while multi-turn effectiveness drops to 35%
  • Reasoning models like gemini-2.5-pro tend to outperform lighter models
  • CRMArena-Pro has proven to be a challenging benchmark

Researchers from Salesforce AI Research have introduced a new benchmark – CRMArena-Pro – which uses synthetic enterprise data to assess LLM agent performance in different CRM scenarios.

It found LLM agents achieved around 58% success on tasks that can be completed in a single step, while success on tasks requiring multiple interactions dropped to just 35% – barely more than one in three.
