Nvidia’s closest rival once again obliterates cloud giants in AI performance; Cerebras Inference is 75x faster than AWS, 32x faster than Google on Llama 3.1 405B

  • Cerebras hits 969 tokens/second on Llama 3.1 405B, 75x faster than AWS
  • Claims industry-low 240ms latency, twice as fast as Google Vertex
  • Cerebras Inference runs on the CS-3 with the WSE-3 AI processor

Cerebras Systems says it has set a new benchmark in AI performance with Meta’s Llama 3.1 405B model, achieving an unprecedented generation speed of 969 tokens per second.

Third-party benchmark firm Artificial Analysis claims this performance is up to 75 times faster than GPU-based offerings from major hyperscalers. Cerebras was nearly six times faster than SambaNova at 164 tokens per second, more than 32 times faster than Google Vertex at 30 tokens per second, and far ahead of Azure at just 20 tokens per second and AWS at 13 tokens per second.
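The speedup multiples quoted above follow directly from the reported throughput figures. A quick sketch in Python (using only the tokens-per-second numbers from the Artificial Analysis comparison) shows how they work out:

```python
# Reported throughput (tokens/second) for Llama 3.1 405B,
# per the Artificial Analysis comparison cited above.
throughputs = {
    "Cerebras": 969,
    "SambaNova": 164,
    "Google Vertex": 30,
    "Azure": 20,
    "AWS": 13,
}

cerebras_tps = throughputs["Cerebras"]
for provider, tps in throughputs.items():
    if provider != "Cerebras":
        # Speedup is simply the ratio of generation rates.
        print(f"Cerebras is {cerebras_tps / tps:.1f}x faster than {provider}")
```

Running this gives roughly 5.9x over SambaNova, 32.3x over Google Vertex, 48.5x over Azure, and 74.5x over AWS, matching the "up to 75x" claim.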
