Nvidia’s closest rival once again obliterates cloud giants in AI performance; Cerebras Inference is 75x faster than AWS, 32x faster than Google on Llama 3.1 405B

  • Cerebras hits 969 tokens/second on Llama 3.1 405B, 75x faster than AWS
  • Claims industry-low 240ms latency, twice as fast as Google Vertex
  • Cerebras Inference runs on the CS-3 with the WSE-3 AI processor

Cerebras Systems says it has set a new benchmark in AI performance with Meta’s Llama 3.1 405B model, achieving an unprecedented generation speed of 969 tokens per second.

Third-party benchmark firm Artificial Analysis has claimed this performance is up to 75 times faster than GPU-based offerings from major hyperscalers. It was nearly six times faster than SambaNova at 164 tokens per second, more than 32 times faster than Google Vertex at 30 tokens per second, and far ahead of Azure at just 20 tokens per second and AWS at 13 tokens per second.
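
The speedup multiples follow directly from the quoted throughput figures: each one is Cerebras's tokens per second divided by the other provider's. A minimal sketch of that arithmetic is below; the dictionary and the helper name `speedup_over` are illustrative only and do not come from Cerebras or Artificial Analysis.

```python
# Throughput figures (tokens per second) as quoted in the article.
tokens_per_second = {
    "Cerebras": 969,
    "SambaNova": 164,
    "Google Vertex": 30,
    "Azure": 20,
    "AWS": 13,
}


def speedup_over(provider: str, baseline: str = "Cerebras") -> float:
    """Return how many times faster `baseline` is than `provider`.

    Hypothetical helper for illustration: a plain ratio of throughputs.
    """
    return tokens_per_second[baseline] / tokens_per_second[provider]


# Ratio of Cerebras's throughput to every other provider's.
speedups = {
    name: speedup_over(name)
    for name in tokens_per_second
    if name != "Cerebras"
}

for name, ratio in speedups.items():
    print(f"Cerebras vs {name}: {ratio:.1f}x")
```

Rounded to whole numbers, the AWS ratio comes out at about 75 and the Google Vertex ratio at about 32, in line with the headline figures.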
