Inference

8 Articles
Google Cloud unveils Ironwood, its 7th Gen TPU to help boost AI performance and inference
Tech

Google unveils Ironwood, its 7th-generation TPU. Ironwood is designed for inference, the new big challenge for AI. It offers huge advances in power...

SETI but for LLMs: how an LLM solution that’s barely a few months old could revolutionize the way inference is done
Tech

Exo supports LLaMA, Mistral, LLaVA, Qwen, and DeepSeek. It can run on Linux, macOS, Android, and iOS, but not Windows. AI models needing 16GB...

Bye bye Nvidia? Chinese cloud providers aggressively cut down AI inference costs by using Huawei’s controversial accelerators and DeepSeek’s tech
Tech

DeepSeek’s V3 and R1 models are available through Huawei’s Ascend cloud service. They are powered by the Ascend 910x accelerators banned in the...

Navigating the rising costs of AI inference in the era of large-scale applications
Tech

The momentum of AI-driven applications is accelerating around the world and shows little sign of slowing. According to data from IBM, 42% of...

AI energy efficiency monitoring ranks low among enterprise users, survey by inference CPU specialists finds
Tech

Swimlane survey finds many businesses aren’t keeping on top of AI energy needs. Nearly three quarters are aware of the dramatic energy demands...

Apple embraces Nvidia GPUs to accelerate LLM inference via its open source ReDrafter tech
Tech

ReDrafter delivers 2.7x more tokens per second compared to traditional auto-regression. ReDrafter could reduce latency for users while using fewer GPUs. Apple hasn’t...

Microsoft backed a tiny hardware startup that just launched its first AI processor that does inference without a GPU or expensive HBM memory, and a key Nvidia partner is collaborating with it
Tech

Microsoft-backed startup introduces GPU-free alternatives for generative AI. DIMC architecture delivers an ultra-high memory bandwidth of 150 TB/s. Corsair supports transformers, agentic AI,...

Nvidia’s closest rival once again obliterates cloud giants in AI performance; Cerebras Inference is 75x faster than AWS, 32x faster than Google on Llama 3.1 405B
Tech

Cerebras hits 969 tokens/second on Llama 3.1 405B, 75x faster than AWS. Claims industry-low 240ms latency, twice as fast as Google Vertex. Cerebras...