Converge Digest


Cerebras Launches AI Inference Solution 20x Faster Than GPUs

August 27, 2024

Cerebras Systems has introduced a new AI inference solution that it claims is the fastest in the world. The Cerebras Inference platform delivers 1,800 tokens per second for the Llama 3.1 8B model and 450 tokens per second for the Llama 3.1 70B model, which the company says is 20 times faster than NVIDIA GPU-based solutions in hyperscale cloud environments. Pricing starts at just 10 cents per million tokens, a significant cost advantage over existing GPU options.
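
To put those figures in context, here is a quick back-of-envelope sketch using only the numbers quoted above; the "implied GPU baseline" simply applies the claimed 20x ratio in reverse and is not an independent measurement.

```python
# Back-of-envelope math using only the announcement's numbers:
# 1,800 tok/s (Llama 3.1 8B), 450 tok/s (Llama 3.1 70B),
# $0.10 per million tokens, and a claimed 20x speedup. The GPU
# baseline is just that ratio applied in reverse, not a measurement.

PRICE_PER_M_TOKENS = 0.10  # USD, quoted Cerebras Inference price
CLAIMED_SPEEDUP = 20       # claimed advantage over GPU clouds

for model, tok_per_s in [("Llama 3.1 8B", 1800), ("Llama 3.1 70B", 450)]:
    minutes_per_m = 1_000_000 / tok_per_s / 60
    gpu_minutes_per_m = minutes_per_m * CLAIMED_SPEEDUP
    print(f"{model}: 1M tokens in {minutes_per_m:.1f} min "
          f"(implied GPU baseline ~{gpu_minutes_per_m:.0f} min) "
          f"for ${PRICE_PER_M_TOKENS:.2f}")
```

On those quoted rates, a million tokens on the 8B model takes about nine minutes; the same volume on the implied GPU baseline would take roughly three hours.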

Powered by the Cerebras CS-3 system and its Wafer Scale Engine 3 (WSE-3) processor, the platform promises to maintain state-of-the-art accuracy without sacrificing speed by running inference entirely in the 16-bit domain. The WSE-3 provides 7,000 times more memory bandwidth than the NVIDIA H100, addressing the memory-bandwidth bottleneck at the heart of generative AI inference. Cerebras Inference is available across three pricing tiers (Free, Developer, and Enterprise) catering to different user needs, from basic access to custom enterprise solutions.
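
Why bandwidth is the bottleneck: when a model generates tokens one at a time, every weight must be streamed from memory for each token, so single-stream decode speed is capped at roughly memory bandwidth divided by model size in bytes. A rough illustration of that ceiling follows; the H100 bandwidth figure (~3.35 TB/s for the SXM part) is our assumption, not from the article, and the WSE-3 figure is simply 7,000x that, per the article's claim.

```python
# Roofline-style ceiling on single-stream decode throughput:
#   tokens/s <= memory_bandwidth / bytes_read_per_token,
# since every 16-bit weight (2 bytes/parameter) must be streamed
# from memory for each generated token.
# Assumption: H100 SXM HBM bandwidth ~3.35 TB/s (not in the article);
# the WSE-3 figure below is simply 7,000x that, per the article.

H100_BW = 3.35e12          # bytes/s, assumed H100 HBM3 bandwidth
WSE3_BW = 7_000 * H100_BW  # bytes/s, from the article's 7,000x ratio

for model, params in [("Llama 3.1 8B", 8e9), ("Llama 3.1 70B", 70e9)]:
    bytes_per_token = params * 2  # 16-bit weights
    print(f"{model}: per-H100 ceiling ~{H100_BW / bytes_per_token:,.0f} tok/s, "
          f"WSE-3 ceiling ~{WSE3_BW / bytes_per_token:,.0f} tok/s")
```

On this rough model, the quoted 1,800 and 450 tokens per second sit far above what a single GPU's bandwidth allows for these model sizes but well under the wafer's ceiling, consistent with single-user decoding being memory-bandwidth-bound.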

• Performance: 20x faster than GPU-based solutions, delivering 1,800 tokens per second on Llama 3.1 8B and 450 tokens per second on Llama 3.1 70B.

• Pricing: Starting at 10 cents per million tokens, significantly lower than GPU alternatives.

• Technology: Powered by the WSE-3 processor with 7,000x more memory bandwidth than NVIDIA H100.

• Availability: Offered in Free, Developer, and Enterprise tiers with varying levels of access and support (see the API sketch after this list).
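
For the Developer tier, access would be through an API. Below is a hypothetical usage sketch assuming an OpenAI-compatible chat-completions interface; the base URL, model identifier, and environment-variable name are illustrative assumptions, not details from the announcement.

```python
# Hypothetical client sketch, assuming an OpenAI-compatible endpoint.
# The base_url, model id, and CEREBRAS_API_KEY name are illustrative
# assumptions; consult Cerebras' documentation for the real values.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed env var name
)

start = time.time()
resp = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model identifier
    messages=[{"role": "user",
               "content": "Explain wafer-scale inference in two sentences."}],
)
elapsed = time.time() - start

tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s (~{tokens / elapsed:,.0f} tok/s)")
```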

“Speed and scale change everything,” said Kim Branson, SVP of AI/ML at GlaxoSmithKline, an early Cerebras customer.

“LiveKit is excited to partner with Cerebras to help developers build the next generation of multimodal AI applications. Combining Cerebras’ best-in-class compute and SoTA models with LiveKit’s global edge network, developers can now create voice and video-based AI experiences with ultra-low latency and more human-like characteristics,” said Russell D’sa, CEO and Co-Founder of LiveKit.

“For traditional search engines, we know that lower latencies drive higher user engagement and that instant results have changed the way people interact with search and with the internet. At Perplexity, we believe ultra-fast inference speeds like what Cerebras is demonstrating can have a similar unlock for user interaction with the future of search – intelligent answer engines,” said Denis Yarats, CTO and co-founder, Perplexity.

Tags: Cerebras, Hot Chips

Jim Carroll

Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley

