🦾 From GPUs to LPUs – Where Groq Fits Among Nvidia, AMD, and Cerebras

A Newsletter for Entrepreneurs, Investors, and Computing Geeks

Happy Monday! Here’s what’s inside this week’s newsletter:

  • Deep dive: In light of Groq’s recent $750M funding round, we explore its Language Processing Unit (LPU), purpose-built for inference, and how the company compares to Nvidia, AMD, and Cerebras.

  • Spotlights: The European Semiconductor Industry Association’s (ESIA) position paper on the proposed EU Chips Act 2 and Microsoft’s blueprint for what it calls the “world’s most powerful data center”.

  • Headlines: Nvidia’s $5B Intel stake, Quantinuum’s ‘unconditional’ quantum supremacy claim, new photonic and neuromorphic advances, data center and cloud infrastructure moves, and soaring AI valuations.

  • Readings: Advanced foundry revenues, memory scaling, quantum investment strategies, 3D-printed optics, neuromorphic markets, gigawatt-scale data centers, and edge AI trends.

  • Funding news: A slower week overall, with activity spanning early-stage financings in quantum, photonics, and data centers, alongside larger raises in AI and networking, capped by Groq’s $750M round.

  • Bonus: The latest front in the U.S.–China chip wars, as Washington and Beijing escalate export restrictions and probes, while Huawei accelerates its own AI chip and infrastructure push.

Deep Dive: From GPUs to LPUs – Where Groq Fits Among Nvidia, AMD, and Cerebras

Last week, Groq raised $750M at a $6.9B valuation, more than doubling its valuation from just a year ago. Combined with a $1.5B Saudi Arabia infrastructure deal, the raise positions Groq as a serious contender in the AI hardware race.

But what exactly is Groq’s bet? The answer lies in the Language Processing Unit (LPU), Groq’s custom-built chip designed specifically for AI inference.

What is a Language Processing Unit (LPU)?

The LPU is Groq’s clean-sheet alternative to GPUs, built solely for AI inference. While GPUs evolved from graphics and excel at parallel training, inference (especially real-time, batch-of-one workloads) requires a different architecture.

The LPU is built around four first-principles design choices:

  • Software-first: The compiler was built before the chip. It schedules every operation deterministically across LPUs, removing the need for custom kernels and ensuring full utilization.

  • Programmable assembly line: A single pipeline (“conveyor belt”) streams data through compute units in lockstep. Multiple LPUs link seamlessly, scaling linearly without external switches or routers.

  • Deterministic execution: No caches, no branch predictors, and no variability. Each instruction takes a fixed number of cycles, guaranteeing predictable latency for real-time systems.

  • On-chip memory: 80 TB/s SRAM bandwidth keeps data local, ~10× faster than GPUs using off-chip HBM (~8 TB/s), cutting latency and boosting efficiency.

Result: Up to 10× lower latency and 10× higher memory bandwidth than GPUs for inference.
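
To see where those latency numbers come from, here is a minimal back-of-envelope sketch (not Groq’s own methodology): in batch-of-one decoding, every weight has to be read from memory for each generated token, so memory bandwidth sets a hard floor on per-token latency. The 70B-parameter model and 8-bit precision below are illustrative assumptions; the bandwidth figures are the ones quoted above.

```python
# Back-of-envelope: per-token decode time is bounded below by how fast the
# model's weights can be streamed from memory (batch-of-one inference reads
# every weight once per generated token).
#
# Assumptions (not from the article): a hypothetical 70B-parameter model in
# 8-bit precision; aggregate bandwidth figures taken at face value, ignoring
# compute and interconnect overhead.

PARAMS = 70e9              # 70B parameters (illustrative model size)
BYTES_PER_PARAM = 1        # 8-bit weights (assumption)
weight_bytes = PARAMS * BYTES_PER_PARAM

bandwidths_tb_s = {
    "on-chip SRAM (Groq figure)": 80.0,   # ~80 TB/s, per the bullet above
    "off-chip HBM (GPU figure)":   8.0,   # ~8 TB/s, per the bullet above
}

for name, tb_s in bandwidths_tb_s.items():
    seconds_per_token = weight_bytes / (tb_s * 1e12)
    print(f"{name}: ~{seconds_per_token * 1e3:.2f} ms/token "
          f"(~{1 / seconds_per_token:,.0f} tokens/s ceiling)")
```

On these assumed numbers the SRAM path works out to roughly 0.9 ms per token versus about 9 ms over HBM, which is the same ~10× gap the bullets above describe.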

Can Groq Overcome Nvidia’s Moat?

Nvidia is not standing still, and CUDA remains a massive moat. Beyond software inertia, enterprises calculate total cost of ownership (TCO) around Nvidia’s ecosystem, which makes switching even harder unless gains are overwhelming. But Groq has a strong tailwind: inference demand is growing faster than Nvidia can supply, and governments and enterprises seek cost savings, sovereignty, and diversification. Still, Nvidia is likely to keep leading the AI hardware market, though Groq’s focus on inference is carving out real momentum.

How Does Groq Compare to Nvidia, AMD, and Cerebras?

Groq

  • Valuation: $6.9B (private, 2025)

  • Performance: LPU chip (~725 TOPS); ~10× lower inference latency vs GPUs

  • Efficiency: On-chip SRAM (~10× GPU bandwidth); linear scale-out without bottlenecks

  • Target Market: Real-time inference (LLMs, NLP, vision); GroqCloud API & GroqRack clusters

→ Groq is purpose-built for inference, with deterministic design and on-chip memory that deliver latency and efficiency GPUs can’t match. But to succeed, it must also grow adoption of its own compiler and software stack against Nvidia’s CUDA ecosystem, which most developers rely on and rarely switch away from.

Nvidia

  • Valuation: ~$4T (public, 2025)

  • Performance: H100 GPU, one example of many (~1,000 TFLOPs; 80–96 GB HBM); gold standard for training, strong inference throughput at high batch sizes

  • Efficiency: Off-chip HBM; ~700 W per GPU; optimized by CUDA ecosystem

  • Target Market: End-to-end AI across cloud, data centers, and edge

→ Nvidia remains the $4T giant with ~80% market share, dominating both training and inference. GPUs excel at high-throughput inference, though batch-of-one, real-time workloads remain less efficient — the exact gap Groq is trying to exploit. CUDA lock-in, however, makes it hard for customers to switch.
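
A toy model helps show why batching matters so much here: streaming the weights from HBM costs the same whether one or 256 requests share a decoding step, while useful compute grows with the batch, so GPU utilization climbs until compute becomes the bottleneck. The model size, the 2-FLOPs-per-parameter rule of thumb, and the reuse of the ~1,000 TFLOPs and ~8 TB/s figures quoted above are all illustrative assumptions, not measurements.

```python
# Toy utilization model for GPU decoding at different batch sizes.
# Weight streaming from HBM is batch-independent; compute scales with batch.
#
# Assumptions (illustrative): 70B-parameter model at 8-bit, ~2 FLOPs per
# parameter per token, ~1,000 TFLOPs peak compute, ~8 TB/s HBM bandwidth.

WEIGHT_BYTES    = 70e9        # 70B params at 1 byte each (assumption)
FLOPS_PER_TOK   = 2 * 70e9    # ~2 FLOPs per parameter per token (rule of thumb)
PEAK_FLOPS      = 1.0e15      # ~1,000 TFLOPs (figure quoted above)
HBM_BYTES_PER_S = 8.0e12      # ~8 TB/s (figure quoted above)

for batch in (1, 8, 64, 256):
    t_mem     = WEIGHT_BYTES / HBM_BYTES_PER_S      # same for any batch size
    t_compute = batch * FLOPS_PER_TOK / PEAK_FLOPS  # grows with the batch
    step      = max(t_mem, t_compute)               # slower resource sets the pace
    util      = t_compute / step                    # share of peak compute used
    print(f"batch {batch:>3}: step ~{step * 1e3:6.2f} ms, "
          f"compute utilization ~{util:.1%}")
```

At batch 1 this sketch spends nearly the whole step waiting on memory (a couple of percent utilization); by batch 64 and beyond compute becomes the bottleneck and utilization approaches 100%, which is the high-throughput regime GPUs are tuned for and the batch-of-one gap Groq targets.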

AMD

  • Valuation: ~$150B (public, 2025)

  • Performance: MI300X GPU (192 GB HBM3; 5.3 TB/s bandwidth); outperforms Nvidia H100 on LLM inference throughput

  • Efficiency: Fewer GPUs per model thanks to larger memory; chiplet design improves perf/W

  • Target Market: Large-model inference and HPC; enterprise AI

→ AMD’s MI300X closes the gap, with 192 GB of HBM3 and higher LLM inference throughput than Nvidia’s H100, allowing larger models to fit on fewer chips. This memory advantage reduces complexity and boosts efficiency, but adoption is still slowed by CUDA inertia despite AMD’s stronger raw performance.
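
A quick sizing sketch makes the memory argument concrete. It uses only the capacities quoted above (80 GB for H100, 192 GB for MI300X); the model sizes, 16-bit weights, and 20% headroom reserved for KV cache and activations are illustrative assumptions.

```python
import math

# How many accelerators are needed just to hold a model's weights?
# Capacities are the ones quoted above; everything else is an assumption.

CAPACITY_GB = {"H100 (80 GB)": 80, "MI300X (192 GB)": 192}
HEADROOM = 0.20            # fraction reserved for KV cache/activations (assumption)
BYTES_PER_PARAM = 2        # FP16 weights (assumption)

def accelerators_needed(params_billions: float, capacity_gb: float) -> int:
    weights_gb = params_billions * BYTES_PER_PARAM   # billions of params * bytes = GB
    usable_gb = capacity_gb * (1 - HEADROOM)
    return math.ceil(weights_gb / usable_gb)

for params_billions in (70, 180, 405):               # illustrative model sizes
    counts = ", ".join(
        f"{name}: {accelerators_needed(params_billions, cap)}"
        for name, cap in CAPACITY_GB.items()
    )
    print(f"{params_billions}B params @ FP16 -> {counts}")
```

Under these assumptions a 70B model fits on a single MI300X but needs three H100s, which is the “fewer chips per model” effect described above; real deployments also budget for KV cache growth, parallelism strategy, and redundancy, so treat these counts as a floor.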

Cerebras

  • Valuation: ~$3B (private, 2024)

  • Performance: WSE-3 wafer-scale chip (~125 PFLOPs BF16; 40+ GB on-chip memory); ~900k cores on a single wafer

  • Efficiency: On-wafer memory; no networking overhead; ~20 kW power per wafer

  • Target Market: Ultra-large model training/inference; sovereign AI and national labs

→ Cerebras takes the opposite approach to Groq’s modular pipeline, concentrating compute on one massive wafer for extreme-scale models. Its WSE-3 delivers 125 PFLOPs and 40+ GB of on-chip memory while avoiding multi-chip networking. The tradeoff is cost and power, with each wafer drawing ~20 kW.
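
For a rough sense of how the quoted figures compare per unit of power, here is a peak-FLOPs-per-kilowatt sketch putting one WSE-3 wafer next to an eight-H100 node. It relies only on the numbers cited in the profiles above, which are peak vendor figures at potentially different precisions, and it ignores host CPUs, networking, and cooling for the GPU node, so read it as a directional comparison rather than a benchmark.

```python
# Crude peak-compute-per-kilowatt comparison from the figures quoted above:
# WSE-3 ~125 PFLOPs at ~20 kW; H100 ~1,000 TFLOPs (1 PFLOPs) at ~700 W.
# Peak numbers only; precisions may differ; node-level overheads are ignored.

systems = {
    "Cerebras WSE-3 (one wafer)": {"pflops": 125.0,   "kw": 20.0},
    "8x Nvidia H100 (GPUs only)": {"pflops": 8 * 1.0, "kw": 8 * 0.7},
}

for name, s in systems.items():
    print(f"{name}: {s['pflops']:.0f} PFLOPs / {s['kw']:.1f} kW "
          f"= {s['pflops'] / s['kw']:.1f} PFLOPs per kW (peak)")
```

On paper the wafer comes out well ahead per kilowatt; the practical tradeoff the paragraph above points to is cost, plus the challenge of powering and cooling ~20 kW concentrated in a single device.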

Spotlights

The European Semiconductor Industry Association (ESIA) has published a position paper on the proposed EU Chips Act 2. Building on the 2023 Chips Act that aimed to double Europe’s market share to 20% by 2030, ESIA calls for a stronger focus on industrial deployment, AI leadership, and innovation infrastructure to close Europe’s competitiveness gap.

Key recommendations include:

  • Establishing a dedicated semiconductor budget with faster and more flexible funding.

  • Expanding “first-of-a-kind” (FOAK) support to cover upstream suppliers, equipment makers, and joint ventures.

  • Prioritizing chips for AI, both foundational and leading-edge.

  • Creating an institutionalized high-level dialogue between policymakers and industry.

  • Simplifying administrative and regulatory rules to boost innovation.

“Microsoft unveils the blueprint for its next generation of AI data centers. The first location will be built in Mount Pleasant, Wisconsin, where Microsoft plans to put hundreds of thousands of Nvidia Blackwell accelerators into operation in early 2026. The cost is expected to be $3.3 billion. According to Microsoft, the Fairwater data center in Wisconsin will provide 10 times more computing power than the world's best-equipped data center today when it is completed. Which facility currently holds that title is unclear, since private companies do not register their systems in the Top500 list of the world's fastest supercomputers. Since Microsoft, Meta, Amazon, and Google, among others, already operate data centers with more than 100,000 accelerators, Microsoft's new building could approach the million GPU mark.”

Headlines


Last week’s headlines featured new semiconductor collaborations, a dense wave of quantum breakthroughs, advances in photonic and neuromorphic tech, developments in data centers and cloud, and soaring AI valuations.

āš›ļø Quantum

āš”ļø Photonic / Optical

🧠 Neuromorphic

💥 Data Centers

If you’d like to learn more about orbital data centers, check out our interview with Starcloud!

ā˜ļø Cloud

🤖 AI

Readings


This week’s reading list includes updates on advanced foundry revenues and memory scaling, quantum investing and machine learning, 3D-printed optics, neuromorphic theory and markets, the rise of gigawatt-scale data centers, and edge AI and connectivity trends.

🦾 Semiconductors

Scaling Memory With Molybdenum (SemiEngineering) (7 mins)

āš›ļø Quantum

āš”ļø Photonic / Optical

Semiconductors & Photonics (2025) (Sifted) (20 mins ‒ Paywall)

🧠 Neuromorphic

💥 Data Centers

📡 Connectivity

🤖 AI

Funding News


Last week saw fewer funding rounds than the week before, a little over half as many. Activity ranged from early-stage financings in quantum, data centers, and photonics to larger raises in AI and networking, capped by Groq’s $750M round.

| Amount | Name | Round | Category |
| --- | --- | --- | --- |
| Undisclosed | Microamp | Strategic Funding | Connectivity |
| $12.7M | Atomionics | “Pre-Series A” | Quantum |
| $15.5M | Mueon | Seed | Data Centers |
| $35M | Omni Design Technologies | Series A | Networking |
| €57M | Cailabs | Venture Round + Debt | Photonics |
| $72M | Luminary | Series B | AI |
| $100M | Upscale AI | Seed | Networking |
| $750M | Groq | Venture Round | Semiconductors |

Bonus: The Latest Battle in the U.S. vs China Chip Wars

And again: The U.S. and China are tightening the screws on each other’s semiconductor industries. Washington added two Chinese chipmakers to the so-called Entity List, while Beijing barred companies from buying Nvidia’s AI chips, accused the company of antitrust violations, and launched new probes into U.S. semiconductors. Meanwhile, Huawei is moving fast to fill the gap with its own AI infrastructure and chip plans.

ā¤ļø Love these insights? Forward this newsletter to a friend or two. They can subscribe here.