šŸ’„šŸ¦¾ What DeepSeek-V3 Teaches Us About Efficient AI Infrastructure

A Newsletter for Entrepreneurs, Investors, and Computing Geeks

We're back with our weekly newsletter! We're also joined by Emily, a VC with a passion for computing, who will be helping us curate the newsletter going forward.

This week's deep dive looks at DeepSeek-V3 and the practical lessons it offers for building large-scale AI systems efficiently. We also highlight a breakthrough in quantum computing and share curated news across AI, quantum, photonics, neuromorphic computing, and infrastructure, along with key readings and funding news. We close with a bonus section offering different perspectives on the US vs. China tech race.

Finally, thanks to everyone who joined the Future of Computing Conference in Berlin last June – we're actively preparing the next edition in Paris on November 6 (more on that soon!).

Deep Dive: What DeepSeek-V3 Teaches Us About Efficient AI Infrastructure

Summary:
The team at DeepSeek-AI has managed to train a top-tier open-source large language model (DeepSeek-V3) using just 2,048 NVIDIA H800 GPUs. Instead of relying on brute-force scaling, they focused on a tight integration between model architecture and hardware. The result is a highly efficient system that challenges the idea that only Big Tech can play in the large-model arena.

Key Takeaways:

  • Smarter Attention = Less Memory Use
    DeepSeek-V3 uses Multi-head Latent Attention (MLA) to compress the key-value (KV) cache during inference. This cuts the memory needed per token by up to 85 percent compared to models like LLaMA-3, which is crucial for handling long-context inputs efficiently (see the first sketch after this list).

  • Efficient Scaling with Sparse Models
    Thanks to a Mixture of Experts (MoE) architecture, the model activates only a small fraction of its 671 billion parameters for each token. This cuts compute costs while maintaining performance, making the model more practical for on-premises or personalized use (a toy routing example follows the list).

  • Trained with Simplified Numerical Formats
    To make training more efficient, DeepSeek-V3 computes in FP8, a compact 8-bit floating-point format. FP8 values take a fraction of the space of the 16- and 32-bit formats traditionally used, which reduces memory usage and speeds up computation with minimal impact on model quality (see the quantization sketch below).

  • Faster Text Generation with Multi-Token Prediction
    Instead of generating one token at a time, the model predicts several tokens in parallel and verifies them on the fly. This boosts generation speed by up to 1.8x in real-world scenarios (a simplified verification sketch appears below).

  • Networking Matters More Than Ever
    The team redesigned the GPU network using a multi-plane topology to reduce latency and keep infrastructure costs down. They also optimized token routing to avoid communication bottlenecks.
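
For readers who like to see the ideas in code, here is a minimal sketch of the latent-KV idea behind MLA: instead of caching full per-head keys and values, each token is compressed into one small latent vector, and keys and values are re-expanded from it at attention time. This is a toy PyTorch illustration under our own assumptions (the class name LatentKVAttention, the d_latent size, and the missing causal mask are all ours), not DeepSeek's implementation.

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Toy latent-KV attention: cache one small latent per token
    instead of full per-head keys and values (MLA-style idea only)."""
    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress token -> latent
        self.k_up = nn.Linear(d_latent, d_model)      # expand latent -> keys
        self.v_up = nn.Linear(d_latent, d_model)      # expand latent -> values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, D = x.shape
        latent = self.kv_down(x)                      # (B, T, d_latent)
        if latent_cache is not None:                  # reuse what was cached
            latent = torch.cat([latent_cache, latent], dim=1)
        S = latent.shape[1]
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(y), latent                    # cache only the latent
```

Because only the latent (here 128 values per token, versus 2 x 1024 for full keys and values) ends up in the cache, the per-token memory shrinks roughly in proportion to the compression ratio.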
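
The sparse-activation point can be shown the same way: a router scores the experts for each token, and only the top-k experts actually run. The toy layer below (TinyMoE, with made-up sizes) sketches the mechanism only; DeepSeek-V3's routing, load balancing, and expert parallelism are considerably more sophisticated.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy top-k Mixture-of-Experts layer: each token runs only k experts."""
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                              # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)       # mixing weights for chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                hit = idx[:, slot] == e                # tokens routed to expert e
                if hit.any():
                    out[hit] += weights[hit, slot].unsqueeze(-1) * expert(x[hit])
        return out

moe = TinyMoE()
print(moe(torch.randn(16, 512)).shape)                 # torch.Size([16, 512])
```

With k = 2 of 8 experts active, only a quarter of the expert parameters are touched per token; scaled up, that is how a 671-billion-parameter model can keep per-token compute modest.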
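
The low-precision idea boils down to scaling values into the representable range of an 8-bit float, casting down for storage and compute, and scaling back up where more precision is needed. The round-trip below uses PyTorch's torch.float8_e4m3fn dtype (available in recent PyTorch releases) purely as an illustration; real FP8 training relies on fused GPU kernels and finer-grained scaling that this sketch does not attempt to reproduce.

```python
import torch

def to_fp8(x: torch.Tensor, dtype=torch.float8_e4m3fn):
    """Toy per-tensor scaling: fit x into the FP8 range, then cast down."""
    amax = x.abs().max().clamp(min=1e-12)
    scale = torch.finfo(dtype).max / amax
    return (x * scale).to(dtype), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Cast back to FP32 and undo the scaling."""
    return x_fp8.to(torch.float32) / scale

x = torch.randn(4, 4)
x_fp8, scale = to_fp8(x)
print(x_fp8.element_size(), "byte per value")    # 1 byte, vs 4 for FP32
print((x - from_fp8(x_fp8, scale)).abs().max())  # small rounding error
```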
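
Finally, the multi-token speedup hinges on proposing several next tokens cheaply and checking them in a single pass of the main model. The helper below sketches only a greedy acceptance step, under our own simplified assumptions; DeepSeek-V3's MTP modules and acceptance logic are more involved.

```python
import torch

def accept_draft(target_logits: torch.Tensor, draft_tokens: list) -> list:
    """Greedy verification: keep the longest prefix of drafted tokens that the
    main model would also have chosen at each position (toy sketch only)."""
    # target_logits: (len(draft_tokens), vocab_size), one row per drafted
    # position, produced by one forward pass of the main model over the draft.
    preferred = target_logits.argmax(dim=-1).tolist()
    accepted = []
    for want, got in zip(preferred, draft_tokens):
        if want != got:
            break                 # first mismatch: stop and resample from here
        accepted.append(got)
    return accepted

# Example: 3 drafted tokens; the main model agrees with the first two.
logits = torch.full((3, 5), -1e9)
logits[0, 2] = logits[1, 4] = logits[2, 1] = 0.0
print(accept_draft(logits, [2, 4, 3]))   # -> [2, 4]
```

When most drafted tokens are accepted, several output tokens come out of each main-model pass, which is the effect behind the reported 1.8x figure.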

Why It Matters:
DeepSeek-V3 is a strong example of what becomes possible when models and infrastructure are designed together. As GPU availability tightens and energy costs rise, this kind of smart engineering may help smaller players stay competitive.

For another relevant analysis of DeepSeek, covering aspects beyond those above, see DeepSeek Debrief: >128 Days Later (SemiAnalysis).

Spotlight

"A research team has achieved the holy grail of quantum computing: an exponential speedup that's unconditional. By using clever error correction and IBM's powerful 127-qubit processors, they tackled a variation of Simon's problem, showing quantum machines are now breaking free from classical limitations, for real."

Headlines

Last week’s headlines span major milestones in AI, quantum, photonics, and neuromorphic computing, plus growing concerns around the energy and water footprint of data centers.

šŸ¤– AI

🦾 Semiconductors

āš›ļø Quantum Computing

āš”ļø Photonic / Optical Computing

🧠 Neuromorphic Computing

šŸ’„ Data Centers

Selected Readings

This week’s reading list spans semiconductor strategies, photonic innovation, and the environmental impact of data centers.

🦾 Semiconductors

āš”ļø Photonic / Optical Computing

šŸ’„ Data Centers

Funding News

Last week’s funding activity highlights momentum across foundational compute technologies, from integrated photonics and quantum error correction to edge HPC and energy-aware AI infrastructure. These enabling technologies are critical to scaling next-generation workloads.

Meanwhile, xAI’s $10B raise underscores how capital-intensive the foundation model race has become and how high the stakes are.

šŸ¤– AI

āš›ļø Quantum Computing

āš”ļø Photonic / Optical Computing

šŸ’„ Data Centers

Bonus: US vs. China - From Different Perspectives

This section brings together different perspectives on the US vs. China tech race, including takes from US media, Asian outlets, and stock market analysts.

Why is the US leading in the chip industry? (36kr - a Chinese media company)

Love these insights? Forward this newsletter to a friend or two. They can subscribe here.