DeepSeek's new chatbot boasts an impressive introduction: "Hi, I was created so you can ask anything and get an answer that might even surprise you." This AI, a product of the Chinese startup DeepSeek, has rapidly become a major market player, even contributing to a significant drop in NVIDIA's stock price.

DeepSeek's competitive edge lies in its innovative architecture and training methods. Key technologies include:
- Multi-token Prediction (MTP): Instead of predicting one token at a time, the model is trained to predict several future tokens at once, which gives it a denser training signal and can speed up generation.
- Mixture of Experts (MoE): The model is split into many specialized sub-networks ("experts"), and a router sends each token to only a few of them, so only a fraction of the parameters does work for any given token. DeepSeek V3 uses 256 experts and activates eight for each token.
- Multi-head Latent Attention (MLA): MLA compresses the attention keys and values into a compact latent representation, which reduces the memory needed to process long inputs.
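
The "256 experts, eight active per token" idea from the MoE item can be illustrated with a toy router. This is a minimal sketch, not DeepSeek's implementation: the softmax weighting over the selected experts, the scalar toy experts, and the names `route_token` and `moe_output` are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 256  # number of experts cited for DeepSeek V3
TOP_K = 8          # experts activated per token


def route_token(scores, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Assumption: weights are a softmax over the selected scores only.
    Real systems may use a different gating function.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(token, experts, scores):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](token) for i, w in route_token(scores))


rng = random.Random(0)
# Hypothetical toy experts: each one just scales its input by a fixed factor.
experts = [lambda x, a=rng.uniform(-1, 1): a * x for _ in range(NUM_EXPERTS)]
# Hypothetical router scores for one token (a real router computes these).
scores = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]

selected = route_token(scores)
print(f"{len(selected)} of {NUM_EXPERTS} experts run for this token")
print(f"output for token value 1.0: {moe_output(1.0, experts, scores):.4f}")
```

Only the eight selected experts are ever called, which is where the compute saving comes from: the other 248 contribute parameters to the model but no work for this token.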

DeepSeek initially claimed to have trained its DeepSeek V3 model for a mere $6 million using 2048 GPUs. However, SemiAnalysis reported a far larger infrastructure: approximately 50,000 NVIDIA Hopper GPUs, including 10,000 H800s, 10,000 H100s, and additional H20s, spread across multiple data centers. That represents a total server investment of roughly $1.6 billion, with operating expenses estimated at $944 million.

DeepSeek, a subsidiary of the Chinese hedge fund High-Flyer, owns its data centers, which gives it direct control over optimization and lets it roll out new techniques quickly. Because it is self-funded, the company can also make decisions fast. It attracts top talent, recruited primarily from Chinese universities, with some researchers earning over $1.3 million annually.

DeepSeek's initial $6 million training cost claim is misleading: it reflects only the GPU time used for pre-training and excludes research, refinement, data processing, and infrastructure. The company's total investment in AI development exceeds $500 million. Even with that level of spending, its lean structure lets it put new ideas into practice quickly.
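
To see how narrow the headline figure is, it helps to reproduce the arithmetic behind it. The sketch below assumes two inputs that come from DeepSeek's V3 technical report rather than from this article: about 2.788 million H800 GPU-hours and an assumed rental price of $2 per GPU-hour. The function names are illustrative.

```python
# Assumed inputs, taken from DeepSeek's V3 technical report, not from this article.
GPU_HOURS = 2_788_000        # H800 GPU-hours reported for the training run
USD_PER_GPU_HOUR = 2.00      # rental price assumed in that report
NUM_GPUS = 2048              # cluster size cited in the original claim


def training_cost(gpu_hours, usd_per_gpu_hour):
    """GPU rental cost only: no salaries, experiments, data, or hardware purchases."""
    return gpu_hours * usd_per_gpu_hour


def wall_clock_days(gpu_hours, num_gpus):
    """Days the run would take if all GPUs were busy around the clock."""
    return gpu_hours / num_gpus / 24


cost = training_cost(GPU_HOURS, USD_PER_GPU_HOUR)
days = wall_clock_days(GPU_HOURS, NUM_GPUS)
print(f"${cost / 1e6:.3f} million over about {days:.0f} days")
# prints "$5.576 million over about 57 days"
```

The calculation is simply GPU-hours multiplied by a rental rate, so it prices one training run on rented hardware and nothing else.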

DeepSeek's success shows that well-funded independent AI companies can compete with industry giants. However, its achievements are attributable to billions in investment, technical breakthroughs, and a strong team, not to a revolutionarily small budget. Even so, DeepSeek's costs remain significantly lower than its competitors'. For example, DeepSeek spent $5 million on R1, compared with the $100 million OpenAI spent on GPT-4o. The cost advantage is real, even though the initial claims were misleading.