DeepSeek AI, a Chinese artificial intelligence research lab, is making waves in the open-source community. Its latest release, DeepSeek-V3, is a large language model built on a Mixture-of-Experts (MoE) architecture, with 671 billion total parameters of which 37 billion are activated per token. The results speak for themselves: according to leading benchmarks, DeepSeek-V3 is the most powerful open-source model around, outperforming even popular closed-source models such as OpenAI’s GPT-4o and Anthropic’s Claude 3.5.

The emergence of DeepSeek-R1 raises important ethical and moral questions about the influence of culture and politics on the development of artificial intelligence. Unlike models developed in the West, DeepSeek-R1 reflects the ‘core values of socialism’ demanded by the Chinese authorities and refuses to answer questions on topics the Chinese government considers sensitive, such as Tiananmen Square, Taiwan’s autonomy or the treatment of the Uighurs.
This censorship is implemented directly in the model and does not depend on the platform or device on which it is used: instead of providing neutral answers, the model tends to avoid such topics or offer answers that closely align with the official government narrative. This dynamic raises questions about the objectivity of artificial intelligence and the potential risk of information manipulation. The alignment of AI with a nation’s moral values can turn these models into tools of ideological control and propaganda, undermining open debate and free access to information. This problem not only concerns DeepSeek-R1 but also extends to all artificial intelligence models, which inevitably reflect the moral and cultural values of those who train them.
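The headline figure of 671 billion total parameters with only 37 billion activated per token follows from the Mixture-of-Experts design: a router sends each token to a small subset of expert sub-networks, so most parameters sit idle on any given token. The toy sketch below illustrates top-k expert routing in general; the expert count, sizes, and gating details are illustrative placeholders, not DeepSeek's actual configuration.

```python
import numpy as np

# Toy Mixture-of-Experts routing (illustrative, not DeepSeek's code):
# each token is routed to its top-k experts, so only a fraction of the
# total parameters is activated per token.

rng = np.random.default_rng(0)

d_model = 16    # hidden size (toy value)
n_experts = 8   # total experts (hypothetical)
top_k = 2       # experts activated per token

# Each expert is a simple linear layer here.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
gate = rng.standard_normal((d_model, n_experts)) * 0.02  # router weights

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ gate                       # (n_experts,) router scores
    idx = np.argsort(logits)[-top_k:]       # indices of the top-k experts
    w = np.exp(logits[idx] - logits[idx].max())
    w /= w.sum()                            # softmax over the selected experts only
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, idx)), idx

token = rng.standard_normal(d_model)
out, active = moe_forward(token)
# Only top_k of n_experts are touched for this token.
print(f"active experts: {sorted(active.tolist())}, fraction: {top_k / n_experts:.0%}")
```

At DeepSeek-V3's scale the same principle yields the quoted ratio: roughly 37B of 671B parameters, about 5.5%, do work on each token.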
New open-source AI model outperforms GPT-4o
DeepSeek-V3 recorded top results in no fewer than nine benchmarks, more than any other model of comparable size. More surprising still: despite this performance, full training required only 2.788 million H800 GPU hours, at a cost of around $5.6 million. For comparison, the comparable open-source model Llama 3 405B required 30.8 million GPU hours. This efficiency is thanks to FP8 training support and deep engineering optimizations. And the surprises do not end there: DeepSeek-V3 is also extremely efficient at inference. As of 8 February, DeepSeek-V3 input costs $0.27 per million tokens ($0.07 with a cache hit), while output costs $1.10 per million tokens.
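A quick back-of-the-envelope check ties these numbers together. The implied rental rate of roughly $2 per H800 GPU-hour is derived here from the article's own figures, not from an official DeepSeek disclosure, and the Llama 3 comparison simply applies that same hypothetical rate.

```python
# Sanity-check the training-cost figures quoted above.
gpu_hours_v3 = 2.788e6   # H800 GPU hours for DeepSeek-V3 full training
total_cost = 5.6e6       # reported training cost in USD

rate = total_cost / gpu_hours_v3
print(f"implied rate: ${rate:.2f} per GPU-hour")  # ~ $2.01

# Llama 3 405B used ~30.8M GPU hours; at the same (assumed) rate:
llama_hours = 30.8e6
print(f"Llama 3 405B at the same rate: ${rate * llama_hours / 1e6:.0f}M")
print(f"compute ratio: {llama_hours / gpu_hours_v3:.1f}x")  # ~ 11.0x
```

In other words, Llama 3 405B consumed about eleven times the training compute, which at the same rate would cost on the order of $60 million.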

That is practically a tenth of what OpenAI and other leading companies currently charge for their flagship models. The DeepSeek team commented on the launch of DeepSeek-V3 on X: “Our mission is unwavering. We are excited to share our progress with the community and to see the gap between open and closed models narrowing. This is just the beginning! Expect multimodal support and other cutting-edge features in the DeepSeek ecosystem.” DeepSeek-V3 is already available on GitHub and HuggingFace. With its impressive performance and affordability, it could genuinely democratize access to advanced AI models, marking a significant step towards closing the gap between open and closed models.
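To make the pricing concrete, the quoted per-million-token rates ($0.27 input, $0.07 on a cache hit, $1.10 output) translate into per-request costs as sketched below; the example prompt and answer sizes are hypothetical.

```python
# Hypothetical request cost under the DeepSeek-V3 pricing quoted above
# ($0.27/M input tokens, $0.07/M with a cache hit, $1.10/M output tokens).

def request_cost(input_tokens, output_tokens, cached=False):
    """Return the USD cost of one request at the quoted per-million-token rates."""
    input_rate = 0.07 if cached else 0.27  # USD per million input tokens
    output_rate = 1.10                     # USD per million output tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1e6

# e.g. a 10k-token prompt with a 1k-token answer:
print(f"uncached: ${request_cost(10_000, 1_000):.6f}")               # $0.003800
print(f"cached:   ${request_cost(10_000, 1_000, cached=True):.6f}")  # $0.001800
```

At these rates, even a long prompt with a substantial answer costs well under a cent, which is what makes the "tenth of the price" comparison so striking.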
Why V3?
DeepSeek reportedly trained its base model, called V3, on a budget of $5.58 million over two months, according to Nvidia engineer Jim Fan. Although the company did not disclose the exact training data used, modern techniques make web-based training and open datasets increasingly accessible. Estimating the total cost of training DeepSeek-R1 is more challenging: although running 50,000 GPUs would suggest significant expenditure (potentially hundreds of millions of dollars), precise figures remain speculative.
What is clear, however, is that DeepSeek has been highly innovative from the start. Last year, reports emerged of early innovations it was making in technologies such as Mixture-of-Experts and Multi-Head Latent Attention.