The race to build ever-larger AI models, along with the chips and data centers needed to develop them, is costing tech companies enormous sums.
Note: The original text was published on April 30, 2024, so the article does not mention DeepSeek or newer language models.
More than a year and a half after the generative AI hype began, the industry’s biggest players are demonstrating that artificial intelligence can indeed generate impressive profits. But the investments behind those results are becoming a gigantic expense item.
Disclaimer: This is a free translation of a column written by Seth Fiegerman and Matt Day for Bloomberg. The translation was prepared by the editorial team of Technocracy.
Microsoft Corp. and Alphabet Inc.’s Google reported sharp increases in cloud revenue in the latest quarter as enterprise customers increasingly embrace their AI solutions. Meta Platforms Inc., while not yet generating significant revenue from artificial intelligence, says its AI deployments have already increased user engagement and improved the effectiveness of targeted advertising.
To achieve these initial results, all three companies have invested huge amounts of money in AI and intend to further increase these investments.
On April 25, Microsoft reported $14 billion in capital expenditures for the quarter, up 79% from a year earlier, and warned of further significant growth driven largely by AI infrastructure. Alphabet spent $12 billion, up 91% year over year, and expects similarly expensive quarters ahead as it focuses on AI capabilities. Meta, meanwhile, raised its full-year capital expenditure guidance to between $35 billion and $40 billion, the high end of which is 42% above last year’s spending, citing aggressive investments in AI research and development.
The surge in AI costs has taken some investors by surprise. Meta shares fell sharply on the back of the company’s expense forecast and weaker-than-expected sales growth. But the tech community has long accepted that AI costs will inevitably rise. There are two reasons: AI models are becoming larger, and the global demand for such services requires the construction of additional data centers to support them.
A business experimenting with AI can spend millions adapting solutions from OpenAI or Google. Once deployed, every call to a chatbot or AI analytics service carries an ongoing cost, and the underlying training of these systems is more expensive still.
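To get a feel for how per-call costs accumulate, here is a minimal sketch; the per-token prices, token counts, and traffic volume are hypothetical placeholders for illustration, not actual vendor rates.

```python
# Back-of-envelope estimate of ongoing inference costs.
# All prices and volumes below are assumptions, not real vendor pricing.
INPUT_PRICE_PER_1K_TOKENS = 0.01   # USD, assumed
OUTPUT_PRICE_PER_1K_TOKENS = 0.03  # USD, assumed

def monthly_inference_cost(calls_per_day: int,
                           input_tokens: int = 500,
                           output_tokens: int = 300) -> float:
    """Rough monthly bill for a chatbot under the assumed prices above."""
    per_call = (input_tokens / 1000 * INPUT_PRICE_PER_1K_TOKENS
                + output_tokens / 1000 * OUTPUT_PRICE_PER_1K_TOKENS)
    return per_call * calls_per_day * 30

# 100,000 calls a day comes to roughly $42,000 a month under these assumptions.
print(f"${monthly_inference_cost(100_000):,.0f} per month")
```

Below, we consider what contributes to these costs.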
Large language models are getting bigger
Today’s most prominent AI products, including OpenAI’s ChatGPT, rely on large language models—systems that learn from vast amounts of text data (books, articles, comments from the internet) to generate the most accurate possible responses to user queries. Many leading companies believe that the path to more advanced AI—perhaps one that outperforms humans at many tasks—is to keep scaling up these language models.
This requires even more data, computing power, and training time. In an April podcast, Dario Amodei, the head of Anthropic (a competitor to OpenAI), said that training the models currently on the market costs about $100 million.
“The models that are currently in training and will be available for sale later this year or early next year will cost closer to $1 billion,” he added. “And by 2025-26, the numbers will rise to $5 billion or even $10 billion.”
Chips and computing power
Much of that spending is tied to chips. These aren’t the central processing units (CPUs) that made Intel Corp. famous, nor the stripped-down equivalents found in smartphones. To train large language models, companies use graphics processing units (GPUs), which can process massive amounts of data at high speed. Demand for these chips is high, but there are few manufacturers, with the most advanced chips coming primarily from Nvidia Corp.
Nvidia’s flagship H100 chip, which has become the benchmark for AI training, costs about $30,000, and resellers can charge several times more. Large tech companies need tens of thousands of such chips. Meta CEO Mark Zuckerberg previously said that his company would buy 350,000 H100 units by the end of the year to support AI research. Even taking into account discounts for bulk purchases, we are talking about costs in the billions of dollars.
Renting such chips is possible, but also expensive. For comparison, in Amazon.com Inc.’s cloud, a large cluster of Intel processors costs about $6 per hour, while a set of Nvidia H100 chips runs almost $100.
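As a back-of-envelope check, the sketch below simply multiplies the figures quoted above; it is illustrative arithmetic, not vendor pricing, and the rental set size is left unspecified, as in the text.

```python
# Rough arithmetic on the chip figures quoted in the text.
H100_LIST_PRICE = 30_000   # USD per chip, from the text
META_ORDER = 350_000       # chips Zuckerberg said Meta would buy

purchase_cost = H100_LIST_PRICE * META_ORDER
print(f"Buying outright: ~${purchase_cost / 1e9:.1f}B before bulk discounts")

# Renting instead: the text cites almost $100/hour for a set of H100s
# in Amazon's cloud.
RENTAL_PER_HOUR = 100      # USD per hour, from the text
HOURS_PER_YEAR = 24 * 365

print(f"Renting one set around the clock: ~${RENTAL_PER_HOUR * HOURS_PER_YEAR:,} per year")
```

At list price, Meta’s stated order alone works out to roughly $10.5 billion, which is why even steep bulk discounts still leave the bill in the billions.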
In March, Nvidia unveiled its new Blackwell chip architecture, which trains large language models several times faster and will likely cost about the same as the Hopper line that includes the H100. According to Nvidia, training a model with 1.8 trillion parameters (the approximate size of GPT-4, according to the New York Times lawsuit against OpenAI) would require about 2,000 Blackwell chips, compared with 8,000 Hopper chips. But given the trend toward ever-larger models, that advantage may not last long.
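Nvidia’s chip-count comparison translates into hardware costs roughly as follows; treating a Blackwell chip as costing about the same as an H100 follows the text’s own assumption, and the price used is a placeholder, not a quoted figure.

```python
# Chip counts Nvidia cites for training a ~1.8T-parameter model.
ASSUMED_CHIP_PRICE = 30_000   # USD; assumes Blackwell ~= H100 price, per the text

hopper_chips = 8_000
blackwell_chips = 2_000

hopper_cost = hopper_chips * ASSUMED_CHIP_PRICE        # ~$240M in chips
blackwell_cost = blackwell_chips * ASSUMED_CHIP_PRICE  # ~$60M in chips

print(f"Hopper:    ~${hopper_cost / 1e6:.0f}M, {hopper_chips:,} chips")
print(f"Blackwell: ~${blackwell_cost / 1e6:.0f}M, {blackwell_chips:,} chips "
      f"({hopper_chips // blackwell_chips}x fewer)")
```

A fourfold drop in chip count for a fixed model size is substantial, but if model sizes keep growing fourfold or more, total spending still rises.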
Data centers
The companies that buy these chips need somewhere to house them. Meta, the major cloud providers (Amazon, Microsoft, and Google), and other computing power providers are rushing to build new data centers. These are often specialized complexes with rows of servers, storage, cooling, and power systems.
Research firm Dell’Oro Group estimates that $294 billion will be spent worldwide this year to build and equip such centers, up from $193 billion in 2020. Some of that growth reflects the broader expansion of digital services, from streaming video and social media to the explosion of corporate data. But an increasing share is going to expensive Nvidia chips and the other specialized hardware needed to advance AI.
The number of data centers worldwide has already exceeded 7,000 (including projects at various stages of construction), up from 3,600 in 2015, according to analytics firm DC Byte. And they are getting bigger: the average complex now covers 38,270 square meters, almost five times the 2010 figure.
Deals and personnel
While the lion’s share of spending goes on chips and data centers, some companies also spend millions licensing data from publishers.
OpenAI has struck deals with several European media houses to use their material in ChatGPT and to train its models. Financial details have not been disclosed, but Bloomberg News reports that the deal with German publisher Axel Springer SE (owner of Politico and Business Insider) was worth tens of millions of euros. OpenAI has also been in talks with Time, CNN, and Fox News.
While OpenAI has been the most active in pursuing such partnerships, major tech companies are also looking for ways to gain access to the language data needed for AI. Google has struck a $60 million deal with Reddit to buy content, according to Reuters, and Meta has been in talks to acquire publisher Simon & Schuster, according to the New York Times.
At the same time, tech giants are locked in a fierce battle for AI talent. Netflix Inc., for example, last year advertised an AI product manager position with a salary of up to $900,000.
Cheaper alternatives
Microsoft, one of the biggest players in the large language model market, recently pitched a different strategy, announcing three smaller AI models that are more computationally efficient.
While large language models “will continue to be the gold standard for many complex problems,” such as “deep data analysis and context understanding,” smaller models may be better suited to specific customers and scenarios, Microsoft says. Other companies are also working on shrinking models, including Sakana AI, a startup founded by two former Google employees.
“You don’t always need a race car,” says Rowan Curran, senior AI analyst at Forrester Research. “Sometimes a minivan or a pickup truck will do. There’s no one model that’s right for every purpose.”
However, the conventional wisdom in the AI industry is that bigger is better – and that comes with ever-increasing costs.
*The text mentions the company Meta, which is recognized as an extremist organization in the Russian Federation