Analysis With the latest round of trade restrictions on AI chips, the Biden Administration is poised to all but cut off the Chinese market from high-end GPUs and accelerators – not just in the datacenter, but at home as well.
The rules, announced this week, seek to prevent US persons or companies from furthering the military and surveillance agendas of the People’s Republic of China and other countries of concern.
And as we’ve previously reported, the updated restrictions are likely to impact a large swath of Nvidia’s GPU lineup, including its H800 and A800 kit built to comply with last fall’s export rules. That’s bad news for the Chinese web giants that had reportedly planned to purchase $4 billion worth of the cards in 2024, and for US companies, like Intel and AMD, working on their own cut-down chips for sale in the Middle Kingdom.
Performance caps for chips bound for China
Until now, the primary performance cap on GPUs and AI accelerators exported to countries of concern (i.e. China) has centered on interconnect bandwidth. This refers to the speed at which the processors can communicate with each other. Last year’s rules restricted the export of chips with bidirectional interconnect bandwidth of 600GB/s or more without a special license.
In response, Nvidia and Intel both tweaked their latest GPUs, nerfing the interconnect speeds to skirt under the Commerce Department’s restrictions. Those H800s we mentioned earlier are a prime example.
The Biden administration has now gone a step further by implementing a set of caps on performance density. Per the Bureau of Industry and Security (BIS) filing [PDF] this week, the first and arguably most important of these rules restricts the export of:
“Integrated circuits having one or more digital processing units having either of the following: a.1. a ‘total processing performance’ of 4,800 or more, or a.2. a ‘total processing performance’ of 1,600 or more and a ‘performance density’ of 5.92 or more.”
Calculating the total processing performance (TPP) score for any given GPU or accelerator is a fairly straightforward process. Double the max number of dense tera-operations (floating point or integer) per second and multiply by the bit length of the operation. If there are multiple performance metrics advertised for various precisions (INT4, FP8, FP16, and FP32, for example), the highest TPP score is used.
Using Nvidia’s L40S as an example, the equation would look a bit like this:
2 x 733 teraFLOPS x 8 bits = a TPP of 11,728
The eagle-eyed among you may have noticed that we’re not using the 1,466 teraFLOPS of FP8 advertised by Nvidia on its data sheet. This is because, for the purposes of calculating TPP, processors that offer both dense and sparse calculations should disregard the latter.
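For those who would rather not do this by hand, here is a minimal Python sketch of the arithmetic as we have described it. The function names are our own, and the two-precision chip at the end is made up purely to show the "highest score wins" step; this is not an official BIS calculator.

```python
def tpp(dense_tera_ops: float, bit_length: int) -> float:
    """TPP as described above: 2 x dense tera-ops per second x bit length."""
    return 2 * dense_tera_ops * bit_length


def max_tpp(dense_specs: dict[int, float]) -> float:
    """Highest TPP across advertised precisions.

    dense_specs maps bit length -> dense (not sparse) tera-ops per second.
    """
    return max(tpp(ops, bits) for bits, ops in dense_specs.items())


# L40S: 733 dense FP8 teraFLOPS; the 1,466 sparse figure is disregarded.
print(tpp(733, 8))  # 11728

# Hypothetical chip (numbers invented for illustration): 100 tera-ops at
# 8 bits and 60 tera-ops at 16 bits. The 16-bit figure yields the higher score.
print(max_tpp({8: 100, 16: 60}))  # 1920
```

Note that the lowest precision does not automatically produce the highest score, which is why each advertised figure has to be checked.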
The TPP figure can then be used to determine the performance density of the chip. This figure is calculated by dividing TPP by the “applicable die area.” Going back to our L40S example, the GPU uses the AD102 die, which has a surface area of 609 mm², so our calculation would look something like this:
11,728 TPP / 609 mm² = a performance density of roughly 19.26
This puts it well above the 5.92 performance density limit imposed by the new rules. Though, we’ll note it’s not clear whether memory is considered logic for the purposes of calculating performance density.
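The same division as a short Python sketch. The function and constant names are ours, and the "applicable die area" is simply passed in as a number, since what counts toward it is exactly the open question noted above.

```python
DENSITY_LIMIT = 5.92  # performance density threshold quoted from the filing


def performance_density(tpp_score: float, die_area_mm2: float) -> float:
    """Performance density: TPP divided by applicable die area in mm^2."""
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return tpp_score / die_area_mm2


# L40S: TPP of 11,728 on the 609 mm^2 AD102 die.
density = performance_density(11_728, 609)
print(density >= DENSITY_LIMIT)  # True
```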
What about lower-end chips?
For less powerful chips, there’s a somewhat odd exception. Per the BIS filing:
“b. Integrated circuits having one or more digital processing units having either of the following: b.1 a ‘total processing performance’ of 2,400 or more and less than 4,800 and a ‘performance density’ of 1.6 or more and less than 5.92.”
This appears to be targeted at older GPUs and accelerators, like AMD’s Instinct MI100, which we estimate to have a TPP of 2,953 and a performance density of 3.93.
However, a card like Nvidia’s small-form factor L4 GPU could skirt by unchallenged, despite having a TPP of around 3,880. With a die area of 294 mm², its performance density would fall outside the range described in the rule.
This is likely why the card didn’t make Nvidia’s list of GPUs affected by the rules. That list included A100, A800, H100, H800, L40, L40S, and RTX 4090 — more on that last one in a minute. Nvidia declined to comment further on the export restrictions and pointed us back to its earlier SEC filing.
The rule also includes provisions covering chips with lower performance densities bound for China and others. It defines controls for chips with a TPP of 1,600 or more and a performance density of 3.2 or more and less than 5.92. If we had to guess, this rule is intended to prevent chipmakers from using multiple lower-performance chiplets to get around the limitations.
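Pulling the thresholds quoted so far into one place, a rough Python sketch might look like the following. It encodes only the numbers cited in this article, the function name and the "lower-density tier" label are ours, and it is no substitute for reading the filing itself.

```python
def matched_clauses(tpp: float, density: float) -> list[str]:
    """Return the quoted threshold clauses that a chip's scores fall under."""
    clauses = []
    if tpp >= 4800:
        clauses.append("a.1")
    if tpp >= 1600 and density >= 5.92:
        clauses.append("a.2")
    if 2400 <= tpp < 4800 and 1.6 <= density < 5.92:
        clauses.append("b.1")
    # Our label for the TPP >= 1,600, density 3.2 to 5.92 provision.
    if tpp >= 1600 and 3.2 <= density < 5.92:
        clauses.append("lower-density tier")
    return clauses


# AMD Instinct MI100, using our estimates of TPP 2,953 and density 3.93.
print(matched_clauses(2953, 3.93))  # ['b.1', 'lower-density tier']

# A chip under every quoted threshold matches nothing.
print(matched_clauses(1000, 1.0))  # []
```

Returning every matching clause, rather than stopping at the first, makes the overlap between the tiers visible.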
Not just Nvidia
While Nvidia — which controls a huge share of the AI chip market — is likely to bear the brunt of this decision, Intel and AMD are almost certainly going to be impacted by the rules as well.
While AMD’s top spec’d (for now) MI250X was already subject to last year’s export restrictions, the MI210 technically slid under the 600GB/s bandwidth limit. However, by our estimates that card has a TPP score of 5,792 and a performance density of 8, so it’s unlikely AMD will be able to sell the card in China once the rules go into effect later this fall.
AMD has publicly stated it’s working on a special accelerator akin to Nvidia’s A800 and H800 for sale in China. AMD had not responded to our request for comment at the time of publication.
We suspect Intel is in a similar boat with its China-spec Gaudi2 HL225B, given the company’s earlier claims that the accelerator outperformed Nvidia’s A100, at least in certain select AI workloads. But, since Intel won’t tell us what the accelerator’s floating-point performance is, it’s hard to say for sure. In a statement provided to The Register, the chip giant said it’s “reviewing the regulations and assessing the potential impact.”
Consumer GPUs mostly spared for now
It’s worth noting that the new rules only explicitly impact chips designed for datacenter applications, which means most consumer cards won’t be affected. This is despite the fact that many consumer GPUs use the same dies as their datacenter counterparts.
One exception outlined in the BIS filing is for cards that have a TPP of 4,800 or more.
This is why, in Nvidia’s SEC filing, the company said it probably wasn’t going to be able to sell its RTX 4090 cards in China anymore. By our estimate, that bit of kit has a TPP score in the neighborhood of 5,285. However, it’s also likely the only consumer graphics card subject to export controls to China — at least for now.
By our calculations, AMD’s most powerful consumer graphics card, the RX 7900 XTX, comes in with a TPP score of 3,904, below the threshold for consumer cards.
This is a potentially problematic loophole in the rules, as it’s believed that Chinese agencies have previously used repurposed Nvidia GPUs and Intel processors, obtained through shell companies, to power things like nuclear weapons sims. With that said, the new rules do include provisions to make indirect imports harder.
The number of consumer and datacenter GPUs that fall under the purview of these restrictions is likely to grow as vendors roll out new, more powerful cards. For example, AMD’s 7900 XTX delivered roughly 2.5x higher FP32 performance than its predecessor.
This suggests that the next high-end desktop GPU we see from AMD will almost certainly cross the line. That’s unless, of course, the US government makes regular adjustments to the goalposts.
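To put a rough number on that, here is a back-of-the-envelope Python sketch. It assumes, and this is our assumption rather than anything in the filing, that a successor's TPP grows in proportion to its throughput at whichever precision sets the score. The function name is ours.

```python
CONSUMER_TPP_THRESHOLD = 4800  # TPP cutoff cited above for consumer cards
RX_7900_XTX_TPP = 3904         # our estimate for AMD's current flagship


def uplift_to_reach(current_tpp: float, threshold: float) -> float:
    """Multiplier on current TPP needed to hit the threshold."""
    return threshold / current_tpp


factor = uplift_to_reach(RX_7900_XTX_TPP, CONSUMER_TPP_THRESHOLD)
print(f"{(factor - 1) * 100:.0f}% uplift reaches the threshold")
# prints: 23% uplift reaches the threshold
```

Against a roughly 2.5x generational jump in FP32 performance, a gain of under a quarter leaves very little headroom.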
Let the stockpiling begin
According to the industry watchers at TrendForce, the regulations are likely to curb Chinese appetite for Nvidia’s high-end AI servers from 5-6 percent of global demand to 3-4 percent.
What’s more, the group anticipates large web and cloud providers, like ByteDance, Baidu, Alibaba, and Tencent, will begin stockpiling GPUs before the new rules go into effect. “Nvidia is also likely to attempt to allocate its currently scarce resources, such as the H800, for use by Chinese customers,” TrendForce said in a research note.
Long term, TrendForce expects Chinese firms to accelerate development of independent chips, and pointed to Alibaba’s Pingtouge jumping into the ASIC arena and Huawei’s investments in its Ascend compute platform as examples.
In the meantime, analysts suggest Chinese companies are likely to shift AI development to resources rented elsewhere.
While the export curbs may make it harder for Chinese interests to get their hands on AI chips from the US, they don’t do much to address online access via the cloud.
AI accelerators are widely deployed in public clouds, where they can be accessed remotely from anywhere in the world. This poses a problem that the Biden administration has yet to address in the latest round of chip curbs.
According to the BIS filing, the agency is seeking public comment and “input from [infrastructure-as-a-service] providers on the feasibility for them in complying with additional regulations in this area, how they would identify whether a customer is ‘developing’ or ‘producing’ a dual-use AI foundation model, and what actions would be needed to address this national security concern while minimizing business process changes that would be required to comply with these regulations.” ®