this post was submitted on 15 Feb 2024

151 points (93.6% liked)


submitted 8 months ago by [email protected] to c/[email protected]


“In 10 years, computers will be doing this a million times faster.” The head of Nvidia does not believe that there is a need to invest trillions of dollars in the production of chips for AI::Despite the fact that Nvidia is now almost the main beneficiary of the growing interest in AI, the head of the company, Jensen Huang, does not believe that


[–] [email protected] 77 points 8 months ago (1 children)

Despite the fact that Nvidia is now almost the main beneficiary of the growing interest in AI, the head of the company, Jensen Huang, does not believe that additional trillions of dollars need to be invested in the industry.

*Because of

You heard it, guys. There's no need to create competition to Nvidia's chips. It's perfectly fine if all the profits go to Nvidia, says Nvidia's CEO.

[–] [email protected] 2 points 8 months ago

Don't ask a businessman about creating competition to his business

[–] [email protected] 32 points 8 months ago* (last edited 8 months ago) (1 children)

This isn't necessarily about just hardware. Current ML architectures and inference engines are far from being at peak efficiency. Just last year we saw 20x speedups for LLM inference on some hardware. "A million times" is obviously hyperbole though.

[–] [email protected] 7 points 8 months ago

Literally reading preprint papers daily on more efficient implementations of self attention approximations.

[–] [email protected] 24 points 8 months ago

He doesn't want a new competitor. He's just spouting whatever will make the line move up. It has nothing to do with his opinion.

[–] [email protected] 19 points 8 months ago (2 children)

Honestly as someone who has watched the once-fanciful prefixes “giga” and “tera” enter common parlance, and saw kilobytes of RAM turn to gigabytes, it’s really hard for me to think what he’s saying is impossible.

[–] [email protected] 17 points 8 months ago

Even if he is accurate, specialist hardware will outperform generic hardware at what it is specialized for.

I remember a story sometime in the 00s about PCs finally getting to the point where they were as fast as one of the WWII code breaking computers (or something like that). It wasn't because we backtracked in computer speeds after WWII, but because even that ancient hardware was able to get good performance when it was purpose-built, but it couldn't do anything else and likely would have required a lot of work to adjust to a different kind of cypher scheme, if it could be adapted at all.

So GP compute might be a million times faster in a decade, but specialist AI chips might be a million times faster than that.

A hardware neural net might be able to eliminate memory latency by giving each neuron fast registers to handle all its memory needs. If it doesn't need to change connections, each connection could be hard wired. A GPU wouldn't have a chance at keeping up no matter how wide that memory bus gets or how many channels it gets split into. It might even use way less power (though with the elimination of memory latency, it could go fast enough to use way more, too).

[–] [email protected] 14 points 8 months ago (1 children)

Nobody is saying it won't happen eventually. But a million times within the next decade, i.e. 4x better every year for 10 years?

This generation isn't even close to that much better than the last generation. Never mind doing 4x for 10 years straight.

[–] [email protected] 2 points 8 months ago

He was probably not being literal with the number, but when you're the head of a computer chip hardware company you should pick numbers carefully.

[–] [email protected] 16 points 8 months ago* (last edited 8 months ago) (2 children)

Sorry, I have doubts, because that would require a 4x increase every year for 10 years! 4^10 = 1,048,576x
Considering they have historically had problems achieving just twice the speed per year, it does not seem likely.
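
To make the compounding arithmetic above concrete, here is a minimal Python sketch. The function names are made up for illustration, and it assumes a constant multiplier every year, which real hardware roadmaps do not follow.

```python
def compounded(annual_factor: float, years: int) -> float:
    """Total speedup after `years` of a constant yearly multiplier."""
    return annual_factor ** years


def required_annual_factor(total_speedup: float, years: int) -> float:
    """Constant yearly multiplier needed to reach `total_speedup` in `years`."""
    return total_speedup ** (1.0 / years)


# 4x per year for 10 years, as in the comment above
print(f"4x/year for 10 years: {compounded(4, 10):,}x")

# Doubling every year for 10 years, for comparison
print(f"2x/year for 10 years: {compounded(2, 10):,}x")

# Yearly factor needed to hit exactly one million in 10 years
print(f"needed per year for 1,000,000x: {required_annual_factor(1e6, 10):.2f}x")
```

Doubling every year only gets to about a thousandfold in a decade, which is why the claim needs roughly 4x per year.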

[–] [email protected] 1 points 8 months ago (1 children)

https://lemmy.world/comment/7569717

[–] [email protected] 1 points 8 months ago* (last edited 8 months ago) (1 children)

Yes, but usually we keep those 2 kinds of optimizations separate, only combining chip design and production process. Because if the software is optimized, the hardware isn't really doing the same thing.
So yes AI speed may increase more than just the hardware, but for the most sophisticated systems, the tasks will be more complex, which may again slow the software down.
So I think they will never be able to achieve it even when considering software optimizations too. Just the latest Tesla cars boast about 4 times higher resolution cameras, that will require 4 times the processing power to process image recognition, which then will be more accurate, but relatively slower.
We are not where we want to be, and the systems of the future will clearly be more complex, and on the software side they are more likely to get slower than faster.

[–] [email protected] 2 points 8 months ago (1 children)

Even software that does the same thing gets slower. Examples: Microsoft Office, Amazon, the web in general, etc.

[–] [email protected] 1 points 8 months ago

That is so true, increased complexity tends to slow things down.

[–] [email protected] 1 points 8 months ago (1 children)

Twice for AI or computing in general?

[–] [email protected] 4 points 8 months ago* (last edited 8 months ago) (2 children)

Why does that make a difference? Compute for AI is built on the progress in compute, first for GPUs, then for the data center. They are similar in nature.
Yes, they have exceeded 2x for AI for a while, but that has been achieved through exploding die size and cost, and even that won't make a million times faster in 10 years possible, because they can't increase die sizes any further.

[–] [email protected] 3 points 8 months ago (2 children)

An ASIC built for a specific computation is significantly faster than generic GPU compute cores. Like when ASICs were built for bitcoin mining/SHA-256 and a little 5 watt USB device could outperform the best GPUs.

[–] [email protected] 1 points 8 months ago

It may be even more specialized than that. It might be a return to analog computers.

Which isn't going to work for Nvidia's traditional products, either.

[–] [email protected] 1 points 8 months ago* (last edited 8 months ago)

The H200 is evolved from Nvidia GPU designs, and will be by far the most powerful AI component in existence when it arrives later this year. AI is now so complex that it doesn't really make sense to call it an ASIC or to use an ASIC for the purpose, and the cost is $40,000 for a single H200 unit! So no, not small 5 watt units, more like 100x that.
If they could make small ASICs that did the same, they'd all do it: Nvidia, AMD, Intel, Google, Amazon, Huawei, etc. But it's simply not an option.

Edit:

In principle the H200 AI/compute system is a giant cluster of tiny ASICs built onto one chip for massively parallel compute and greater speed.

[–] [email protected] 1 points 8 months ago

There are also software improvements to consider; there's a lot of room for efficiency improvements.

[–] [email protected] 11 points 8 months ago

Because the CEO's copy has to sound good for the shareholders on either side.

[–] [email protected] 10 points 8 months ago (23 children)

So a Cerebras wafer will be 10^6 faster for the same computation as now, for the same price, in just 10 years? Not after Moore scaling ended many years ago and neural hardware architecture has matured. You can sure go analog, but that's not the same computation. And that's the end of the line, not without true 3d integration.

[–] [email protected] 8 points 8 months ago* (last edited 8 months ago) (1 children)

It requires a 4x speed increase every year. Production quality scale can't provide even close to half of that, maybe 25%, then another 25% from design, and as for increasing die sizes, they are already close to the end. So the only way to get from 150% to 400% per year is by using multi-chip designs, meaning they will have to use 2.5x more chips per year. So the multi-chip package in 10 years will probably have to have almost 10,000 chips! All of them bleeding edge!

The H200 is estimated to cost $40K, the future 10 year chip will be more like $40 million. Or maybe more like impossible to achieve.
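
The chip-count estimate above can be checked with a quick Python sketch. It takes the comment's own rounded figures (25% from process, 25% from design, 2.5x more chips per year) at face value; the function names are hypothetical and the model ignores everything except simple compounding.

```python
def per_chip_gain(process_gain: float, design_gain: float) -> float:
    """Combined yearly speedup of one chip from process and design gains."""
    return (1.0 + process_gain) * (1.0 + design_gain)


def chip_multiplier(target_gain: float, chip_gain: float) -> float:
    """How many times more chips per year are needed to reach target_gain."""
    return target_gain / chip_gain


def chips_after(multiplier_per_year: float, years: int) -> float:
    """Chips per package after compounding the yearly multiplier."""
    return multiplier_per_year ** years


gain = per_chip_gain(0.25, 0.25)    # 1.5625, roughly the "150%" quoted
mult = chip_multiplier(4.0, gain)   # 2.56, close to the "2.5x" quoted

print(f"per-chip gain per year: {gain:.4f}x")
print(f"extra chips per year:   {mult:.2f}x")
print(f"chips after 10 years at 2.5x/year: {chips_after(2.5, 10):,.0f}")
```

With the rounded 2.5x figure the package ends up at roughly 9,500 chips, which matches the "almost 10,000" above.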

[–] [email protected] 2 points 8 months ago (1 children)

If chips = cpus, here, then I imagine that will hit a limit also (Amdahl's law).
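
For reference, Amdahl's law says that if a fraction p of a job can run in parallel on n workers, the overall speedup is 1 / ((1 - p) + p / n). A minimal Python sketch follows; the parallel fractions are arbitrary illustrations, not measurements of any AI workload.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup under Amdahl's law.

    parallel_fraction: share of the work that can be spread across workers (0..1)
    workers: number of parallel units (chips, cores, ...)
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative fractions only: how much 10,000 workers help at each level
for p in (0.90, 0.99, 0.999):
    print(f"p={p}: {amdahl_speedup(p, 10_000):.1f}x with 10,000 workers")
```

The serial part caps the result at 1 / (1 - p) no matter how many workers are added, so the limit only matters if a meaningful share of the work cannot be parallelized.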

[–] [email protected] 4 points 8 months ago (1 children)

A chip is also called a die; it's the piece cut out from the wafer, which is then packaged onto a chip package.
Since traditionally there was always 1 chip per chip package, the 2 words were used almost synonymously.
In this case it's basically GPU chips, which AFAIK AMD has already figured out how to use in multi-chip packages. Meaning one package contains multiple chips that work "almost" as well as a single chip of similar size.

The advantages of multi-chip packages are obvious: production costs are way lower because smaller dies cause a lower percentage of flawed dies, and they allow for better binning of higher end parts.
Additionally, it allows designs of way more complex packages than would be possible with monolithic chips. This is the reason AMD has been taking market share in server markets from Intel: Intel has not been able to match the multi-chip design AMD introduced with Epyc in 2016/17, which originally was 4 Ryzen chiplets/chips/dies packaged together as one big 32 core server chip, where the biggest Intel could make was 28 cores.
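
One common way to illustrate the yield argument is the simple Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. This is a textbook approximation brought in for illustration, not something from the comment above, and the defect density used here is a made-up number.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


D0 = 0.1  # hypothetical defect density in defects per cm^2

# One large monolithic die versus a chiplet a quarter of its size
for area in (800, 200):
    print(f"{area} mm^2 die: {poisson_yield(area, D0):.1%} defect-free")
```

Under this model, four small dies cover the same silicon as one large die, but a single defect only scraps a quarter of it.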

But packaging almost 10000 GPU chips together is completely different, and I don't think that will be relevant within 10 years.

Amdahl's law however is part obvious and part bullshit. Everything your mind is able to do semi-efficiently can be multithreaded; there are very few things that can't.
Amdahl's law is basically irrelevant with regard to AI, as AI involves a lot of pattern recognition, and pattern recognition is perfect for multithreading.

[–] [email protected] 3 points 8 months ago

And to add: currently TSMC nodes have a reticle limit of 858mm², i.e. that's the largest chip you can make on their wafers. Then in the real world you stay slightly below that.

Future nodes are reducing this to the 350-450mm² range.

High end GPUs/HPC cards basically have to go to multi-die, even in the fantasy world of 100% perfect yields.

[–] [email protected] 6 points 8 months ago

Then stop making new chips each year with a 5-7% performance improvement with a 20% increase in prices.

[–] [email protected] 2 points 8 months ago (1 children)

So, for those of us who are a bit more tech illiterate, their claim is BS?

[–] [email protected] 2 points 8 months ago

I mean, a million x is a big claim anyway.

[–] [email protected] 1 points 8 months ago

Yeah really, semiconductor progress has begun stagnating recently due to fundamental limits. I'm gonna call bull on this one; I think they are rather forecasting plunging demand.


[–] [email protected] 7 points 8 months ago

Can we just pray for the poor engineers who actually have to build these million times faster machines

[–] [email protected] 3 points 8 months ago

"But if you invest this money, I can eat prime rib all week."

[–] [email protected] 1 points 8 months ago

Bet
