English Wikipedia: Ada Lovelace (microarchitecture)

Ada Lovelace, also referred to simply as Lovelace,^[1] is a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Ampere architecture, officially announced on September 20, 2022. It is named after the English mathematician Ada Lovelace,^[2] one of the first computer programmers. Nvidia announced the architecture along with the new GeForce 40 series consumer GPUs^[3] and the RTX 6000 Ada Generation professional workstation graphics card.^[4] Lovelace GPUs are fabricated on TSMC's 5 nm-class "4N" process, which offers increased efficiency over the Samsung 8 nm and TSMC N7 processes that Nvidia used for the previous-generation Ampere architecture.^[5]

Background

The Ada Lovelace architecture follows the Ampere architecture, which was released in 2020. Nvidia CEO Jensen Huang announced it during the GTC 2022 keynote on September 20, 2022, with the architecture powering Nvidia's GPUs for gaming, workstations and datacenters.^[6]

Architectural details

Architectural improvements of the Ada Lovelace architecture include the following:^[7]

CUDA Compute Capability 8.9^[8]
TSMC 4N process (custom designed for Nvidia), not to be confused with TSMC's regular N4 node
4th-generation Tensor Cores with FP8, FP16, bfloat16, TensorFloat-32 (TF32) and sparsity acceleration
3rd-generation Ray Tracing Cores, with concurrent ray tracing, shading and compute
Shader Execution Reordering (SER)^[9]
Nvidia video encoder/decoder (NVENC/NVDEC) with 8K 10-bit 60FPS AV1 fixed function hardware encoding^[10]^[11]
No NVLink support^[12]^[13]
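
The compute capability in the list above is what toolchains use to identify the architecture. As a minimal sketch, the helper below maps a capability pair to an architecture name and builds the matching `nvcc` architecture flag. The lookup table is an illustrative subset and the function names are hypothetical, not part of any Nvidia API.

```python
# Illustrative subset of CUDA compute capabilities; not an exhaustive list.
ARCH_BY_CAPABILITY = {
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada Lovelace",
    (9, 0): "Hopper",
}


def architecture_name(major, minor):
    """Return the architecture name for a compute capability, or 'unknown'."""
    return ARCH_BY_CAPABILITY.get((major, minor), "unknown")


def nvcc_arch_flag(major, minor):
    """Build the nvcc flag that targets a given compute capability."""
    return f"-arch=sm_{major}{minor}"


print(architecture_name(8, 9), nvcc_arch_flag(8, 9))
```

For Ada Lovelace this yields the `sm_89` target.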

Streaming multiprocessors (SMs)

CUDA cores

128 CUDA cores are included in each SM.

RT cores

Ada Lovelace features third-generation RT cores. The RTX 4090 features 128 RT cores, compared to 84 in the previous-generation RTX 3090 Ti. These 128 RT cores provide up to 191 TFLOPS of compute, or 1.49 TFLOPS per RT core.^[14] The Lovelace architecture adds a new stage to the ray tracing pipeline, Shader Execution Reordering (SER), which Nvidia claims provides a 2x performance improvement in ray tracing workloads.^[6]

Tensor cores

Lovelace's new fourth-generation Tensor cores enable the AI technology used in DLSS 3's frame generation techniques. Much like Ampere, each SM contains 4 Tensor cores but Lovelace contains a greater number of Tensor cores overall given its increased number of SMs.

Clock speeds

Clock speeds increase significantly with the Ada Lovelace architecture: the RTX 4090's base clock is higher than the boost clock of the RTX 3090 Ti.

	RTX 2080 Ti	RTX 3090 Ti	RTX 4090
Architecture	Turing	Ampere	Ada Lovelace
Base clock (MHz)	1350	1560	2235
Boost clock (MHz)	1635	1860	2520

Cache and memory subsystem

	RTX 2080 Ti	RTX 3090 Ti	RTX 4090
Architecture	Turing	Ampere	Ada Lovelace
L1 cache	6.375 MB (96 KB per SM)	10.5 MB (128 KB per SM)	16 MB (128 KB per SM)
L2 cache	5.5 MB	6 MB	72 MB

The fully enabled AD102 Lovelace die features 96 MB of L2 cache, a 16x increase from the 6 MB in the Ampere-based GA102 die.^[15] Quick access to a large L2 cache benefits complex operations like ray tracing, because fetching data from the GDDR video memory is slower. Keeping important and frequently accessed data in a large L2 cache reduces reliance on memory accesses, which allows a narrower memory bus to be used.

Each memory controller uses a 32-bit connection, with up to 12 controllers present for a combined memory bus width of 384 bits. The Lovelace architecture can use either GDDR6 or GDDR6X memory. GDDR6X memory is used on the desktop GeForce RTX 40 series, while the more energy-efficient GDDR6 memory is used on the corresponding mobile versions and on RTX 6000 Ada Generation workstation GPUs.
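
The bus-width arithmetic, and the memory bandwidth figures in the product tables, can be reproduced with a short sketch. It assumes the usual relation that bandwidth in GB/s equals the per-pin data rate in Gbit/s times the bus width in bits, divided by 8. The function names are illustrative.

```python
BITS_PER_CONTROLLER = 32  # each memory controller uses a 32-bit connection


def bus_width_bits(controllers):
    """Combined memory bus width for a given number of controllers."""
    return controllers * BITS_PER_CONTROLLER


def bandwidth_gb_per_s(data_rate_gbps, bus_bits):
    """Peak memory bandwidth: per-pin rate (Gbit/s) x bus width (bits) / 8."""
    return data_rate_gbps * bus_bits / 8


print(bus_width_bits(12))            # 12 controllers on a full AD102
print(bandwidth_gb_per_s(20, 384))   # 20 Gbps GDDR6 on a 384-bit bus
```

With 12 controllers this gives the 384-bit bus, and 20 Gbps memory on that bus gives the 960 GB/s listed for the AD102-based workstation card.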

Power efficiency and process node

The Ada Lovelace architecture can operate at lower voltages than its predecessor.^[6] Nvidia claims a 2x performance increase for the RTX 4090 at the same 450 W used by the previous-generation flagship RTX 3090 Ti.^[16]

Increased power efficiency can be attributed in part to the smaller fabrication node used by the Lovelace architecture. The Ada Lovelace architecture is fabricated on TSMC's 4N process, a process node custom designed for Nvidia. The previous-generation Ampere architecture used Samsung's 8 nm-based 8N process node from 2018, which was two years old by the time of Ampere's launch.^[17]^[18] The AD102 die, with its 76.3 billion transistors, has a transistor density of 125.3 million per mm², a 178% increase in density from GA102's 45.1 million per mm².
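
The density comparison can be checked with a small calculation. This sketch uses the transistor count and die size listed for AD102 in this article (76.3 billion transistors, 609 mm²) and the GA102 density quoted above. The helper name is illustrative.

```python
def density_mtr_per_mm2(transistors_billion, die_area_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000 / die_area_mm2


GA102_DENSITY = 45.1  # MTr/mm^2, as quoted in the text

ad102_density = density_mtr_per_mm2(76.3, 609)
increase_pct = (ad102_density / GA102_DENSITY - 1) * 100

print(f"AD102: {ad102_density:.1f} MTr/mm^2, +{increase_pct:.0f}% vs GA102")
```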

Media engine

The Lovelace architecture uses the new 8th-generation Nvidia NVENC video encoder, while the 7th-generation NVDEC video decoder introduced with Ampere returns.^[19]

NVENC AV1 hardware encoding with support for up to 8K resolution at 60FPS in 10-bit color is added, enabling higher video fidelity at lower bit rates compared to the H.264 and H.265 codecs.^[20] Nvidia claims that its NVENC AV1 encoder featured in the Lovelace architecture is 40% more efficient than the H.264 encoder in the Ampere architecture.^[21]

The Lovelace architecture received criticism for not supporting DisplayPort 2.0, which offers higher display data bandwidth, and instead using the older DisplayPort 1.4a, which is limited to a peak bandwidth of 32 Gbps.^[22] As a result, Lovelace GPUs are limited to the refresh rates DisplayPort 1.4a supports, even when the GPU can render higher frame rates. Intel's Arc GPUs, also released in October 2022, included DisplayPort 2.0. AMD's competing RDNA 3 architecture, released just two months after Lovelace, included DisplayPort 2.1.^[23]
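
To illustrate why the link bandwidth matters, the sketch below estimates whether an uncompressed video mode fits within DisplayPort 1.4a. It assumes the HBR3 raw rate of 32.4 Gbit/s across four lanes and 8b/10b encoding, which leaves 25.92 Gbit/s for pixel data. Blanking intervals and Display Stream Compression are ignored, so this is a rough lower bound rather than an exact timing calculation. The function names are illustrative.

```python
DP14A_RAW_GBPS = 32.4                          # HBR3: 4 lanes x 8.1 Gbit/s
DP14A_PAYLOAD_GBPS = DP14A_RAW_GBPS * 8 / 10   # 8b/10b encoding overhead


def uncompressed_gbps(width, height, refresh_hz, bits_per_channel):
    """Pixel data rate for an RGB mode, ignoring blanking intervals."""
    bits_per_pixel = bits_per_channel * 3
    return width * height * refresh_hz * bits_per_pixel / 1e9


def fits_dp14a(width, height, refresh_hz, bits_per_channel):
    """True if the uncompressed mode fits in the DP 1.4a payload rate."""
    rate = uncompressed_gbps(width, height, refresh_hz, bits_per_channel)
    return rate <= DP14A_PAYLOAD_GBPS


print(fits_dp14a(3840, 2160, 120, 8))    # 4K 120 Hz, 8-bit colour
print(fits_dp14a(3840, 2160, 120, 10))   # 4K 120 Hz, 10-bit colour
```

Under these assumptions, 4K at 120 Hz fits with 8-bit colour but not with 10-bit colour unless compression is used.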

Ada Lovelace dies

Comparison of Ada Lovelace chips
Chip^[24]	AD102^[25]	AD103^[26]	AD104^[27]	AD106^[28]	AD107^[29]
Die size	609 mm²	379 mm²	294 mm²	188 mm²	159 mm²
Transistors	76.3B	45.9B	35.8B	22.9B	18.9B
Transistor density	125.3 MTr/mm²	121.1 MTr/mm²	121.8 MTr/mm²	121.8 MTr/mm²	118.9 MTr/mm²
Graphics processing clusters (GPC)	12	7	5	3	2
Streaming multiprocessors (SM)	144	80	60	36	24
CUDA cores	18432	10240	7680	4608	3072
Texture mapping units	576	320	240	144	96
Render output units	192	112	80	48	48
Tensor cores	576	320	240	144	96
RT cores	144	80	60	36	24
L1 cache	18 MB	10 MB	7.5 MB	4.5 MB	3 MB
L1 cache	128 KB per SM
L2 cache	96 MB	64 MB	48 MB	32 MB
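
Most rows of the table above scale linearly with the SM count. The following sketch reproduces them from the per-SM figures: 128 CUDA cores, 4 Tensor cores and 128 KB of L1 cache per SM are stated in this article, while 1 RT core and 4 texture mapping units per SM are inferred from the table itself.

```python
CUDA_CORES_PER_SM = 128    # stated in the article
TENSOR_CORES_PER_SM = 4    # stated in the article
L1_KB_PER_SM = 128         # stated in the article
RT_CORES_PER_SM = 1        # inferred: RT core count equals SM count
TMUS_PER_SM = 4            # inferred: TMU count is 4x the SM count

SM_COUNT = {"AD102": 144, "AD103": 80, "AD104": 60, "AD106": 36, "AD107": 24}


def die_resources(sm_count):
    """Derive per-die totals from the number of streaming multiprocessors."""
    return {
        "cuda_cores": sm_count * CUDA_CORES_PER_SM,
        "tensor_cores": sm_count * TENSOR_CORES_PER_SM,
        "rt_cores": sm_count * RT_CORES_PER_SM,
        "tmus": sm_count * TMUS_PER_SM,
        "l1_cache_mb": sm_count * L1_KB_PER_SM / 1024,
    }


for chip, sms in SM_COUNT.items():
    print(chip, die_resources(sms))
```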

Ada Lovelace-based products

Gaming

GeForce 40 series
- GeForce RTX 4050 (mobile) (AD107)
- GeForce RTX 4060 (mobile) (AD107)
- GeForce RTX 4060 Ti (AD106)
- GeForce RTX 4070 (mobile) (AD106)
- GeForce RTX 4070 (AD104)
- GeForce RTX 4070 Ti (AD104)
- GeForce RTX 4080 (mobile) (AD104)
- GeForce RTX 4080 (AD103)
- GeForce RTX 4090 (mobile) (AD103)
- GeForce RTX 4090 (AD102)

Professional

Desktop Workstation


Model	Launch	Launch MSRP (USD)	Code name(s)	Transistors (billion)	Die size	Core config	SM count	L1 cache	L2 cache	Core clock (MHz)	Memory speed (Gb/s)	Pixel fillrate (Gpx/s)	Texture fillrate (Gtex/s)	Memory type	Memory size	Memory bandwidth (GB/s)	Bus width	Half precision (boost, TFLOPS)	Single precision (boost, TFLOPS)	Double precision (boost, TFLOPS)	Tensor compute [sparse] (TFLOPS)	TDP

		$1,250	AD104-400	35.8	294.5 mm²	6144 192:80:48:192	48	6 MB	48 MB	1290 (1565)	16 Gbps	103.2 (125.2)	247.68 (300.48)	GDDR6	20 GB	320	160-bit		(19.2)		153.4 [306.8]	70 W
		$6,799	AD102-300	76.3	608.4 mm²	18,176 568:192:142:568	142	17.75 MB	96 MB	915 (2505)	20 Gbps	175.68 (480.96)	519.72 (1,422.84)	GDDR6	48 GB	960	384-bit		(91.1)		728.5 [1457.0]	300 W
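
The fillrate and single-precision figures in the table above follow from the core configuration and clock speeds. The sketch below assumes the common conventions that pixel fillrate is ROPs times clock, texture fillrate is TMUs times clock, and FP32 throughput is two operations (one fused multiply-add) per CUDA core per clock. It reads the core config as shaders followed by TMUs:ROPs:RT cores:Tensor cores, which is consistent with the numbers in the table. The function names are illustrative.

```python
def pixel_fillrate_gpx(rops, clock_mhz):
    """Pixel fillrate in Gpx/s: one pixel per ROP per clock."""
    return rops * clock_mhz / 1000


def texture_fillrate_gtex(tmus, clock_mhz):
    """Texture fillrate in Gtex/s: one texel per TMU per clock."""
    return tmus * clock_mhz / 1000


def fp32_tflops(cuda_cores, clock_mhz):
    """Single-precision TFLOPS: 2 FLOPs (one FMA) per core per clock."""
    return 2 * cuda_cores * clock_mhz / 1e6


# AD102-300 row: 18,176 cores, 568 TMUs, 192 ROPs, 2505 MHz boost clock
print(pixel_fillrate_gpx(192, 2505))
print(texture_fillrate_gtex(568, 2505))
print(round(fp32_tflops(18176, 2505), 1))
```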


Mobile Workstation

Model	Launch	Code name(s)	Transistors (billion)	Die size	Core config	SM count	L1 cache	L2 cache	Core clock (MHz)	Memory speed (Gb/s)	Pixel fillrate (Gpx/s)	Texture fillrate (Gtex/s)	Memory type	Memory size	Memory bandwidth (GB/s)	Bus width	Half precision (boost, TFLOPS)	Single precision (boost, TFLOPS)	Double precision (boost, TFLOPS)	Tensor compute [sparse] (TFLOPS)	TGP

		AD107		146 mm²	3072 96:32:24:96	24	3 MB	12 MB	930 (1455)	14 Gbps	29.76 (46.56)	89.28 (139.68)	GDDR6	8 GB	224	128-bit					35 W
		AD107		146 mm²	3072 96:32:24:96	24	3 MB	12 MB	1635 (2115)	16 Gbps	52.32 (67.68)	156.96 (203.04)			256			(14.5)		115.8 [231.6]	35–140 W
		AD106	22.9	190 mm²	4608 144:48:36:144	36	4.5 MB	32 MB	1395 (1695)	16 Gbps	66.96 (81.36)	200.88 (244.08)			256			(19.9)		159.3 [318.6]	35–140 W
		AD104	35.8	294.5 mm²	5120 160:64:40:160	40	5 MB	48 MB	1290 (1665)	18 Gbps	82.56 (106.56)	206.4 (266.4)		12 GB	432	192-bit		(23.0)		184.3 [368.6]	60–140 W
		AD104	35.8	294.5 mm²	7424 232:80:58:232	58	7.25 MB	48 MB	1290 (1665)		103.2 (133.2)	299.28 (386.28)		12 GB	432	192-bit		(33.6)		269.0 [538.0]	80–175 W
		AD103	45.9	378.6 mm²	9728 304:112:76:304	76	9.5 MB	64 MB	1335 (1695)		149.52 (189.84)	405.84 (515.28)		16 GB	576	256-bit		(42.6)		340.9 [681.8]	80–175 W


Datacenter

Model	Launch	Launch MSRP (USD)	Code name(s)	Transistors (billion)	Die size	Core config	SM count	L1 cache	L2 cache	Core clock (MHz)	Memory clock (MHz)	Pixel fillrate (Gpx/s)	Texture fillrate (Gtex/s)	Memory type	Memory size	Memory bandwidth (GB/s)	Bus width	Half precision (boost, TFLOPS)	Single precision (boost, TFLOPS)	Double precision (boost, TFLOPS)	Tensor compute [sparse] (TFLOPS)	TBP

		$	AD104-???-A1	35.8	295 mm²	7,680 240:80:60:240	60	7.5 MB	48 MB	795 (2040)	1313	63.6 (163.2)	190.8 (489.6)	GDDR6X	24 GB	504.2	192-bit					285 W
		$	AD102-895-A1	76.3	608.4 mm²	18,176 568:192:142:568	142	17.75 MB	96 MB	735 (2490)	2250	58.8 (199.2)	176.4 (597.6)	GDDR6	48 GB	864	384-bit					300 W
		$	AD102-???-A1						48 MB	1005 (2475)		80.4 (198.0)	241.2 (594.0)		24 GB
		$	AD102-???-A1						48 MB	1005 (2475)		80.4 (198.0)	241.2 (594.0)		24 GB


References

[1] Web page.
[2] Mujtaba. News article.
[3] Press release.
[4] Web page.
[5] Machkovec. News article.
[6] Chiappetta. Web page.
[7] Web page.
[8] Web page.
[9] Web page.
[10] Web page.
[11] Web page.
[12] Web page.
[13] News article.
[14] Web page.
[15] Web page.
[16] Web page.
[17] Web page.
[18] Web page.
[19] Web page.
[20] Web page.
[21] Web page.
[22] Web page.
[23] Web page.
[24] Web page.
[25] Web page.
[26] Web page.
[27] Web page.
[28] Web page.
[29] Web page.
