Английская Википедия:Hopper (microarchitecture)

Шаблон:Short description Шаблон:Use mdy dates Шаблон:Infobox GPU microarchitecture Hopper is a graphics processing unit (GPU) microarchitecture developed by Nvidia. It is designed for datacenters and is parallel to Ada Lovelace. It's the latest generation of Nvidia Tesla.

Named for computer scientist and United States Navy rear admiral Grace Hopper, the Hopper architecture was leaked in November 2019 and officially revealed in March 2022. It improves upon its predecessors, the Turing and Ampere microarchitectures, featuring a new streaming multiprocessor and a faster memory subsystem.

Architecture

The Nvidia Hopper H100 GPU is implemented using the TSMC 4N process with 80 billion transistors. It consists of up to 144 streaming multiprocessors.Шаблон:Sfn In SXM5, the Nvidia Hopper H100 offers better performance than PCIe.Шаблон:Sfn

Streaming multiprocessor

The streaming multiprocessors for Hopper improve upon the Turing and Ampere microarchitectures, although the maximum number of concurrent warps per streaming multiprocessor (SM) remains the same between the Ampere and Hopper architectures, 64.Шаблон:Sfn The Hopper architecture provides a Tensor Memory Accelerator (TMA), which supports bidirectional asynchronous memory transfer between shared memory and global memory.Шаблон:Sfn Under TMA, applications may transfer up to 5D tensors. When writing from shared memory to global memory, elementwise reduction and bitwise operators may be used, avoiding registers and SM instructions while enabling users to write warp specialized codes. TMA is exposed through cuda::memcpy_asyncШаблон:Sfn

When parallelizing applications, developers can use thread block clusters. Thread blocks may perform atomics in the shared memory of other thread blocks within its cluster, otherwise known as distributed shared memory. Distributed shared memory may be used by an SM simultaneously with L2 cache; when used to communicate data between SMs, this can utilize the combined bandwidth of distributed shared memory and L2. The maximum portable cluster size is 8, although the Nvidia Hopper H100 can support a cluster size of 16 by using the cudaFuncAttributeNonPortableClusterSizeAllowed function, potentially at the cost of reduced number of active blocks.Шаблон:Sfn With L2 multicasting and distributed shared memory, the required bandwidth for dynamic random-access memory read and writes is reduced.^[1]

Hopper features improved single-precision floating-point format (FP32) throughput with twice as many FP32 operations per cycle per SM than its predecessor. Additionally, the Hopper architecture adds support for new instructions, including the Smith–Waterman algorithm.Шаблон:Sfn Like Ampere, TensorFloat-32 (TF-32) arithmetic is supported. The mapping pattern for both architectures is identical.Шаблон:Sfn

Memory

The Nvidia Hopper H100 supports HBM3 and HBM2e memory up to 80 GB; the HBM3 memory system supports 3 TB/s, an increase of 50% over the Nvidia Ampere A100's 2 TB/s. Across the architecture, the L2 cache capacity and bandwidth were increased.Шаблон:Sfn

Hopper allows CUDA compute kernels to utilize automatic inline compression, including in individual memory allocation, which allows accessing memory at higher bandwidth. This feature does not increase the amount of memory available to the application, because the data (and thus it's compressibility) may be changed at any time. The compressor will automatically choose between several compression algorithms.Шаблон:Sfn

The Nvidia Hopper H100 increases the capacity of the combined L1 cache, texture cache, and shared memory to 256 KB. Like its predecessors, it combines L1 and texture caches into a unified cache designed to be a coalescing buffer. The attribute cudaFuncAttributePreferredSharedMemoryCarveout may be used to define the carveout of the L1 cache. Hopper introduces enhancements to NVLink through a new generation with faster overall communication bandwidth.Шаблон:Sfn

Memory synchronization domains

Some CUDA applications may experience interference when performing fence or flush operations due to memory ordering. Because the GPU cannot know which writes are guaranteed and which are visible by chance timing, it may wait on unnecessary memory operations, thus slowing down fence or flush operations. For example, when a kernel performs computations in GPU memory and a parallel kernel performs communications with a peer, the local kernel will flush its writes, resulting in slower NVLink or PCIe writes. In the Hopper architecture, the GPU can reduce the net cast through a fence operation.Шаблон:Sfn

DPX instructions

The Hopper architecture math application programming interface (API) exposes functions in the SM such as __viaddmin_s16x2_relu, which performs the per-halfword <math>max(min(a + b, c), 0)</math>. In the Smith–Waterman algorithm, __vimax3_s16x2_relu can be used, a three-way min or max followed by a clamp to zero.^[2] Similarly, Hopper speeds up implementations of the Needleman–Wunsch algorithm.^[3]

Transformer engine

The Hopper architecture utilizes a transformer engine.^[4]

Power efficiency

The SXM5 form factor H100 has a thermal design power (TDP) of 700 watts. With regards to its asynchrony, the Hopper architecture may attain high degrees of utilization and thus may have a better performance-per-watt.Шаблон:Sfn

Physical

Hopper contains 35,000 parts and weighs Шаблон:Cvt.^[5]Шаблон:Dubious

Grace Hopper

Шаблон:Infobox GPU microarchitecture The GH200 combines a Hopper-based H200 GPU with a Grace-based 72-core CPU on a single module. The total power draw of the module is up to 1000 W. CPU and GPU are connected via NVLink, which provides memory coherence between CPU and GPU memory.^[6]

History

In November 2019, a well-known Twitter account posted a tweet revealing that the next architecture after Ampere would be called Hopper, named after computer scientist and United States Navy rear admiral Grace Hopper, one of the first programmers of the Harvard Mark I. The account stated that Hopper would be based on a multi-chip module design, which would result in a yield gain with lower wastage.^[7]

During the 2022 Nvidia GTC, Nvidia officially announced Hopper.^[8] By 2023, during the AI boom, H100s were in great demand. Larry Ellison of Oracle Corporation said that year that at a dinner with Nvidia CEO Jensen Huang, he and Elon Musk of Tesla, Inc. and xAI "were begging" for H100s, "I guess is the best way to describe it. An hour of sushi and begging".^[9]

References

Citations

Шаблон:Reflist

Партнерские ресурсы
Криптовалюты	Обмен криптовалют - www.bestchange.ru Криптовалютная биржа CoinEx Криптовалютная биржа Binance HIVE OS - операционная система для майнинга e4pool - Мультивалютный пул для майнинга.
Магазины	AliExpress — глобальная виртуальная (в Интернете) торговая площадка, предоставляющая возможность покупать товары производителей из КНР; computeruniverse.net - Интернет-магазин компьютеров(Промо код 5 Евро на первую покупку:FWWC3ZKQ);
Хостинг	DigitalOcean - американский провайдер облачных инфраструктур, с главным офисом в Нью-Йорке и с центрами обработки данных по всему миру;
Разное	Викиум - Онлайн-тренажер для мозга Like Центр - Центр поддержки и развития предпринимательства. Gamersbay - лучший магазин по бустингу для World of Warcraft. Ноотропы OmniMind N°1 - Усиливает мозговую активность. Повышает мотивацию. Улучшает память. Санкт-Петербургская школа телевидения - это федеральная сеть образовательных центров, которая имеет филиалы в 37 городах России. Lingualeo.com — интерактивный онлайн-сервис для изучения и практики английского языка в увлекательной игровой форме. Junyschool (Джунискул) – международная школа программирования и дизайна для детей и подростков от 5 до 17 лет, где ученики осваивают компьютерную грамотность, развивают алгоритмическое и креативное мышление, изучают основы программирования и компьютерной графики, создают собственные проекты: игры, сайты, программы, приложения, анимации, 3D-модели, монтируют видео. Умназия - Интерактивные онлайн-курсы и тренажеры для развития мышления детей 6-13 лет SkillBox - это один из лидеров российского рынка онлайн-образования. Среди партнеров Skillbox ведущий разработчик сервисного дизайна AIC, медиа-компания Yoola, первое и самое крупное русскоязычное аналитическое агентство Tagline, онлайн-школа дизайна и иллюстрации Bang! Bang! Education, оператор PR-рынка PACO, студия рисования Draw&Go, агентство performance-маркетинга Ingate, scrum-студия Sibirix, имидж-лаборатория Персона. «Нетология» — это университет по подготовке и дополнительному обучению специалистов в области интернет-маркетинга, управления проектами и продуктами, дизайна, Data Science и разработки. В рамках Нетологии студенты получают ценные теоретические знания от лучших экспертов Рунета, выполняют практические задания на отработку полученных навыков, общаются с экспертами и единомышленниками. Познакомиться со всеми продуктами подробнее можно на сайте https://netology.ru, линейка курсов и профессий постоянно обновляется. StudyBay Brazil – это онлайн биржа для португалоговорящих студентов и авторов! Студент получает уникальную работу любого уровня сложности и больше свободного времени, в то время как у автора появляется дополнительный заработок и бесценный опыт. Автор24 — самая большая в России площадка по написанию учебных работ: контрольные и курсовые работы, дипломы, рефераты, решение задач, отчеты по практике, а так же любой другой вид работы. Сервис сотрудничает с более 70 000 авторов. Более 1 000 000 работ уже выполнено. StudyBay – это онлайн биржа для англоязычных студентов и авторов! Студент получает уникальную работу любого уровня сложности и больше свободного времени, в то время как у автора появляется дополнительный заработок и бесценный опыт.

Английская Википедия:Hopper (microarchitecture)

Содержание

Architecture

Streaming multiprocessor

Memory

Memory synchronization domains

DPX instructions

Transformer engine

Power efficiency

Physical

Grace Hopper

History

References

Citations

Works cited

Further reading

Навигация

Действия на странице

Действия на странице

Персональные инструменты

Навигация

Поиск

Инструменты