Английская Википедия:DreamBooth

Материал из Онлайн справочника
Перейти к навигацииПерейти к поиску

Шаблон:Short description Шаблон:Use mdy dates

Файл:Demonstration of DreamBooth AI model fine-tuning for Stable Diffusion using Jimmy Wales training data from Wikimedia Commons.png
Demonstration of the use of DreamBooth to fine-tune the Stable Diffusion v1.5 diffusion model, using training data obtained from Category:Jimmy Wales on Wikimedia Commons. Depicted here are algorithmically generated images of Jimmy Wales, co-founder of Wikipedia, performing bench press exercises at a fitness gym.

DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models, where it can allow the model to generate more fine-tuned and personalized outputs after training on three to five images of a subject.[1][2][3]

Technology

Pretrained text-to-image diffusion models, while often capable of offering a diverse range of different image output types, lack the specificity required to generate images of lesser-known subjects, and are limited in their ability to render known subjects in different situations and contexts.[1] The methodology used to run implementations of DreamBooth involves the fine-tuning the full UNet component of the diffusion model using a few images (usually 3--5) depicting a specific subject. Images are paired with text prompts that contain the name of the class the subject belongs to, plus a unique identifier. As an example, a photograph of a [Nissan R34 GTR] car, with car being the class); a class-specific prior preservation loss is applied to encourage the model to generate diverse instances of the subject based on what the model is already trained on for the original class.[1] Pairs of low-resolution and high-resolution images taken from the set of input images are used to fine-tune the super-resolution components, allowing the minute details of the subject to be maintained.[1]

Usage

DreamBooth can be used to fine-tune models such as Stable Diffusion, where it may alleviate a common shortcoming of Stable Diffusion not being able to adequately generate images of specific individual people.[4] Such a use case is quite VRAM intensive, however, and thus cost-prohibitive for hobbyist users.[4] The Stable Diffusion adaptation of DreamBooth in particular is released as a free and open-source project based on the technology outlined by the original paper published by Ruiz et. al. in 2022.[5] Concerns have been raised regarding the ability for bad actors to utilise DreamBooth to generate misleading images for malicious purposes, and that its open-source nature allows anyone to utilise or even make improvements to the technology.[6] In addition, artists have expressed their apprehension regarding the ethics of using DreamBooth to train model checkpoints that are specifically aimed at imitating specific art styles associated with human artists; one such critic is Hollie Mengert, an illustrator for Disney and Penguin Random House who has had her art style trained into a checkpoint model via DreamBooth and shared online, without her consent.[7][8]

References

Шаблон:Reflist

External links