English Wikipedia: 80 Million Tiny Images


80 Million Tiny Images is a dataset intended for training machine learning systems.[1] It contains 79,302,017 32×32 pixel color images, scaled down from images extracted from the World Wide Web in 2008 using automated web search queries on a set of 75,062 non-abstract nouns derived from WordNet. The words in the search terms were then used as labels for the images.[2] The researchers used seven web search resources for this purpose: Altavista, Ask.com, Flickr, Cydral, Google, Picsearch and Webshots.[2]
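
To give a sense of scale, the following minimal sketch estimates the raw, uncompressed size of the dataset from the image count and resolution stated above. It assumes three color channels stored at one byte each, which the article does not state; the constant names are illustrative only.

```python
# Figures taken from the dataset description above.
NUM_IMAGES = 79_302_017
WIDTH = HEIGHT = 32

# Assumption (not stated in the article): 3 color channels, 1 byte per channel.
CHANNELS = 3
BYTES_PER_CHANNEL = 1

# Raw size of one image and of the whole collection, ignoring labels and metadata.
bytes_per_image = WIDTH * HEIGHT * CHANNELS * BYTES_PER_CHANNEL
total_bytes = NUM_IMAGES * bytes_per_image

print(f"{bytes_per_image} bytes per image")
print(f"{total_bytes:,} bytes total, about {total_bytes / 2**30:.1f} GiB")
```

Under these assumptions each image occupies 3,072 bytes, so the pixel data alone would come to roughly 244 GB before any compression.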

The 80 Million Tiny Images dataset was retired from use by its creators in 2020,[3] after a paper by researchers Abeba Birhane and Vinay Prabhu found that some of the labeling in several publicly available image datasets, including 80 Million Tiny Images, was causing models trained on them to exhibit racial and sexual bias.[4][5] The dataset's creators have asked other researchers not to use it for further research and to delete their copies of the dataset.[3]

The CIFAR-10 dataset uses a subset of the images in this dataset, but with independently generated labels.[6]
