Datasaurus dozen
The Datasaurus dozen comprises thirteen data sets that have nearly identical simple descriptive statistics to two decimal places, yet have very different distributions and appear very different when graphed.[1] It was inspired by the smaller Anscombe's quartet, created in 1973.
Data
The following table contains summary statistics for all thirteen data sets.
Property | Value | Accuracy
---|---|---
Number of elements | 142 | exact
Mean of x | 54.26 | to 2 decimal places
Sample standard deviation of x: <math>s_x</math> | 16.76 | to 2 decimal places
Mean of y | 47.83 | to 2 decimal places
Sample standard deviation of y: <math>s_y</math> | 26.93 | to 2 decimal places
Correlation between x and y | −0.06 | to 3 decimal places
Linear regression line | y = 53 − 0.1x | to 0 and 1 decimal places, respectively
Coefficient of determination of the linear regression: <math>R^2</math> | 0.004 | to 3 decimal places
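These summary statistics follow from standard formulas, so they are easy to reproduce for any of the thirteen data sets. A minimal sketch using NumPy (the x and y arrays below are randomly generated placeholders for illustration, not the actual Datasaurus points):

```python
import numpy as np

def summary_stats(x, y):
    """Compute the summary statistics listed in the table above."""
    n = len(x)
    mean_x, mean_y = x.mean(), y.mean()
    sd_x = x.std(ddof=1)            # sample standard deviation (n - 1 denominator)
    sd_y = y.std(ddof=1)
    r = np.corrcoef(x, y)[0, 1]     # Pearson correlation between x and y
    slope, intercept = np.polyfit(x, y, 1)  # linear regression y = intercept + slope*x
    r_squared = r ** 2              # coefficient of determination
    return n, mean_x, sd_x, mean_y, sd_y, r, intercept, slope, r_squared

# Placeholder data for illustration only (not the Datasaurus points).
rng = np.random.default_rng(0)
x = rng.uniform(20, 90, 142)
y = rng.uniform(10, 90, 142)
print(summary_stats(x, y))
```

Running this over each of the thirteen data sets would show the same values to the precision given in the table, even though the scatter plots differ radically.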
The thirteen data sets are labeled as follows:
- away
- bullseye
- circle
- dino
- dots
- h_lines
- high_lines
- slant_down
- slant_up
- star
- v_lines
- wide_lines
- x_shape
Like Anscombe's quartet, the Datasaurus dozen was designed to illustrate the importance of looking at a set of data graphically before analyzing it according to a particular type of relationship, and the inadequacy of basic statistical properties for describing realistic datasets.[2][3][4][5][1][6]
Creation
The initial "datasaurus" dataset was constructed in 2016 by Alberto Cairo.[7] Maarten Lambrechts proposed that this dataset also be called "Anscombosaurus".[7]
This dataset was then accompanied by twelve other datasets created by Justin Matejka and George Fitzmaurice at Autodesk. Unlike Anscombe's quartet, whose method of construction is not known,[8] these data sets are known to have been generated with simulated annealing: the authors made small, random, biased changes that moved each point toward the desired shape while keeping the summary statistics essentially unchanged. Each shape took 200,000 iterations of perturbations to complete.[1]
The pseudocode for this algorithm is as follows:
    current_ds ← initial_ds
    for x iterations, do:
        test_ds ← perturb(current_ds, temp)
        if similar_enough(test_ds, initial_ds):
            current_ds ← test_ds

    function perturb(ds, temp):
        loop:
            test ← move_random_points(ds)
            if fit(test) > fit(ds) or temp > random():
                return test
where
- initial_ds is the seed dataset
- current_ds is the latest version of the dataset
- fit() is a function used to check whether moving the points gets closer to the desired shape
- temp is the temperature of the simulated annealing algorithm
- similar_enough() is a function that checks whether the statistics for the two given datasets are similar enough
- move_random_points() is a function that randomly moves data points
See also
- Exploratory data analysis
- Goodness of fit
- Regression validation
- Simpson's paradox
- Statistical model validation
- Anscombe's quartet
References
External links
- Animated examples from Autodesk for the Datasaurus Dozen datasets
- datasauRus, datasets from the Datasaurus Dozen in R
- The Datasaurus Dozen in CSV and tab-delimited files https://www.openintro.org/data/index.php?data=datasaurus