Enron Corpus

The Enron Corpus is a database of over 600,000 emails generated by 158 employees^[1] of the Enron Corporation in the years leading up to the company's collapse in December 2001. The Federal Energy Regulatory Commission (FERC) obtained the corpus from Enron's email servers during its subsequent investigation.^[2] A copy of the email database was later purchased for $10,000 by Andrew McCallum, a computer scientist at the University of Massachusetts Amherst.^[3] He released this copy to researchers, providing a trove of data that has been used for studies on social networking and computer-mediated communication.

Creation

In the legal investigation into Enron's collapse, the discovery process required collecting and preserving vast amounts of data, for which the FERC hired Aspen Systems (now part of Lockheed Martin). The emails were collected at Enron Corporation headquarters in Houston during two weeks in May 2002 by Joe Bartling,^[4] a litigation support and data analysis contractor for Aspen. In addition to the Enron employee emails, all of Enron's enterprise database systems,^[5] hosted in Oracle databases on Sun Microsystems servers, were captured and preserved, including its online energy trading platform, EnronOnline.

Once collected, the Enron emails were processed and hosted in proprietary electronic discovery platforms (first Concordance, then iCONECT) for review by investigators from the FERC, the Commodity Futures Trading Commission, and the Department of Justice. At the conclusion of the investigation, and upon the issuance of the FERC staff report,^[6] the emails and information collected were deemed to be in the public domain, to be used for historical research and academic purposes. The email archive was made publicly available and searchable via the web using iCONECT 24/7, but the sheer volume of email, over 160 GB, made it impractical to use. Copies of the collected emails and databases were made available on hard drives.

Jitesh Shetty and Jafar Adibi from the University of Southern California processed the data in 2004 and released a MySQL version.^[7] In 2010, EDRM.net published a revised and expanded version 2 of the corpus,^[8] containing over 1.7 million messages, which has been made available on Amazon S3 for easy access by researchers.

Exploitation

Figure (Enron Email Network.jpg): A visualization of the email network in the Enron Corpus, with coloring representing eight communities.

The corpus is valued as one of the few large collections of real emails that are publicly available for study; such collections are typically bound by privacy and legal restrictions, such as non-disclosure agreements and data sanitization, which make them prohibitively difficult to access.^[3] Working from their MySQL version, Shetty and Adibi published a link analysis showing which user accounts emailed which others.^[9] Linguistic comparison with more recent email corpora shows changes in the email register of English. The corpus is also used as test or training data for research in natural language processing and machine learning.^[10]
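As a rough illustration of what a "who emailed whom" link analysis involves, the Python sketch below counts sender-to-recipient pairs across a handful of messages. It is not Shetty and Adibi's method or schema. The sample messages, the addresses, and the function name count_links are hypothetical, and the sketch assumes each message is available as raw RFC 822 text. Only the standard library email package is used.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample messages in RFC 822 style; not taken from the corpus.
SAMPLE_MESSAGES = [
    "From: alice@example.com\nTo: bob@example.com, carol@example.com\n"
    "Subject: Q3\n\nBody one.",
    "From: alice@example.com\nTo: bob@example.com\n"
    "Subject: Re: Q3\n\nBody two.",
    "From: bob@example.com\nTo: alice@example.com\nCc: carol@example.com\n"
    "Subject: Fwd\n\nBody three.",
]


def count_links(raw_messages):
    """Count how many messages each sender addressed to each recipient."""
    links = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Parse the address out of each header, ignoring display names.
        senders = [
            addr.lower()
            for _, addr in getaddresses(msg.get_all("From", []))
            if addr
        ]
        recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        # A set, so a recipient listed twice in one message counts once.
        recipients = {
            addr.lower()
            for _, addr in getaddresses(recipient_headers)
            if addr
        }
        for sender in senders:
            for recipient in recipients:
                links[(sender, recipient)] += 1
    return links


links = count_links(SAMPLE_MESSAGES)
for (sender, recipient), n in links.most_common():
    print(f"{sender} -> {recipient}: {n}")
```

The resulting counter is a weighted, directed edge list, which is the usual input for network visualization or community detection. Treating To and Cc recipients alike is a simplification chosen for this sketch.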

References

External links

Tutorial on data modeling with the Enron Corpus
Shetty and Adibi's Enron email dataset download on S3 (178 MB)
Nathan Heller: "What the Enron E-mails Say About Us", The New Yorker, July 24, 2017
Searchable Enron Email Database (requires registration)
Open Test Search: searchable corpus of all email attachments, used to compare different enterprise search engines

[1] CiteSeerX citation (details not preserved).

[2] "The Enron Email Corpus" (archived). Retrieved March 5, 2011.

[3] Markoff, John. "Armies of Expensive Lawyers, Replaced by Cheaper Software". The New York Times, March 5, 2011, p. A1.

[4] Web citation (details not preserved).

[5] Web citation (details not preserved).

[6] FERC Staff Report - Price Manipulation in Western Markets - Findings at a Glance (3-26-2003).

[7] "Enron processed database".

[8] Web citation (details not preserved).

[9] Book citation (details not preserved).

[10] Book citation (details not preserved).
