Английская Википедия:Code stylometry

Code stylometry (also known as program authorship attribution or source code authorship analysis) is the application of stylometry to computer code to attribute authorship to anonymous binary or source code. It often involves breaking down and examining the distinctive patterns and characteristics of the programming code and then comparing them to computer code whose authorship is known.^[1] Unlike software forensics, code stylometry attributes authorship for purposes other than intellectual property infringement, including plagiarism detection, copyright investigation, and authorship verification.^[2]

History

In 1989, researchers Paul Oman and Curtis Cook identified the authorship of 18 different Pascal programs written by six authors by using “markers” based on typographic characteristics.^[3]

In 1998, researchers Stephen MacDonell, Andrew Gray, and Philip Sallis developed a dictionary-based author attribution system called IDENTIFIED (Integrated Dictionary-based Extraction of Non-language-dependent Token Information for Forensic Identification, Examination, and Discrimination) that determined the authorship of source code in computer programs written in C++. The researchers noted that authorship can be identified using degrees if flexibility in the writing style of the source code, such as:^[4]

The way the algorithm in the source code solves the given problem
The way the source code is laid out (spacing, indentation, bordering characteristics, standard headings, etc.)
The way the algorithm is implemented in the source code

The IDENTIFIED system attributed authorship by first merging all the relevant files to produce a single source code file and then subjecting it to a metrics analysis by counting the number of occurrences for each metric. In addition, the system was language-independent due to its ability to create new dictionary files and meta-dictionaries.^[4]

In 1999, a team of researchers led by Stephen MacDonell tested the performance of three different program authorship discrimination techniques on 351 programs written in C++ by 7 different authors. The researchers compared the effectiveness of using a feed-forward neural network (FFNN) that was trained on a back-propagation algorithm, multiple discriminant analysis (MDA), and case-based reasoning (CBR). At the end of the experiment, both the neural network and the MDA had an accuracy rate of 81.1%, while the CBR reached an accuracy performance of 88.0%.^[5]

In 2005, researchers from the Laboratory of Information and Communication Systems Security at Aegean University introduced a language-independent method of program authorship attribution where they used byte-level n-grams to classify a program to an author. This technique scanned the files and then created a table of different n-grams found in the source code and the number of times they appear. In addition, the system could operate with limited numbers of training examples from each author. However, the more source code programs that were present for each author, the more reliable the author attribution. In an experiment testing their approach, the researchers found that classification using n-grams reached an accuracy rate of up to 100%, although the rate declined drastically if the profile size exceeded 500 and the n-gram size was 3 or less.^[3]

In 2011, researchers from the University of Wisconsin created a program authorship attribution system that identified a programmer based on the binary code of a program instead of the source code. The researchers utilized machine learning and training code to determine which characteristics of the code would be helpful in describing the programming style. In an experiment testing the approach on a set of programs written by 10 different authors, the system achieved an accuracy rate of 81%. When tested using a set of programs written by almost 200 different authors, the system performed with an accuracy rate of 51%.^[6]

In 2015, a team of postdoctoral researchers from Princeton University, Drexel University, the University of Maryland, and the University of Goettingen as well as researchers from the U.S. Army Research Laboratory developed a program authorship attribution system that could determine the author of a program from a sample pool with programs written by 1,600 coders with a 94 percent accuracy. The methodology consisted of four steps:^[7]

Disassembly - The program is disassembled to obtain information on its characteristics.
Decompilation - The program is converted into a variant of C-like pseudocode through decompilation to obtain abstract syntax trees.
Dimensionality reduction - The most relevant and useful features for author identification are selected.
Classification - A random-forest classifier attributes the authorship of the program.

This approach analyzed various characteristics of the code, such as blank space, the use of tabs and spaces, and the names of variables, and then used a method of evaluation called a syntax tree analysis that translated the sample code into tree-like diagrams that displayed the structural decisions involved in writing the code. The design of these diagrams prioritized the order of the commands and the depths of the functions that were nestled in the code.^[8]

The 2014 Sony Pictures hacking attack

U.S. intelligence officials were able to determine that the 2014 cyber attack on Sony Pictures was sponsored by North Korea after evaluating the software, techniques, and network sources. The attribution was made after cybersecurity experts noticed similarities between the code used in the attack and a malicious software known as Shamoon, which was used in the 2013 attacks against South Korean banks and broadcasting companies by North Korea.^[9]

References

Шаблон:Reflist

[1] Шаблон:Cite news

[2] Шаблон:Cite book

[Frantzeskou_2005-3] 3,0 ^3,1 Шаблон:Cite book

[Gray_1998-4] 4,0 ^4,1 Шаблон:Cite book

[5] Шаблон:Cite journal

[6] Шаблон:Cite journal

[7] Шаблон:Cite news

[8] Шаблон:Cite news

[9] Шаблон:Cite news

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

Партнерские ресурсы
Криптовалюты	Обмен криптовалют - www.bestchange.ru Криптовалютная биржа CoinEx Криптовалютная биржа Binance HIVE OS - операционная система для майнинга e4pool - Мультивалютный пул для майнинга.
Магазины	AliExpress — глобальная виртуальная (в Интернете) торговая площадка, предоставляющая возможность покупать товары производителей из КНР; computeruniverse.net - Интернет-магазин компьютеров(Промо код 5 Евро на первую покупку:FWWC3ZKQ);
Хостинг	DigitalOcean - американский провайдер облачных инфраструктур, с главным офисом в Нью-Йорке и с центрами обработки данных по всему миру;
Разное	Викиум - Онлайн-тренажер для мозга Like Центр - Центр поддержки и развития предпринимательства. Gamersbay - лучший магазин по бустингу для World of Warcraft. Ноотропы OmniMind N°1 - Усиливает мозговую активность. Повышает мотивацию. Улучшает память. Санкт-Петербургская школа телевидения - это федеральная сеть образовательных центров, которая имеет филиалы в 37 городах России. Lingualeo.com — интерактивный онлайн-сервис для изучения и практики английского языка в увлекательной игровой форме. Junyschool (Джунискул) – международная школа программирования и дизайна для детей и подростков от 5 до 17 лет, где ученики осваивают компьютерную грамотность, развивают алгоритмическое и креативное мышление, изучают основы программирования и компьютерной графики, создают собственные проекты: игры, сайты, программы, приложения, анимации, 3D-модели, монтируют видео. Умназия - Интерактивные онлайн-курсы и тренажеры для развития мышления детей 6-13 лет SkillBox - это один из лидеров российского рынка онлайн-образования. Среди партнеров Skillbox ведущий разработчик сервисного дизайна AIC, медиа-компания Yoola, первое и самое крупное русскоязычное аналитическое агентство Tagline, онлайн-школа дизайна и иллюстрации Bang! Bang! Education, оператор PR-рынка PACO, студия рисования Draw&Go, агентство performance-маркетинга Ingate, scrum-студия Sibirix, имидж-лаборатория Персона. «Нетология» — это университет по подготовке и дополнительному обучению специалистов в области интернет-маркетинга, управления проектами и продуктами, дизайна, Data Science и разработки. В рамках Нетологии студенты получают ценные теоретические знания от лучших экспертов Рунета, выполняют практические задания на отработку полученных навыков, общаются с экспертами и единомышленниками. Познакомиться со всеми продуктами подробнее можно на сайте https://netology.ru, линейка курсов и профессий постоянно обновляется. StudyBay Brazil – это онлайн биржа для португалоговорящих студентов и авторов! Студент получает уникальную работу любого уровня сложности и больше свободного времени, в то время как у автора появляется дополнительный заработок и бесценный опыт. Автор24 — самая большая в России площадка по написанию учебных работ: контрольные и курсовые работы, дипломы, рефераты, решение задач, отчеты по практике, а так же любой другой вид работы. Сервис сотрудничает с более 70 000 авторов. Более 1 000 000 работ уже выполнено. StudyBay – это онлайн биржа для англоязычных студентов и авторов! Студент получает уникальную работу любого уровня сложности и больше свободного времени, в то время как у автора появляется дополнительный заработок и бесценный опыт.

Английская Википедия:Code stylometry

History

The 2014 Sony Pictures hacking attack

References

Навигация

Действия на странице

Действия на странице

Персональные инструменты

Навигация

Поиск

Инструменты