English Wikipedia: dplyr

From the Online Reference
Revision as of 04:28, 29 February 2024, by EducationBot (new page).

dplyr, one of the core packages of the tidyverse for the R programming language, is primarily a set of functions for manipulating data frames in an intuitive, user-friendly way. Data analysts typically use dplyr to transform existing datasets into a format better suited to a particular type of analysis or data visualization.[1][2]

For instance, someone analyzing a very large dataset may wish to view only a smaller subset of it. Alternatively, a user may wish to reorder the rows so that they are ranked by some numerical value, or by a combination of values from the original dataset.

Authored primarily by Hadley Wickham, dplyr was launched in 2014.[3] On the dplyr web page, the package is described as "a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges."[4]

The five core verbs

Although dplyr includes several dozen functions for various forms of data manipulation, the package is built around five primary verbs:[5]

filter(), which keeps the rows of a data frame that satisfy conditions specified by the user;

select(), which keeps a subset of a data frame's columns;

arrange(), which sorts the rows of a data frame by the values of one or more columns;

mutate(), which creates new variables (columns) by transforming or combining the values of existing columns; and

summarize(), also spelled summarise(), which collapses the values of a data frame into a single row of summary values (or one row per group when the data has been grouped with group_by()).
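The behavior of the five verbs can be illustrated outside R. The following is a minimal, hypothetical Python sketch that uses only the standard library: it models a data frame as a list of dictionaries, one per row, and the helper names (filter_rows, select_cols, arrange_rows, mutate_col, summarize_col) are invented for this illustration. It is not how dplyr is implemented, it omits features such as grouping and multi-column sorting, and the example data is made up.

```python
# Hypothetical sketch of the five core dplyr verbs in plain Python.
# A "data frame" is modeled as a list of dicts, one dict per row.
# The helper names are invented; this is not dplyr's implementation.
from typing import Any, Callable

Row = dict[str, Any]
Frame = list[Row]


def filter_rows(df: Frame, pred: Callable[[Row], bool]) -> Frame:
    """Like filter(): keep only the rows for which pred returns True."""
    return [row for row in df if pred(row)]


def select_cols(df: Frame, *cols: str) -> Frame:
    """Like select(): keep only the named columns, in the order given."""
    return [{c: row[c] for c in cols} for row in df]


def arrange_rows(df: Frame, col: str, descending: bool = False) -> Frame:
    """Like arrange(): sort rows by the values of a single column."""
    return sorted(df, key=lambda row: row[col], reverse=descending)


def mutate_col(df: Frame, name: str, fn: Callable[[Row], Any]) -> Frame:
    """Like mutate(): add (or overwrite) a column computed from each row."""
    return [{**row, name: fn(row)} for row in df]


def summarize_col(df: Frame, col: str, fn: Callable[[list], Any], name: str) -> Frame:
    """Like summarize() on ungrouped data: collapse one column into a one-row frame."""
    return [{name: fn([row[col] for row in df])}]


# Made-up example data.
people = [
    {"name": "Ann", "height": 170, "mass": 60},
    {"name": "Bob", "height": 185, "mass": 90},
    {"name": "Cy", "height": 160, "mass": 55},
]

# Keep tall rows, rank them by mass, add a derived column, then keep two columns.
tall = filter_rows(people, lambda r: r["height"] > 165)
ranked = arrange_rows(tall, "mass", descending=True)
with_m = mutate_col(ranked, "height_m", lambda r: r["height"] / 100)
print(select_cols(with_m, "name", "height_m"))

# Collapse the whole table into a single summary row.
print(summarize_col(people, "mass", sum, "total_mass"))
```

Each helper returns a new list rather than modifying its input, which loosely mirrors the way dplyr verbs return a new data frame instead of changing the original.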

Additional functions

Beyond its five main verbs, dplyr includes many other functions for exploring and manipulating data frames, including:

count(), which counts the number of rows (observations) for each unique value, or combination of values, of one or more variables;

rename(), which changes column names, often to make a dataset easier to use and understand;

slice_max(), which returns the rows with the highest values of a given variable;

slice_min(), which returns the rows with the lowest values of a given variable.
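What these additional functions compute can be sketched without R. The following minimal Python sketch uses only the standard library, models a data frame as a list of dictionaries (one per row), and uses invented helper names (count_by, rename_cols, slice_max_rows, slice_min_rows); the data is made up. Two simplifications are deliberate assumptions: the hypothetical rename_cols() takes an {old: new} mapping, whereas dplyr's rename() is written as new_name = old_name, and the slice helpers drop ties, whereas dplyr's slice_max() and slice_min() keep them by default (with_ties = TRUE).

```python
# Hypothetical sketch of count(), rename(), slice_max() and slice_min()
# in plain Python, with a data frame modeled as a list of dicts.
# The helper names are invented; this is not dplyr's implementation.
from collections import Counter
from typing import Any

Row = dict[str, Any]
Frame = list[Row]


def count_by(df: Frame, col: str) -> Frame:
    """Like count(): one row per unique value of col, plus its row count n."""
    counts = Counter(row[col] for row in df)
    return [{col: value, "n": n} for value, n in sorted(counts.items())]


def rename_cols(df: Frame, mapping: dict[str, str]) -> Frame:
    """Like rename(), but mapping is {old_name: new_name}; other columns are kept."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in df]


def slice_max_rows(df: Frame, col: str, n: int = 1) -> Frame:
    """Like slice_max(): the n rows with the highest values of col.

    Unlike dplyr's default (with_ties = TRUE), tied rows beyond n are dropped.
    """
    return sorted(df, key=lambda row: row[col], reverse=True)[:n]


def slice_min_rows(df: Frame, col: str, n: int = 1) -> Frame:
    """Like slice_min(): the n rows with the lowest values of col (ties dropped)."""
    return sorted(df, key=lambda row: row[col])[:n]


# Made-up observations, one row per reading.
obs = [
    {"name": "A", "category": 1, "wind": 70},
    {"name": "B", "category": 3, "wind": 110},
    {"name": "C", "category": 1, "wind": 75},
    {"name": "D", "category": 5, "wind": 140},
]

print(count_by(obs, "category"))
print(rename_cols(obs, {"wind": "wind_speed"})[0])
print(slice_max_rows(obs, "wind", n=2))
print(slice_min_rows(obs, "wind"))
```

Sorting the counted values mimics the ordering of count()'s output by the counted variable; the sketch assumes those values are mutually comparable.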

Built-in datasets

The dplyr package comes with five datasets: band_instruments, band_instruments2, band_members, starwars, and storms.

Copyright & license

The copyright to dplyr is held by Posit PBC, formerly RStudio PBC. The package was originally released under a GPL license[citation needed], but in 2022 Posit changed its license terms to the "more permissive" MIT License.[6] The chief difference between the two licenses is that the MIT License allows the code to be reused in proprietary software, whereas the GPL requires that distributed derivative works also be released under the GPL.

References

Template:Reflist