English Wikipedia: Code property graph


In computer science, a code property graph (CPG) is a computer program representation that captures syntactic structure, control flow, and data dependencies in a property graph. The concept was originally introduced to identify security vulnerabilities in C and C++ system code,[1] but has since been employed to analyze web applications,[2][3][4][5] cloud deployments,[6] and smart contracts.[7] Beyond vulnerability discovery, code property graphs find applications in code clone detection,[8][9] attack-surface detection,[10] exploit generation,[11] measuring code testability,[12] and backporting of security patches.[13]

Definition

A code property graph of a program is a graph representation of the program obtained by merging its abstract syntax trees (AST), control-flow graphs (CFG), and program dependence graphs (PDG) at statement and predicate nodes. The resulting graph is a property graph, the underlying graph model of graph databases such as Neo4j, JanusGraph, and OrientDB, in which data is stored in nodes and edges as key-value pairs. As a result, code property graphs can be stored in graph databases and queried using graph query languages.
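The property graph model described above can be illustrated with a minimal in-memory sketch. This is not how Neo4j, JanusGraph, or OrientDB store data; the type and function names (`PropertyGraph`, `props_add`, `props_get`, `pg_add_node`, `pg_add_edge`, `pg_count_edges`) and the fixed capacities are assumptions made for illustration. Nodes and edges each carry key-value pairs, and a query filters edges by a property value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed capacities, chosen only to keep the sketch short. */
#define MAX_PROPS 4u
#define MAX_NODES 16u
#define MAX_EDGES 32u

/* A key-value pair attached to a node or an edge. */
typedef struct { const char *key; const char *value; } Property;

typedef struct { Property items[MAX_PROPS]; size_t count; } PropertySet;

typedef struct { PropertySet props; } Node;
typedef struct { size_t src, dst; PropertySet props; } Edge;

/* Zero-initialize before use: PropertyGraph g = {0}; */
typedef struct {
    Node nodes[MAX_NODES]; size_t node_count;
    Edge edges[MAX_EDGES]; size_t edge_count;
} PropertyGraph;

/* Append a key-value pair; the strings are borrowed, not copied. */
void props_add(PropertySet *ps, const char *key, const char *value) {
    assert(ps->count < MAX_PROPS);
    ps->items[ps->count].key = key;
    ps->items[ps->count].value = value;
    ps->count++;
}

/* Return the first value stored under key, or NULL if the key is absent. */
const char *props_get(const PropertySet *ps, const char *key) {
    for (size_t i = 0; i < ps->count; i++)
        if (strcmp(ps->items[i].key, key) == 0)
            return ps->items[i].value;
    return NULL;
}

/* Add a node without properties and return its index. */
size_t pg_add_node(PropertyGraph *g) {
    assert(g->node_count < MAX_NODES);
    g->nodes[g->node_count].props.count = 0;
    return g->node_count++;
}

/* Add a directed edge src -> dst and return its index.
 * Several edges between the same pair of nodes are allowed. */
size_t pg_add_edge(PropertyGraph *g, size_t src, size_t dst) {
    assert(g->edge_count < MAX_EDGES);
    assert(src < g->node_count && dst < g->node_count);
    Edge *e = &g->edges[g->edge_count];
    e->src = src;
    e->dst = dst;
    e->props.count = 0;
    return g->edge_count++;
}

/* Query: count the edges whose property `key` equals `value`. */
size_t pg_count_edges(const PropertyGraph *g, const char *key, const char *value) {
    size_t n = 0;
    for (size_t i = 0; i < g->edge_count; i++) {
        const char *v = props_get(&g->edges[i].props, key);
        if (v != NULL && strcmp(v, value) == 0)
            n++;
    }
    return n;
}
```

In this sketch, a control-flow edge and a data-dependence edge between the same two statements would be stored as two separate edges that differ only in their properties, for example `label = CFG` on one and `label = PDG`, `var = x` on the other.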

Example

Consider the following function of a C program:

void foo() {
  int x = source();
  if (x < MAX) {
    int y = 2 * x;
    sink(y);
  }
}

The code property graph of the function is obtained by merging its abstract syntax tree, control-flow graph, and program dependence graph at statement and predicate nodes, as shown in the following figure:

[Figure: Code property graph of a sample C code snippet]
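As a rough textual counterpart to the figure, the merged graph for foo() can be written down as a list of typed edges over statement and predicate nodes. The encoding below is a simplified sketch derived from the source code, not a reproduction of the figure: the node numbering, the names `NODES`, `EDGES`, `cpg_find_node`, and `cpg_reaches` are assumptions, the AST is kept at statement level only, and the if statement is collapsed into its predicate node. The reachability query follows edges of a single kind, which is enough to ask whether the value returned by source() can flow into sink().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { AST, CFG, PDG_DATA, PDG_CONTROL } EdgeKind;

typedef struct {
    int src, dst;
    EdgeKind kind;
    const char *label;  /* variable name or branch outcome; "" if none */
} Edge;

/* Statement and predicate nodes of foo(), shared by all three graphs. */
static const char *const NODES[] = {
    "ENTRY foo",         /* 0 */
    "int x = source()",  /* 1 */
    "x < MAX",           /* 2 */
    "int y = 2 * x",     /* 3 */
    "sink(y)",           /* 4 */
    "EXIT",              /* 5 */
};
enum { NODE_COUNT = 6 };

static const Edge EDGES[] = {
    /* AST (statement level only): function body, then the if body */
    {0, 1, AST, ""}, {0, 2, AST, ""}, {2, 3, AST, ""}, {2, 4, AST, ""},
    /* CFG: order of execution, with both outcomes of the predicate */
    {0, 1, CFG, ""}, {1, 2, CFG, ""}, {2, 3, CFG, "true"},
    {2, 5, CFG, "false"}, {3, 4, CFG, ""}, {4, 5, CFG, ""},
    /* PDG data dependencies: a definition reaching a use */
    {1, 2, PDG_DATA, "x"}, {1, 3, PDG_DATA, "x"}, {3, 4, PDG_DATA, "y"},
    /* PDG control dependencies: statements guarded by the predicate */
    {2, 3, PDG_CONTROL, "true"}, {2, 4, PDG_CONTROL, "true"},
};
enum { EDGE_COUNT = sizeof EDGES / sizeof EDGES[0] };

/* Return the index of the first node whose code contains needle, or -1. */
int cpg_find_node(const char *needle) {
    for (int i = 0; i < NODE_COUNT; i++)
        if (strstr(NODES[i], needle) != NULL)
            return i;
    return -1;
}

/* Depth-first search that follows only edges of the given kind. */
bool cpg_reaches(int from, int to, EdgeKind kind) {
    assert(from >= 0 && from < NODE_COUNT);
    assert(to >= 0 && to < NODE_COUNT);
    bool visited[NODE_COUNT] = { false };
    int stack[NODE_COUNT];  /* each node is pushed at most once */
    int top = 0;
    stack[top++] = from;
    visited[from] = true;
    while (top > 0) {
        int node = stack[--top];
        if (node == to)
            return true;
        for (int i = 0; i < EDGE_COUNT; i++) {
            const Edge *e = &EDGES[i];
            if (e->kind == kind && e->src == node && !visited[e->dst]) {
                visited[e->dst] = true;
                stack[top++] = e->dst;
            }
        }
    }
    return false;
}
```

Under this encoding, `cpg_reaches(cpg_find_node("source"), cpg_find_node("sink"), PDG_DATA)` returns true, because x flows into y and y into the call to sink. The same query in the opposite direction returns false. Because all three graphs share the same nodes, a single traversal could also combine edge kinds, for example requiring a data-dependence path that passes no control-dependent check.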

Implementations

Joern CPG. The original code property graph was implemented for C/C++ in 2013 at the University of Göttingen as part of the open-source code analysis tool Joern.[14] This original version has been discontinued and superseded by the open-source Joern Project,[15] which provides a formal code property graph specification[16] applicable to multiple programming languages. The project provides code property graph generators for C/C++, Java, Java bytecode, Kotlin, Python, JavaScript, TypeScript, LLVM bitcode, and x86 binaries (via the Ghidra disassembler).

Plume CPG. Developed at Stellenbosch University in 2020 and sponsored by Amazon Science, the open-source Plume[17] project provides a code property graph for Java bytecode compatible with the code property graph specification provided by the Joern project. The two projects merged in 2021.

Fraunhofer AISEC CPG. The Fraunhofer Institute for Applied and Integrated Security (AISEC) provides open-source code property graph generators for C/C++, Java, Golang, and Python,[18] albeit without a formal schema specification. It also provides the Cloud Property Graph,[19] an extension of the code property graph concept that models details of cloud deployments.

Galois’ CPG for LLVM. Galois Inc. provides a code property graph based on the LLVM compiler.[20] The graph represents code at different stages of the compilation and a mapping between these representations. It follows a custom schema that is defined in its documentation.

Machine learning on code property graphs

Code property graphs provide the basis for several machine-learning-based approaches to vulnerability discovery. In particular, graph neural networks (GNN) have been employed to derive vulnerability detectors.[21][22][23][24][25][26][27]
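To give a sense of how a graph neural network consumes such a graph, the following sketch performs one round of message passing with mean aggregation over a list of directed edges. It is a generic illustration and not the method of any of the cited works: the function name `message_pass`, the scalar node features, and the two weights `w_self` and `w_nbr` are assumptions. Real models use feature vectors, learned weight matrices, and several stacked rounds.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int src, dst; } Edge;

/* One round of message passing with scalar node features:
 *   out[v] = relu(w_self * in[v] + w_nbr * mean(in[u] for each edge u -> v))
 * Nodes without incoming edges use a mean of 0. */
void message_pass(const float *in, float *out, int node_count,
                  const Edge *edges, int edge_count,
                  float w_self, float w_nbr)
{
    for (int v = 0; v < node_count; v++) {
        float sum = 0.0f;
        int degree = 0;
        for (int i = 0; i < edge_count; i++) {
            assert(edges[i].src >= 0 && edges[i].src < node_count);
            if (edges[i].dst == v) {
                sum += in[edges[i].src];  /* message from predecessor u */
                degree++;
            }
        }
        float mean = degree > 0 ? sum / (float)degree : 0.0f;
        float z = w_self * in[v] + w_nbr * mean;
        out[v] = z > 0.0f ? z : 0.0f;     /* ReLU activation */
    }
}
```

After k such rounds, the feature of a node depends on nodes up to k edges away, which is how information about, say, an unchecked source can reach the node of a sink. A classifier for vulnerability detection would then typically pool the node features into one vector per function.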
