Skip to content
This repository was archived by the owner on Mar 30, 2021. It is now read-only.

Euro LLVM 17 Extended Abstract

Daniel Krupp edited this page Jan 26, 2017 · 54 revisions

Abstract

Today Clang SA can perform (context-sensitive) inter-procedural analysis for C,C++ and Objective C files by ''inlining'' the called function into the callers context. This means that that the full calling context (assumptions about function parameters, global variables) is passed when analyzing the called function and then the assumptions about the returned value is passed back to the caller. This works well for function calls within a translation unit (TU), but when the symbolic execution reaches a function that is implemented in another TU, the analyzer engine skips the analysis of the called function definition. In particular, assumptions about references and pointers passed as function parameter get invalidated, and the return value of the function will be unknown. Loosing precision this way may lead to false positive and false negative findings.

The cross translation unit (CTU) feature would allow the analysis of called functions even if the definition of the function is external to the currently analyzed TU. This would allow detection of bugs in library functions stemming from incorrect usage (e.g. a library assumes that the user will free a memory block allocated by the library), and allows for more precise analysis in general of the caller if a TU external function is invoked (by not loosing assumptions).

We implemented (based on the prototype by A. Sidorin et al. [1]) the Cross Translation Unit analysis feature for Clang SA (4.0) and evaluated its performance on various open source projects. In our presentation we show that with the CTU feature we found many new true positive reports and eliminated some false positives in real open source projects. We show that while the total analysis time increases to 2-3 times, the execution remains scalable to the number of processors. We also point out how the analysis coverage changes that may lead to the loss of reports compared to the non-CTU baseline.

Description of the CTU prototype

In this project we are working on a method which enables CTU analysis by inlining external function definitions using clang's existing ASTImporter (see http://clang.llvm.org/doxygen/classclang_1_1ASTImporter.html) functionality. One of our goals was to have minimal, isolated, and lightweight changes. We added less than 300 lines of code to the clang codebase (1200 lines of code including the support utilities that are separate executables and tests).

2-pass analysis

To perform the analysis we need to run clang on the whole source code two times. Cross Translation Analysis Overview 1st pass: The first analysis phase generates 3 important outputs: AST dump of all Tranlsation units, a map file for functions with external linkage, the global call flow graph. The AST binary (using the clang -cc1 -emit-pch http://clang.llvm.org/docs/PCHInternals.html) of each TU is generated into a temporary directory called preanalyze-dir. The mangled name and location of all externally linkable functions are collected into the function definition index (externalFnMap.txt). The global call graph of the program is stored in a file called (cfg.txt).

2nd pass: The Clang Static Analyzer is run for all translation units. When during the analysis a TU external function is reached, the location of the definition of that function is looked up from in the function definition index and the definition is imported from the containing AST binary into the caller's context using the ASTImpoter library.

The source code of the prototype is available in this clang Github fork: https://github.com/dkrupp/clang/tree/ctu-master

Results on Open Source projects

We have analyzed many open source projects using CTU and found many new true positive reports.

Analyzed project All Non-CTU Findings (baseline) All CTU Findings New CTU findings Disappeared findings Successfully analyzed Failed to analyze Analysis Time (baseline)[s] Analysis Time XTU (1st Phase + 2nd Phase)[s] Median of bug path length (BPL) in baseline Median of BPL CTU Median of BPL of new findings Median of BPL of disappeared findings
FFMpeg 339 399 101 41 1555 files 8 files 318 44+693 7 9 16 14
TMux 49 77 29 1 133 files 0 files 39 4+75 16 16 17 6
Curl 7 11 4 0 280 files 13 files 32 6.5+54 1 1 12 n.a.

Remaining problems to work on

  • CTU is great to test the ASTImporter. We found bugs where the AST is imported successfully but the resulting AST is not correct (assertion fails triggered by the analyzer, due to faulty import).
  • Uniquing needs to be done between different TUs.
  • What to do with too long paths?
  • Do we lose true positives and coverage? We plan to measure coverage. Different strategy to build the exploded graph?
  • The size of the dumped ASTs. The possibility to improve the situation.

Credits

This work is based on earlier work of Aleksei Sidorin , Artem Dergachev et al. See http://lists.llvm.org/pipermail/cfe-dev/2015-October/045730.html

References

[1] Cross-TU Analysis results on open source projects: http://cc.elte.hu/ [2] Artem Dergachev, Aleksei Sidorin Cross TU Analysis for Clang 3.4 http://lists.llvm.org/pipermail/cfe-dev/2015-October/045730.html

Clone this wiki locally