
Deduplicate reports after an analysis run, as is done during the store command #4042

@vodorok

Description


Currently, when a report folder is stored and it contains multiple identical reports originating from the same header file (which happens when that header is included by several analyzed translation units), only one of the identical reports is kept and stored.

Keeping all of these duplicates in the report folder until store time can be extremely wasteful for large report folders, where the raw report count is in the millions. It is also confusing: when the store command uploads the report folder, it prints the raw report count, which differs from the number of reports that are actually stored.
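To make the discrepancy concrete, here is a minimal sketch of store-style deduplication. It assumes, hypothetically, that two reports are identical when their checker name, file path, line, column and message all match; the key actually used by the CodeChecker store handler may differ (for example, it may rely on a report hash). The `Report` class, the helper names and the sample values are illustrative only and are not part of CodeChecker's API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Report:
    """Hypothetical minimal report model (not CodeChecker's own class)."""
    checker_name: str
    file_path: str
    line: int
    column: int
    message: str

    def dedup_key(self) -> Tuple[str, str, int, int, str]:
        # Assumed identity of a report; the store handler's real key may differ.
        return (self.checker_name, self.file_path, self.line,
                self.column, self.message)


def deduplicate(reports: Iterable[Report]) -> List[Report]:
    """Keep the first occurrence of each report, preserving input order."""
    seen = set()
    unique = []
    for report in reports:
        key = report.dedup_key()
        if key not in seen:
            seen.add(key)
            unique.append(report)
    return unique


def checker_statistics(reports: Iterable[Report]) -> Counter:
    """Count reports per checker, like the "Checker Statistics" table."""
    return Counter(report.checker_name for report in reports)


def header_warning() -> Report:
    # Illustrative values only: the same header warning, emitted once per
    # translation unit that includes tinyxml2.h.
    return Report("cppcoreguidelines-virtual-class-destructor", "tinyxml2.h",
                  10, 7, "example message")


raw = [header_warning(), header_warning(),
       Report("bugprone-sizeof-expression", "tinyxml2.cpp", 42, 5,
              "example message")]
print(checker_statistics(raw))               # raw counts, as store prints them
print(checker_statistics(deduplicate(raw)))  # counts after deduplication
```

In this sketch, the first call reflects the raw count that store prints, and the second reflects the number of reports that would actually be kept.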

See the following examples:
CodeChecker parse <results_folder>

----==== Checker Statistics ====----
-------------------------------------------------------------------------
Checker name                               | Severity | Number of reports
-------------------------------------------------------------------------
cppcheck-uninitMemberVar                   | MEDIUM   |                 4
cppcheck-invalidPrintfArgType_uint         | MEDIUM   |                 1
cppcheck-invalidPrintfArgType_sint         | MEDIUM   |                 2
cppcoreguidelines-virtual-class-destructor | MEDIUM   |                12
cppcoreguidelines-special-member-functions | LOW      |                17
bugprone-sizeof-expression                 | HIGH     |                 1
-------------------------------------------------------------------------
----=================----

----==== File Statistics ====----
--------------------------------
File name    | Number of reports
--------------------------------
tinyxml2.h   |                36
tinyxml2.cpp |                 1
--------------------------------
----=================----

Store result on GUI: [screenshot not available]

CodeChecker store <results_folder>

----==== Checker Statistics ====----
-------------------------------------------------------------------------
Checker name                               | Severity | Number of reports
-------------------------------------------------------------------------
cppcheck-uninitMemberVar                   | MEDIUM   |                 7
cppcheck-invalidPrintfArgType_uint         | MEDIUM   |                 1
cppcheck-invalidPrintfArgType_sint         | MEDIUM   |                 4
cppcoreguidelines-virtual-class-destructor | MEDIUM   |                24
cppcoreguidelines-special-member-functions | LOW      |                34
bugprone-sizeof-expression                 | HIGH     |                 1
-------------------------------------------------------------------------
----=================----

----==== File Statistics ====----
--------------------------------
File name    | Number of reports
--------------------------------
tinyxml2.h   |                70
tinyxml2.cpp |                 1
--------------------------------
----=================----

It would be beneficial to deduplicate identical reports during or after the analysis run. To ensure the best compatibility, this should use the same algorithm as the store handler.
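A post-analysis pass could apply the same idea directly to the files in the results folder. The sketch below is hypothetical: it assumes Clang-style `.plist` output with top-level `files` and `diagnostics` arrays, and with `check_name`, `location` and `issue_hash_content_of_line_in_context` keys in each diagnostic. The function names are illustrative, and the dedup key is an assumption; a real implementation should reuse the store handler's algorithm, as proposed above. It drops every diagnostic already seen earlier (in the same or a previous file) and rewrites changed plist files in place.

```python
import plistlib
from pathlib import Path
from typing import Set, Tuple

# Assumed identity of a diagnostic; the store handler's real key may differ.
DiagKey = Tuple[str, str, int, int, str]


def diagnostic_key(diag: dict, files: list) -> DiagKey:
    """Build the (assumed) dedup key; location["file"] indexes into files."""
    loc = diag["location"]
    return (
        diag.get("check_name", ""),
        files[loc["file"]],
        loc["line"],
        loc["col"],
        diag.get("issue_hash_content_of_line_in_context", ""),
    )


def deduplicate_results_folder(results_dir: Path) -> Tuple[int, int]:
    """Remove diagnostics already seen earlier; return (raw, unique) counts.

    Files are processed in sorted order so the result is deterministic.
    The "files" array is left untouched, so indices stay valid.
    """
    seen: Set[DiagKey] = set()
    raw_count = 0
    for plist_path in sorted(results_dir.glob("*.plist")):
        with plist_path.open("rb") as f:
            data = plistlib.load(f)
        files = data.get("files", [])
        diagnostics = data.get("diagnostics", [])
        kept = []
        for diag in diagnostics:
            raw_count += 1
            key = diagnostic_key(diag, files)
            if key not in seen:
                seen.add(key)
                kept.append(diag)
        # Only rewrite files that actually contained duplicates.
        if len(kept) != len(diagnostics):
            data["diagnostics"] = kept
            with plist_path.open("wb") as f:
                plistlib.dump(data, f)
    return raw_count, len(seen)
```

Calling `deduplicate_results_folder(Path("reports"))` returns the raw and the deduplicated report counts. Because it rewrites files in place, it is safer to try it on a copy of a results folder.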
