@auroraberry
Contributor

While investigating issue #168, I found that Tai-e’s current handling of Arrays.copyOf is insufficient and prevents correct analysis of the following code pattern:

String[] original = new String[0];
String[] copy = Arrays.copyOf(original, original.length + 1);
copy[copy.length - 1] = getSourceData();
sink(copy[copy.length - 1]);

Specifically, for the new statement on the first line, Tai-e generates a mock object ZeroLengthArray (PR #140) and stores it in the points-to set of original. In the subsequent call to Arrays.copyOf, Tai-e propagates all objects from original’s points-to set to copy’s. As a result, copy’s points-to set contains only the non-functional mock object ZeroLengthArray, which cannot hold array indexes. Consequently, copy[copy.length - 1] yields null, making correct array store/load analysis impossible.
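
For reference, here is a minimal runnable sketch of what the pattern does at run time, which is the behavior the analysis has to approximate. The class name and the body of getSourceData() are hypothetical stand-ins; only Arrays.copyOf is real JDK API.

```java
import java.util.Arrays;

class CopyOfRuntimeDemo {
    // Hypothetical stand-in for the getSourceData() call in the snippet above.
    static String getSourceData() {
        return "tainted";
    }

    // Mirrors the pattern from the PR description and returns the value sink() would receive.
    static String lastElementAfterCopy() {
        String[] original = new String[0];                             // zero-length: no element slots
        String[] copy = Arrays.copyOf(original, original.length + 1); // fresh array of length 1, padded with null
        copy[copy.length - 1] = getSourceData();                       // the store lands in the new array
        return copy[copy.length - 1];                                  // the load reads the stored value back
    }

    public static void main(String[] args) {
        System.out.println(lastElementAfterCopy());
    }
}
```

At run time, copy is a distinct array object that can hold an element even though original cannot, so modelling copy with only the objects of original loses this store and load.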

To address this, I improved the handling of Arrays.copyOf and added a corresponding test case; together they resolve the issue above.
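
The idea can be illustrated with a toy points-to model. This is only a sketch under assumptions: AbsObj, propagateAll, copyOfModel and canHoldElements are hypothetical names invented for illustration and are not Tai-e's API, and the actual patch may differ in detail.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class CopyOfModelSketch {
    // Hypothetical abstract object; 'functional' says whether it can hold array elements.
    static final class AbsObj {
        final String name;
        final boolean functional;

        AbsObj(String name, boolean functional) {
            this.name = name;
            this.functional = functional;
        }
    }

    // Naive modelling: the result points to exactly the objects of the source array.
    static Set<AbsObj> propagateAll(Set<AbsObj> source) {
        return new LinkedHashSet<>(source);
    }

    // Sketched fix: replace each non-functional object with a fresh functional one.
    static Set<AbsObj> copyOfModel(Set<AbsObj> source) {
        Set<AbsObj> result = new LinkedHashSet<>();
        for (AbsObj obj : source) {
            if (obj.functional) {
                result.add(obj);
            } else {
                result.add(new AbsObj("copyOf-result-for-" + obj.name, true));
            }
        }
        return result;
    }

    // True if at least one pointed-to object can serve as the base of an array store/load.
    static boolean canHoldElements(Set<AbsObj> pointsTo) {
        return pointsTo.stream().anyMatch(o -> o.functional);
    }

    public static void main(String[] args) {
        Set<AbsObj> original = new LinkedHashSet<>();
        original.add(new AbsObj("ZeroLengthArray", false));
        System.out.println("naive:  " + canHoldElements(propagateAll(original)));
        System.out.println("sketch: " + canHoldElements(copyOfModel(original)));
    }
}
```

With the naive propagation the result set contains no object that can back an array store, while the sketched model always leaves one available.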

@github-actions

github-actions bot commented Aug 13, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@auroraberry
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@auroraberry
Contributor Author

recheck

github-actions bot added a commit that referenced this pull request Aug 13, 2025
@codecov

codecov bot commented Sep 3, 2025

Codecov Report

❌ Patch coverage is 94.87179% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.82%. Comparing base (12e7c43) to head (deba7a9).
⚠️ Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
...l/taie/analysis/pta/plugin/natives/ArrayModel.java 94.44% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master     #191       +/-   ##
=============================================
+ Coverage          0   75.82%   +75.82%     
- Complexity        0     4658     +4658     
=============================================
  Files             0      481      +481     
  Lines             0    16070    +16070     
  Branches          0     2199     +2199     
=============================================
+ Hits              0    12185    +12185     
- Misses            0     3018     +3018     
- Partials          0      867      +867     
Files with missing lines Coverage Δ
...taie/analysis/pta/core/heap/AbstractHeapModel.java 80.51% <100.00%> (ø)
...ie/analysis/pta/plugin/natives/NativeModeller.java 100.00% <100.00%> (ø)
...l/taie/analysis/pta/plugin/natives/ArrayModel.java 94.44% <94.44%> (ø)

... and 478 files with indirect coverage changes

@cs-cat
Collaborator

cs-cat commented Sep 9, 2025

This PR appears to introduce a precision regression (compared with the version that reverts the ZeroLengthArray optimization). The data on dacapo-2006/eclipse is as follows:

revert f4f1c5d

1-obj
-------------- Pointer analysis statistics: --------------
#var pointers:                18,9451 (insens) / 159,4362 (sens)
#objects:                     1,8524 (insens) / 2,1964 (sens)
#var points-to:               1329,3536 (insens) / 1,1117,1982 (sens)
#static field points-to:      1,7298 (sens)
#instance field points-to:    121,3577 (sens)
#array points-to:             33,1291 (sens)
#reachable methods:           2,4083 (insens) / 32,9939 (sens)
#call graph edges:            18,1271 (insens) / 1668,9122 (sens)
----------------------------------------

ci
-------------- Pointer analysis statistics: --------------
#var pointers:                19,2741 (insens) / 19,2741 (sens)
#objects:                     1,9495 (insens) / 1,9495 (sens)
#var points-to:               1995,7929 (insens) / 1995,7929 (sens)
#static field points-to:      2,1393 (sens)
#instance field points-to:    254,4160 (sens)
#array points-to:             35,7438 (sens)
#reachable methods:           2,4582 (insens) / 2,4582 (sens)
#call graph edges:            19,0358 (insens) / 19,0358 (sens)
----------------------------------------

This PR (86ef02d)

1-obj
-------------- Pointer analysis statistics: --------------
#var pointers:                18,9533 (insens) / 160,4090 (sens)
#objects:                     1,8548 (insens) / 2,2784 (sens)
#var points-to:               1336,9069 (insens) / 1,1480,5293 (sens)
#static field points-to:      1,7299 (sens)
#instance field points-to:    122,5662 (sens)
#array points-to:             45,2418 (sens)
#reachable methods:           2,4097 (insens) / 33,1895 (sens)
#call graph edges:            18,1278 (insens) / 1673,1125 (sens)
----------------------------------------

ci
-------------- Pointer analysis statistics: --------------
#var pointers:                19,2807 (insens) / 19,2807 (sens)
#objects:                     1,9543 (insens) / 1,9543 (sens)
#var points-to:               2009,2310 (insens) / 2009,2310 (sens)
#static field points-to:      2,1397 (sens)
#instance field points-to:    256,1075 (sens)
#array points-to:             36,0452 (sens)
#reachable methods:           2,4594 (insens) / 2,4594 (sens)
#call graph edges:            19,0032 (insens) / 19,0032 (sens)
----------------------------------------

This PR + revert ZeroLengthArray optimization

1-obj
-------------- Pointer analysis statistics: --------------
#var pointers:                18,9633 (insens) / 160,7615 (sens)
#objects:                     1,8723 (insens) / 2,2944 (sens)
#var points-to:               1342,5532 (insens) / 1,1550,5344 (sens)
#static field points-to:      1,7301 (sens)
#instance field points-to:    122,6870 (sens)
#array points-to:             45,8850 (sens)
#reachable methods:           2,4111 (insens) / 33,3103 (sens)
#call graph edges:            18,1377 (insens) / 1684,5959 (sens)
----------------------------------------

ci
-------------- Pointer analysis statistics: --------------
#var pointers:                19,2838 (insens) / 19,2838 (sens)
#objects:                     1,9692 (insens) / 1,9692 (sens)
#var points-to:               2009,7506 (insens) / 2009,7506 (sens)
#static field points-to:      2,1413 (sens)
#instance field points-to:    255,9100 (sens)
#array points-to:             36,9637 (sens)
#reachable methods:           2,4599 (insens) / 2,4599 (sens)
#call graph edges:            19,0071 (insens) / 19,0071 (sens)
----------------------------------------

Collaborator

@cs-cat cs-cat left a comment

Merge after the precision regression is addressed

@cs-cat
Collaborator

cs-cat commented Sep 9, 2025

Another regression experiment (ci PTA):
before patch (12e7c43)

-------------- Pointer analysis statistics: --------------
#var pointers:                19,2741 (insens) / 19,2741 (sens)
#objects:                     1,9495 (insens) / 1,9495 (sens)
#var points-to:               1995,7929 (insens) / 1995,7929 (sens)
#static field points-to:      2,1393 (sens)
#instance field points-to:    254,4160 (sens)
#array points-to:             35,7438 (sens)
#reachable methods:           2,4582 (insens) / 2,4582 (sens)
#call graph edges:            18,9943 (insens) / 18,9943 (sens)
----------------------------------------

before patch + revert ZLA optimization

-------------- Pointer analysis statistics: --------------
#var pointers:                19,2830 (insens) / 19,2830 (sens)
#objects:                     1,9688 (insens) / 1,9688 (sens)
#var points-to:               2008,2110 (insens) / 2008,2110 (sens)
#static field points-to:      2,1412 (sens)
#instance field points-to:    255,7181 (sens)
#array points-to:             36,9555 (sens)
#reachable methods:           2,4598 (insens) / 2,4598 (sens)
#call graph edges:            19,0042 (insens) / 19,0042 (sens)
----------------------------------------

patch

-------------- Pointer analysis statistics: --------------
#var pointers:                19,2807 (insens) / 19,2807 (sens)
#objects:                     1,9543 (insens) / 1,9543 (sens)
#var points-to:               2009,2310 (insens) / 2009,2310 (sens)
#static field points-to:      2,1397 (sens)
#instance field points-to:    256,1075 (sens)
#array points-to:             36,0452 (sens)
#reachable methods:           2,4594 (insens) / 2,4594 (sens)
#call graph edges:            19,0032 (insens) / 19,0032 (sens)
----------------------------------------

patch + revert ZLA optimization

-------------- Pointer analysis statistics: --------------
#var pointers:                19,2838 (insens) / 19,2838 (sens)
#objects:                     1,9692 (insens) / 1,9692 (sens)
#var points-to:               2009,7506 (insens) / 2009,7506 (sens)
#static field points-to:      2,1413 (sens)
#instance field points-to:    255,9100 (sens)
#array points-to:             36,9637 (sens)
#reachable methods:           2,4599 (insens) / 2,4599 (sens)
#call graph edges:            19,0071 (insens) / 19,0071 (sens)
----------------------------------------

patch - AnalysisModel.copyOf.MockObj + revert ZLA optimization

-------------- Pointer analysis statistics: --------------
#var pointers:                19,2838 (insens) / 19,2838 (sens)
#objects:                     1,9692 (insens) / 1,9692 (sens)
#var points-to:               2009,7506 (insens) / 2009,7506 (sens)
#static field points-to:      2,1413 (sens)
#instance field points-to:    255,9100 (sens)
#array points-to:             36,9637 (sens)
#reachable methods:           2,4599 (insens) / 2,4599 (sens)
#call graph edges:            19,0071 (insens) / 19,0071 (sens)
----------------------------------------

patch - AnalysisModel.copyOf + revert ZLA optimization

-------------- Pointer analysis statistics: --------------
#var pointers:                19,2830 (insens) / 19,2830 (sens)
#objects:                     1,9688 (insens) / 1,9688 (sens)
#var points-to:               2008,2110 (insens) / 2008,2110 (sens)
#static field points-to:      2,1412 (sens)
#instance field points-to:    255,7181 (sens)
#array points-to:             36,9555 (sens)
#reachable methods:           2,4598 (insens) / 2,4598 (sens)
#call graph edges:            19,0042 (insens) / 19,0042 (sens)
----------------------------------------

CSObj csNewArray = csManager.getCSObj(context, newArray);
solver.addVarPointsTo(context, result, csNewArray);
} else {
solver.addVarPointsTo(context, result, csObj);
Collaborator

The regression is caused by a missing handlers.keySet().forEach(solver::addIgnoredMethod) call in onStart().
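
To see why a missing registration could cost precision, here is a toy sketch. It assumes that addIgnoredMethod tells the solver to skip analyzing a method's own body once a handler models it. ToySolver, processCall and the string-keyed handler map are hypothetical stand-ins, not Tai-e's actual Solver interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IgnoredMethodSketch {
    // Hypothetical toy solver; it only records which parts of a call were processed.
    static class ToySolver {
        final Set<String> ignored = new LinkedHashSet<>();
        final List<String> log = new ArrayList<>();

        void addIgnoredMethod(String method) {
            ignored.add(method);
        }

        void processCall(String method, Map<String, Runnable> handlers) {
            Runnable handler = handlers.get(method);
            if (handler != null) {
                handler.run();               // effect of the hand-written model
            }
            if (!ignored.contains(method)) {
                log.add("body:" + method);   // the method body is analyzed as well
            }
        }
    }

    // Runs one modelled call, with or without the registration step.
    static List<String> run(boolean registerIgnored) {
        ToySolver solver = new ToySolver();
        Map<String, Runnable> handlers = new LinkedHashMap<>();
        handlers.put("Arrays.copyOf", () -> solver.log.add("model:Arrays.copyOf"));
        if (registerIgnored) {
            // Plays the role of the line that was missing from onStart().
            handlers.keySet().forEach(solver::addIgnoredMethod);
        }
        solver.processCall("Arrays.copyOf", handlers);
        return solver.log;
    }

    public static void main(String[] args) {
        System.out.println("with registration:    " + run(true));
        System.out.println("without registration: " + run(false));
    }
}
```

In this toy setting, skipping the registration means both the model and the original method body contribute facts for the same call, which is one plausible way the extra points-to relations in the statistics above could arise.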

@jjppp jjppp changed the title Improve Arrays.copyOf handling and add corresponding testcase Model Arrays.copyOf for non-functional arrays to obtain sound results for taint analysis Sep 12, 2025
@jjppp jjppp changed the title Model Arrays.copyOf for non-functional arrays to obtain sound results for taint analysis Model Arrays.copyOf for non-functional arrays for soundness Sep 12, 2025
@jjppp jjppp merged commit 89a4ea0 into pascal-lab:master Sep 12, 2025
5 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 12, 2025
4 participants