
Conversation

@hsato03 (Collaborator) commented Oct 9, 2025

Description

Currently, when attempting to remove a primary storage that contains a detached volume in the Ready state, an NPE is thrown. As a result, instead of listing the non-destroyed volumes in the storage pool through the logs, ACS throws a generic exception without much information.

This PR intends to fix this bug.
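
The stack trace in the testing section below shows the NPE coming from getStoragePoolNonDestroyedVolumesLog: a detached volume has no VM instance, so the VMInstanceVO lookup returns null and its getUuid() is dereferenced. Below is a minimal sketch of a null-safe version of that per-volume message, assuming a helper shaped roughly like the real code; the class and method names here are illustrative and are not the actual patch.

    import com.cloud.storage.VolumeVO;
    import com.cloud.vm.VMInstanceVO;
    import com.cloud.vm.dao.VMInstanceDao;

    // Illustrative sketch only, not the actual patch.
    public class NonDestroyedVolumeLogSketch {

        // Builds the per-volume part of the "non-destroyed volumes" log message.
        // A detached volume has no VM instance, so the lookup must tolerate null.
        public static String describeNonDestroyedVolume(VolumeVO volume, VMInstanceDao vmInstanceDao) {
            VMInstanceVO vmInstance = volume.getInstanceId() == null
                    ? null
                    : vmInstanceDao.findById(volume.getInstanceId());

            if (vmInstance == null) {
                // Previously this path dereferenced the null VM instance and threw the NPE.
                return String.format("Volume [%s]", volume.getUuid());
            }
            return String.format("Volume [%s] (attached to VM [%s])", volume.getUuid(), vmInstance.getUuid());
        }
    }

This mirrors the message format visible in the testing section below, where the detached volume is listed without any VM information.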

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

  1. I created a VM and attached a DATADISK volume to it; the volume was created in the pri-nfs storage pool.
  2. I detached the volume and put the pri-nfs storage pool into maintenance mode.
  3. Once the storage pool entered maintenance mode, I tried to remove it and observed the NPE below.
2025-10-09 11:42:08,608 ERROR [c.c.a.ApiServer] (qtp1070044969-23:[ctx-fecf6a51, ctx-de782f4a]) (logid:5d87fe7b) unhandled exception executing api command: [Ljava.lang.String;@38e7b06f java.lang.NullPointerException: Cannot invoke "com.cloud.vm.VMInstanceVO.getUuid()" because "volInstance" is null
        at com.cloud.storage.StorageManagerImpl.getStoragePoolNonDestroyedVolumesLog(StorageManagerImpl.java:1802)
        at com.cloud.storage.StorageManagerImpl.lambda$deleteDataStoreInternal$2(StorageManagerImpl.java:1769)

With the patch applied, I verified that the NPE no longer occurs and that the non-destroyed volumes are now listed in the logs:

2025-10-09 11:54:13,768 DEBUG [c.c.s.StorageManagerImpl] (qtp1390913202-28:[ctx-b34c3b23, ctx-8fc056a9]) (logid:23462342) Cannot delete storage pool StoragePool {"id":9,"name":"pri-nfs","poolType":"NetworkFilesystem","uuid":"be7199ad-50ac-329d-81a1-8315ae3149c2"} as the following non-destroyed volumes are on it: [Volume [637e6255-1a74-498f-9a63-36a365742d49] (attached to VM [233da21f-8e28-492c-b5b2-12cae4b933da]), Volume [970f9b5f-7af3-4f91-9eab-ac1e62c1270d] (attached to VM [3774b718-8c06-492c-ac1d-e7f273bc6616]), Volume [3f41346d-f78c-46d7-adac-4664b0970ac3] (attached to VM [bec81d39-696b-437c-a4d0-6bccc7eaab1d]), Volume [615e135e-166e-4953-ae07-c8fc4280a2ad]].
2025-10-09 11:54:13,769 ERROR [c.c.a.ApiServer] (qtp1390913202-28:[ctx-b34c3b23, ctx-8fc056a9]) (logid:23462342) unhandled exception executing api command: [Ljava.lang.String;@29300213 com.cloud.utils.exception.CloudRuntimeException: Cannot delete pool StoragePool {"id":9,"name":"pri-nfs","poolType":"NetworkFilesystem","uuid":"be7199ad-50ac-329d-81a1-8315ae3149c2"} as there are non-destroyed volumes associated to this pool.
	at com.cloud.storage.StorageManagerImpl.deleteDataStoreInternal(StorageManagerImpl.java:1770)
	at com.cloud.storage.StorageManagerImpl.deletePool(StorageManagerImpl.java:1693)
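
Complementing the manual test above, here is a hypothetical JUnit/Mockito sketch of the same scenario (a detached volume whose instance id is null), written against the illustrative helper from the description rather than the PR's actual test code.

    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import org.junit.Assert;
    import org.junit.Test;

    import com.cloud.storage.VolumeVO;
    import com.cloud.vm.dao.VMInstanceDao;

    // Hypothetical test sketch, not the PR's actual unit test.
    public class DetachedVolumeLogTest {

        @Test
        public void describeNonDestroyedVolumeHandlesDetachedVolume() {
            // Detached volume: it has a UUID but no VM instance id.
            VolumeVO volume = mock(VolumeVO.class);
            when(volume.getUuid()).thenReturn("615e135e-166e-4953-ae07-c8fc4280a2ad");
            when(volume.getInstanceId()).thenReturn((Long) null);

            VMInstanceDao vmInstanceDao = mock(VMInstanceDao.class);

            // Must not throw an NPE and must describe the volume without VM information,
            // matching the last entry of the fixed log message above.
            String description = NonDestroyedVolumeLogSketch.describeNonDestroyedVolume(volume, vmInstanceDao);
            Assert.assertEquals("Volume [615e135e-166e-4953-ae07-c8fc4280a2ad]", description);
        }
    }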

How did you try to break this feature and the system with this change?

@hsato03 (Collaborator, Author) commented Oct 9, 2025

@blueorangutan package

@blueorangutan

@hsato03 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.


codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.56%. Comparing base (e6c7a71) to head (8cf1034).
⚠️ Report is 70 commits behind head on main.

Files with missing lines Patch % Lines
...ain/java/com/cloud/storage/StorageManagerImpl.java 50.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11817      +/-   ##
============================================
+ Coverage     17.39%   17.56%   +0.16%     
- Complexity    15283    15498     +215     
============================================
  Files          5889     5898       +9     
  Lines        526184   527780    +1596     
  Branches      64242    64474     +232     
============================================
+ Hits          91542    92705    +1163     
- Misses       424298   424652     +354     
- Partials      10344    10423      +79     
Flag Coverage Δ
uitests 3.59% <ø> (-0.03%) ⬇️
unittests 18.63% <50.00%> (+0.18%) ⬆️

Flags with carried forward coverage won't be shown.


@DaanHoogland (Contributor) left a comment


clgtm

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15366

@DaanHoogland (Contributor)

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-14619)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 60889 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11817-t14619-kvm-ol8.zip
Smoke tests completed. 147 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_enableHumanReadableLogs Error 0.24 test_human_readable_logs.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Failure 527.69 test_vpc_redundant.py
