
AAP-17690 Inventory variables sourced from git project are not getting deleted after being removed from source #15928


Open · wants to merge 28 commits into devel

Conversation

@djulich (Contributor) commented Apr 8, 2025

SUMMARY

Fixes AAP-17690 Inventory variables sourced from git project are not getting deleted after being removed from source

The source history for each variable is now preserved, so updates from a source modify the variables the way a user would expect them to behave.

Let A and B be two inventory sources for inventory INV.
A={x:1}, B={x:2} -> sync A -> INV={x:1} -> sync B -> INV={x:2} -> B={} -> sync B -> INV={x:1}

One can see that deleting variable x from source B does not delete x altogether; instead, the value from the earlier update from source A reappears.

You can think of each source update as laying down an overlay on a variable that covers the previous values. Deleting a variable from a source and updating from that source again surfaces the value from the next layer down, i.e. the previous update.

If an inventory source has overwrite_vars=True set, an update from this source prunes the history of all variables in the group and keeps only this update as the new history.
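As a minimal sketch of the layering rules described above (class and method names here are hypothetical illustrations, not the PR's actual code):

```python
class VarHistory:
    """Per-variable history: one (source_id, value) layer per source, newest last."""

    def __init__(self):
        self._layers = []  # list of (source_id, value); the last entry wins

    def sync(self, source_id, source_vars, var_name, overwrite_vars=False):
        if overwrite_vars:
            # Prune the history and keep only this update.
            self._layers = []
        else:
            # Drop any previous layer contributed by this source.
            self._layers = [(s, v) for s, v in self._layers if s != source_id]
        if var_name in source_vars:
            self._layers.append((source_id, source_vars[var_name]))

    @property
    def value(self):
        # The top layer wins; None means the variable is gone entirely.
        return self._layers[-1][1] if self._layers else None


# Replaying the A/B example from the summary:
h = VarHistory()
h.sync(1, {"x": 1}, "x")  # sync A -> x == 1
h.sync(2, {"x": 2}, "x")  # sync B -> x == 2
h.sync(2, {}, "x")        # B no longer defines x; sync B -> x falls back to 1
print(h.value)            # -> 1
```

Deleting x from source B peels off B's layer and the value from source A resurfaces, matching the behavior described above.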

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API
AWX VERSION
awx: 24.6.2.dev298+gf84a562647
ADDITIONAL INFORMATION

To reproduce the issue, do the following:

  1. Create a Git repo with two inventory files "src1.ini" and "src2.ini" with the following content:
    src1.ini:
[all:vars]
x=1

    src2.ini:
[all:vars]
x=2
  2. In awx, create a project "PRJ" with Source control type="Git", pointing to your newly created Git repo.
  3. In awx, create a test inventory "INV" and in it two inventory sources "A" and "B" with Source="Sourced from a project", and let them point to your newly created project "PRJ". Select Inventory file="src1.ini" and "src2.ini" respectively for sources A and B.
  4. In awx, select inventory "INV" and click on the tab "Sources".
  5. In awx, launch an inventory update for source A, then for source B.
  6. In awx, click on the tab "Details"; it should show x: 2 in the Variables field.
  7. In Git, edit src2.ini and comment out the variable assignment for x:
    src2.ini:
[all:vars]
#x=2
  8. In awx, go to "Projects" and sync project "PRJ".
  9. In awx, select your inventory "INV", click on the tab "Sources", and launch the update for B again.

Now the issue can be observed:
Since variable x is no longer defined in source B, the inventory should either revert x to the value before the update from B or remove it altogether. But it still shows x: 2!

@djulich

This comment was marked as resolved.

@djulich djulich marked this pull request as ready for review April 16, 2025 12:17
@djulich

This comment was marked as resolved.

    on_delete=models.CASCADE,
)
group_name = models.CharField(max_length=256)
variables = models.JSONField()  # The group variables, including their history.
Member

I'd like to ask that someone, like @chrismeyersfsu, weigh in on the choice of field type here, JSONField. There is some baggage with this, and I see json.dumps being used to save to this field.

Contributor Author

For now I removed the superfluous json.dumps and json.loads in front of the JSONField. Thanks for the heads-up, I totally didn't see that.

If @chrismeyersfsu has some concerns regarding the performance impact of a JSONField here, I could switch to a TextField with explicit serialization through json.dumps and json.loads.

for name, inv_var in self._vars.items():
    self[name] = inv_var.value

def from_dict(self, state: dict[str, update_queue]) -> "InventoryGroupVariables":
Member

It's common to have a from_* method be a @classmethod, and I do kind of feel like that would be more clear here. Basically it would be better to always instantiate using a from_* method, avoiding ever having an object that's in a state where it can't be used yet.

Also, if I resolve the type hint, it would give

state: dict[str, list[tuple[int, TypeAlias = str | int]]]

Are you sure this is what you meant? Seems like a lot of nesting.
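For illustration, the @classmethod constructor pattern the reviewer suggests looks roughly like this (a generic sketch; InventoryGroupVariables here is a stand-in, not the PR's actual class):

```python
# Sketch of the from_* classmethod pattern: the instance is fully usable the
# moment the constructor returns, never in a half-initialized state.
class InventoryGroupVariables(dict):
    def __init__(self, name):
        super().__init__()
        self.name = name

    @classmethod
    def from_dict(cls, name, state):
        """Build a fully initialized instance in one step."""
        obj = cls(name)
        for var_name, history in state.items():
            # history is a list of (source_id, value) pairs; the last wins.
            if history:
                obj[var_name] = history[-1][1]
        return obj


gv = InventoryGroupVariables.from_dict("all", {"x": [(1, 1), (2, 2)]})
print(gv)  # -> {'x': 2}
```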

Contributor Author

Regarding the type, well, it is a lot of nesting indeed. But on the other hand it's the natural structure of the group variables state:

group_vars_state = {<var_name>: <var_history>, ...}
var_history = [(<src_id>, value), (<src_id>, value), ...]
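Written out with concrete (illustrative) values, that structure is:

```python
# Concrete instance of the nested state described above; values are made up.
group_vars_state = {
    "x": [(1, 1), (2, 2)],           # set by source 1, then overlaid by source 2
    "ansible_user": [(2, "deploy")], # only ever set by source 2
}

# The effective value of each variable is the last entry in its history:
effective = {name: history[-1][1] for name, history in group_vars_state.items()}
print(effective)  # -> {'x': 2, 'ansible_user': 'deploy'}
```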

Contributor Author

It's common to have a from_* method be a @classmethod, and I do kind of feel like that would be more clear here. Basically it would be better to always instantiate using a from_* method, avoiding ever having an object that's in a state where it can't be used yet.

So I probably chose the wrong name for the from_dict method. The method is not meant to initialize the object to a usable state; instead it is a loader that moves an existing instance to a particular state deserialized from a dict. I understand your concern that the method doesn't follow the usual semantics of a from_* method. I propose to change the method name to better reflect its actual purpose.

Contributor Author

I renamed .to_dict and .from_dict to .save_state and .load_state respectively to better reflect their purpose. Together with the reasoning from my comment above, would that suffice to address your concerns in this conversation, @AlanCoding?

@AlanCoding
Member

I had a quick look at the tests, and there's a clear theme: API PATCH requests are getting a 400 when editing a simple inventory.

    "response": {
        "json": {
            "detail": "Cannot parse as JSON (error: the JSON object must be str, bytes or bytearray, not NoneType) or YAML (error: 'NoneType' object has no attribute 'read')."
        },
        "status_code": 400
    }

This almost certainly comes from the variables parsing within the serializer. That should be a pretty good clue as to what's going on. Even full integration tests are hitting this, so it's not something weird done directly by a test.
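For what it's worth, the two error strings in that 400 response are exactly what Python's stdlib json and PyYAML emit when handed None instead of text, which supports the theory that the serializer is parsing a missing variables payload:

```python
# Reproducing the quoted error message by parsing None, which suggests the
# serializer received no "variables" value at all.
import json

try:
    json.loads(None)
except TypeError as exc:
    print(exc)  # the JSON object must be str, bytes or bytearray, not NoneType

# PyYAML fails the same way on None: yaml.safe_load(None) treats it as a
# stream and raises AttributeError: 'NoneType' object has no attribute 'read'.
```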

Copy link

codecov bot commented Apr 23, 2025

Codecov Report

Attention: Patch coverage is 96.07843% with 6 lines in your changes missing coverage. Please review.

Project coverage is 75.35%. Comparing base (3f96ea1) to head (a186fa6).
Report is 6 commits behind head on devel.

✅ All tests successful. No failed tests found.


@djulich
Contributor Author

djulich commented Apr 24, 2025

Open topics for this PR:

  • Resolve issues from Baby Yolo run. Update tests accordingly. Maybe also look into tower-qa tests.
  • Update docs and sync with @tvo318 on the necessary changes. Also forward the changes to downstream docs.

@djulich
Contributor Author

djulich commented Apr 24, 2025

Proposal for product documentation update is available in this comment in the Jira ticket.

@AlanCoding
Member

Can you walk me through the intended data migration path? Why is there not a data migration? Presumably this leaves the new field blank, and the old variables still populated. Is that right?

Will those all then be treated like user-entered variables?

If "yes", then what I'm not understanding is why the serializer would handle user-entered variables as invsrc_id=-1. So then walk through 2 scenarios:

  1. User manually inputs an inventory with variables in old version, and then upgrades
  2. User upgrades and then inputs an inventory with variables

Without yet having your answer to these questions, I'm suspicious that it's a good idea for the data model to look different in these 2 scenarios.

@AlanCoding
Member

I tested a few things:

  • deleting an inventory did cascade delete these records
  • new inventory with vars works as expected
  • editing inventory and deleting a variable worked
  • making a new inventory and adding a variable worked
  • nested dictionaries seem to work as expected


with open(path, "w") as fp:
    fp.write("[all:vars]\n")
    fp.write("b=value_b\n")
subprocess.run('git add .; git commit -m "Update variables"', cwd=repo_path, shell=True)
Member

To say it out loud - it is my intent that running git commands in a subprocess is in scope for the awx/main/tests/live tests. For other test suites, something like this would be a blocker, which is why I want to say it. In this space, it's okay to do this stuff.

This was apparent as a use case from the get-go for issues like #13845

path = f"{repo_path}/inventory_var_deleted_in_source.ini"
with open(path, "w") as fp:
    fp.write("[all:vars]\n")
    fp.write("b=value_b\n")
Member

Interesting method. I don't mean to criticize how you solved this problem, but I do want to share how I approached the same problem earlier.

https://github.com/ansible/test-playbooks/blob/main/inventories/changes.py

By using an inventory script, it's possible to put in dynamic values for either keys or values. In your case, a randomized key will result in the "old" key disappearing in a subsequent update. This can allow doing the same full-integration test without needing to run git commands.
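In the spirit of the linked changes.py, such a script might look roughly like this (a sketch under my own assumptions, not the actual file): every run emits a freshly randomized key, so the key from the previous inventory update disappears on the next one.

```python
#!/usr/bin/env python3
# Hypothetical dynamic inventory script: emits one randomized group variable
# name per run, so each inventory update drops the previous run's variable.
import json
import random

key = f"changing_var_{random.randint(0, 10**6)}"
inventory = {
    "_meta": {"hostvars": {}},
    "all": {
        "children": ["ungrouped"],
        "vars": {key: "value"},
    },
}
print(json.dumps(inventory))
```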

Contributor Author

I'll look into that!

Contributor Author

I wasn't aware that I can use script-generated inventory files here. Interesting approach, but I guess we cannot preserve state between subsequent calls to such a script. The challenge would be to know what to assert in the test function when we use, e.g., random or timestamp-based variable names.

Contributor Author

For the time being I would like to keep the test method simple, because it does verify the issue the PR is supposed to resolve, and I do not want to delay the merge of this PR longer than necessary.

    :return: None
    """
    self.name = name
    self._update_queue: update_queue = []
Member

I just got around to really digging into this. Here's what I don't follow - why is this [] as opposed to a {}? To understand the question, see a value:

{'foo': [[-1, 'bar']]}

The data model of this makes possible {'foo': [[-1, 'bar'], [-1, 'bar2']]}, which is an invalid state, if I understand correctly.

If a variable will only ever have 1 value for 1 source, then indexing based on the source seems like the best way to structure this.

I think that maybe ordering matters? You probably thought more about this than I have, about the rules for which variable surfaces depending on the ordering of actions.
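To make the suggestion concrete, here is a rough sketch (my own illustration, not a proposed patch) of indexing a variable's history by source id; assuming Python's insertion-ordered dicts, recency ordering can still be preserved:

```python
# Per-variable history keyed by source id: one source can never hold two
# values, so the invalid state above becomes unrepresentable. Re-inserting a
# source moves it to the end, so "most recently updated source wins" holds.
history = {}  # maps source_id -> value for one variable, e.g. "foo"

def record(source_id, value):
    history.pop(source_id, None)   # drop the old entry, if any
    history[source_id] = value     # re-insert at the end (most recent)

record(-1, "bar")        # user-entered value (source id -1 per the PR)
record(7, "from_src7")   # an inventory source update
record(-1, "bar2")       # user edits again: replaces, never duplicates
print(history)           # -> {7: 'from_src7', -1: 'bar2'}
```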

@AlanCoding
Member

I checked out the branch and had a go at writing tests with mock API requests. Either way, I would like to get some test content in along this general structure.

#15968

This is failing right now, and I want your help to sort out what's going on. Abbreviated human-readable steps:

  • Run an inventory update that pulls in foo/bar variables with "foo_source" / "bar_source" values
  • Edit through an API request to /api/v2/inventories/:id/ to set variables to {"foo": "foo_user"}
  • Expectation: I expect "foo" has the user value, and probably that "bar" is deleted
  • Found: I do not find the user-given value, I find "foo_source", and I can't explain why

2 participants