Conversation

@rtamalin (Collaborator) commented Nov 11, 2025

Define two new tables:

  • profiles: This table stores the complete profiles that have been provided by clients as part of the system registration (announce_system) or keepalive (update) request handling. It consists of three fields: product_type, identifier and data.
  • system_profiles: This table links system records to their associated profiles in a one-to-many relationship, allowing the same profile records to be shared between multiple systems (a migration sketch follows this list).
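
A minimal migration sketch for these two tables, assuming standard Rails conventions (column types, null constraints and the migration class name are assumptions, not taken from this PR):

  class CreateProfileTables < ActiveRecord::Migration[6.1]
    def change
      create_table :profiles do |t|
        t.string :product_type, null: false
        t.string :identifier, null: false
        t.text :data
        t.timestamps
      end
      # A profile is uniquely identified by (product_type, identifier)
      add_index :profiles, %i[product_type identifier], unique: true

      create_table :system_profiles do |t|
        t.references :system, null: false
        t.references :profile, null: false
        t.timestamps
      end
    end
  end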

Add support for handling any provided profiles to the handlers for the announce_system and update requests in the connect V3 API.

This support involves checking that the provided profiles are valid and, when provided for the first time, complete. New profiles are added to the profiles table, and appropriate references are added or updated via the system_profiles table to associate systems with their corresponding profiles.

Similarly, when incomplete profiles are included in update requests, the corresponding complete profiles will be retrieved and used, with any missing profiles being dropped and considered problematic.

Additionally, when problematic profiles are detected, whether invalid or incomplete, they will be ignored and the X-System-Profiles-Action header will be included in the response, with a value of clear-cache.

In the System model, a new custom attribute, complete_profiles, with an accompanying assignment method, complete_profiles=, handles the assignment of complete profiles to be associated with a system record as part of record creation or update.
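
A rough sketch of the shape of that attribute, assuming a has_many :through association (only the attribute and method names come from this PR; the body is illustrative):

  class System < ApplicationRecord
    has_many :system_profiles, dependent: :destroy
    has_many :profiles, through: :system_profiles

    attr_reader :complete_profiles

    # Assigning complete profiles (re)links the system to the given
    # profile records via the system_profiles join table
    def complete_profiles=(profiles)
      @complete_profiles = profiles
      self.profiles = profiles
    end
  end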

If racing requests attempt to create the same new profile, one will succeed and the others will rescue the ActiveRecord::RecordNotUnique exception and instead look up the newly created profile record.
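
A common pattern for handling that race, sketched here on the Profile model with a hypothetical helper name (the PR's actual method may differ):

  # Hypothetical helper; create! raises ActiveRecord::RecordNotUnique
  # when the (product_type, identifier) unique index is violated
  def self.find_or_create_profile!(product_type, identifier, data)
    create!(product_type: product_type, identifier: identifier, data: data)
  rescue ActiveRecord::RecordNotUnique
    # Another request won the race; look up the record it created
    find_by!(product_type: product_type, identifier: identifier)
  end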

Add support for optimizing the content of the send_bulk_system_update() request to only include the full profile, with both identifier and data fields, on the first occurrence of a profile within the list of serialized systems; subsequent occurrences of that profile drop the data field.

Update the test cases to validate correct operation of the new models and the associated request handling changes.

Related Jira: TEL-265

@rtamalin (Collaborator Author) commented:

FYI, while doing some heavy stress testing I'm seeing some DB stall issues.

@rtamalin (Collaborator Author) commented:

Investigating; the issue may be related to recent changes on the master branch.

@rtamalin (Collaborator Author) commented Nov 12, 2025

Yup, having reverted my test env to be based upon the current master branch I still see the same issue... Digging further.

@rjschwei (Member) left a comment:

Comments should start with uppercase to stay consistent throughout the code base

  # check if any profiles have been provided
  if params.key?(:system_profiles)
    profiles = info_params(:system_profiles)[:system_profiles]
    complete, incomplete, invalid = Profile.filter_profiles(profiles.to_h)
Member commented:

Should we take a more fine grained approach here? At present we are looking at 2 profiles, pci data and loaded kernel module data; if one of these is incomplete or invalid we have a 50/50 chance to guess which one the client provided incorrectly. When we add more data our chances to guess correctly go down. I would suggest that we loop through the system profiles at this level and then send each profile into the next level down. That way we can pick out which profile may be "broken" and can log an appropriate error. We might even go so far as to relay that information to the client.

@rtamalin (Collaborator Author) replied:

Per our design, the system_profiles entry in the request JSON payload will be a JSON object with the following structure:

  ...

  "system_profiles": {
    "<profile_type>": {
      "identifier": "<profile_identifier_string>",
      "data": "<profile_data_string>"
    },
    ...
  },

  ...
 

There is no guessing here - the combination of <profile_type> and identifier value is what identifies a specific profile, not just the profile identifier value itself.

This approach insulates us from any risks associated with data blobs from different profile_types ever having the same identifier value, e.g. because the hash of their content ends up being the same value; each data blob will be stored independently, without any risk of corrupting the data blob associated with a different profile type.

This approach also allows us to introduce new profile types in the future that may use a different identifier generating approach, if desired, without similarly worrying that it could result in overwriting the content associated with a different profile type's profile entry.

If a profile is missing the data field it is considered incomplete, and if it is missing the identifier field it is considered invalid.

For an announce_system request, as part of system registration, only complete profiles are acceptable, per our design, so we only pass on complete profiles to the create!() method and filter out the incomplete or invalid profiles, additionally setting the X-System-Profiles-Action response header to clear-cache if any incomplete or invalid profiles are found, to indicate to the client that it should clear its cache and send full profiles next time.

Note, though, that per our design and proposed suseconnect(-ng) implementation, the client should always be sending full profiles anyway for an announce_system request as part of a system registration.

For an update request, as part of the system keepalive notification, the expected optimization is that clients will send incomplete profiles, with only the identifier provided, so we will allow incomplete profiles in that instance, but only those that are already "known", i.e. ones for which we already have a matching profile stored in the profiles table; any "unknown" incomplete profiles are skipped and trigger the header to be set in the response.
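
Putting the categorization rules together, a sketch of the filtering logic described above (the real Profile.filter_profiles() implementation may differ in detail):

  def self.filter_profiles(profiles)
    complete, incomplete, invalid = {}, {}, {}
    profiles.each do |profile_type, profile|
      if profile[:identifier].blank?
        invalid[profile_type] = profile     # Missing identifier: invalid
      elsif profile.key?(:data)
        complete[profile_type] = profile    # Identifier and data (even ""): complete
      else
        incomplete[profile_type] = profile  # Identifier only: incomplete
      end
    end
    [complete, incomplete, invalid]
  end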

Member replied:

Well, the log message just shows the count, so there is a lot of guessing when I read "problematic profiles detected: 2 incomplete" once we have more than 2 profiles. And from the naming it is not obvious that complete is a collection of strings where the strings represent the profile_type.

@rtamalin (Collaborator Author) replied:

I can enhance the relevant debug messages to also report the profile types for each problematic category, and to expand the message content to reflect the nature of what each category is, e.g. "missing data" or "missing identifier", in addition to the code comment explanation of these.
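
For example (a hypothetical message format, not the PR's exact wording):

  logger.debug(
    'problematic profiles detected: ' \
    "incomplete (missing data): #{incomplete.keys.join(', ')}; " \
    "invalid (missing identifier): #{invalid.keys.join(', ')}"
  )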

@rtamalin (Collaborator Author) replied:

I've enhanced the debug messages to include the profile types for the problematic categories, as well as report the profile types for valid profiles being added/updated.

@rtamalin (Collaborator Author) commented:

Determined that the issue was due to a recent upgrade of containerd.io on my system to 2.1.5, which has a very low default soft file limit, leading to problems when there were lots of active connections...

@rtamalin rtamalin marked this pull request as ready for review November 13, 2025 16:18
Capitalize comment sentence starting words.

Add extra comments to clarify how profiles are categorized, in detail
in the Profile.filter_profiles() method, more briefly in the handlers
for the announce_system and update requests.

Rename identify_existing_profiles() to identify_known_profiles() for
improved clarity of what the method is intended to do. Also tweak
associated variable names to match the rename change.
Add an additional unique index spanning the system_id and profile_id
fields in the system_profiles table to ensure that a given profile
can only be associated with a given system once.
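
In migration terms that is likely just (a sketch; the index name is whatever Rails generates):

  add_index :system_profiles, %i[system_id profile_id], unique: true
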
  logger.debug("problematic profiles detected: #{incomplete.count} incomplete, #{invalid.count} invalid")
  response.headers['X-System-Profiles-Action'] = 'clear-cache'
end

Contributor commented:

  known_incomplete = Profile.identify_known_profiles(incomplete)

We should run the same identify_known_profiles check inside the announce call as well. There may be cases where a system sends two profiles with the same identifier but different profile types—one containing data and another without. In such situations, we still need to create the second profile and its corresponding system profile record.

@rtamalin (Collaborator Author) replied:

As I said in my response to @rjschwei, if two different profile types have the same identifier, that is a valid scenario, and won't cause a problem, because profiles are uniquely identified by the combination of (profile_type, identifier), not just identifier.

And by definition an announce_system requires complete profiles, i.e. ones that have both identifier and data, because the identifier-only optimization is only supported for update requests; the only reason that a client should send up an incomplete profile is when it believes that it has previously sent up the complete profile with that identifier and, because the profile hasn't changed, it can therefore send up the optimized incomplete profile next time... But an announce_system is part of an initial system registration, and a client cannot at that point assume that it has sent up anything previously; it should always send up complete profiles.

The only likely time, in our current model of operation, that two different profile_types can validly have the same identifier value and same data content would be for the "empty" data blob. And I would prefer not to have to implement an unnecessarily complex mechanism to avoid storing an empty data field for multiple records in the profiles table. The DB storage cost of one extra record per profile type, to store that profile type's version of the empty data blob, is minimal vs the complexity of trying to avoid storing that empty record. And under any other circumstances, if two different profile types have the same identifier, it would be an "unsafe" assumption to assume that their data blobs are in fact the same, given that the content formats of the different profile type data blobs are very different, e.g. output lines from lspci, vs a list of kernel modules, vs a list of packages and associated versions... If by some fluke occurrence we receive two different profile types with the same identifier for "non-empty" data blobs, we should be treating them as different data blobs.

Going one step further, if we ever decide to use a different mechanism for generating ids for new profile types in the future, then assuming that two different profile types had the same data blob because their identifiers match would be invalid. The current model of operation supports this future possibility without needing to change anything.

So to summarize, a profile is identified by the combination of its profile_type and identifier, and we should consider each profile_type as an independent scope; the existence of the same identifier value in multiple scopes is valid, and should not be taken to have any special meaning. This approach may result in some very minor extra DB storage usage to store independent versions of the "empty" data blob in each profile_type's scope, but that seems to me to be a minor cost vs the complexity and performance impact of the code needed to avoid this minor overhead.

Especially given that it is theoretically possible (though extremely improbable for the data blob sizes we are dealing with) for two different profile_types with non-empty, unequal data blobs to have the same identifier value.

validates :identifier, presence: true
validates :data, presence: true

def self.filter_profiles(profiles)
Contributor commented:

The function currently considers profiles as complete if the identifier and data keys are present, even when the data value is empty. It might be more accurate to treat profiles with empty data as incomplete.

@rtamalin (Collaborator Author) replied:

An empty data value is still a valid possibility, e.g. on an Azure VM we have seen that the lspci output is empty, meaning that an empty data blob is a valid value to report for a pci_data profile type.

As such we consider empty data blobs as validly reportable values, and only consider a profile as incomplete if it doesn't contain the data entry.

Contributor replied:

> An empty data value is still a valid possibility, e.g. on an Azure VM we have seen that the lspci output is empty, meaning that an empty data blob is a valid value to report for a pci_data profile type.

How is this possible??

My perspective:
An empty data value should ideally never occur, since identifiers are hashed from the data value.
This situation can only arise from an incorrect implementation in SuseConnect or a client-side issue (i.e., negative scenarios). In the current implementation, the impact is that empty values may be stored for incorrectly reported profiles. This should not happen. We should treat such cases as invalid or incomplete, certainly not as complete. Invalid makes more sense here to me.

Member replied:

Hmmm, we'll let you @paragjain0910 argue that with the people at Microsoft as to why hyper-v does not expose anything to the kernel that will be listed with lspci. Maybe @olafhering has an idea why it is not possible for lspci to report data for some instances in Azure.

Member replied:

We can also ask the question to Microsoft directly @brett060102 do you remember which VM size had this behavior?

Member replied:

What Azure instance type is that? An ordinary Gen2 VM on Windows Server needs no physical or emulated PCI, the IO devices are exposed via the vmbus. A Gen1 VM will likely have a few emulated PCI devices.
In Azure a VM with accelerated networking should have the Mellanox card on the PCI bus. Newer v6 instances may use the mana driver, which may or may not be a PCI device (I do not have an instance running to verify how a mana interface is exposed).

@rtamalin (Collaborator Author) replied:

As said, in the case of PCI Data on Azure VMs, an empty value is valid; the identifier is just the generated hash for the associated "empty" report for the profile type, which in the case of PCI Data could be "", or for something else could be the JSON representation of an empty list/array ([]) or object ({}).
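
For illustration, assuming the identifier is a SHA-256 hex digest of the data blob (the actual hashing scheme is defined on the client side):

  require 'digest'

  # Even an empty pci_data report hashes to a well-defined identifier
  Digest::SHA256.hexdigest('')
  # => "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"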

@rtamalin (Collaborator Author) replied:

@olafhering Hopefully @brett060102 can clarify with specifics, but from what I can remember they were relatively small & minimal instance types that had no output from lspci, and for beefier instance types there was limited output for a small number of devices, but nowhere near what would be seen for comparable instance types in AWS.
We surmised that it was due to the underlying system devices being presented via a different bus type, as you have confirmed.

Member replied:

> As said, in the case of PCI Data on Azure VMs, an empty value is valid; the identifier is just the generated hash for the associated "empty" report for the profile type, which in the case of PCI Data could be "", or for something else could be the JSON representation of an empty list/array ([]) or object ({}).

Well, yes, but @paragjain0910 was asking "How is this possible??". I think he deserves a little more of an answer than "We observed that lspci is empty on some Azure instances." Anyway, so that's what vmbus does, it handles the device information. Thanks @olafhering.

@paragjain0910 does that address your concern about empty PCI data and how that is possible?

We should consider profiles that have an empty identifier value as
invalid, so update the filter_profiles() method to check for and
treat them as invalid.
Update the System.complete_profiles=() method to avoid deletion and
recreation of linking records for profile associations that haven't
changed.
@ngetahun ngetahun added the 2 reviewers A second reviewer is requested. label Nov 17, 2025
@ngetahun ngetahun self-assigned this Nov 17, 2025
@rtamalin (Collaborator Author) commented Nov 17, 2025

@ngetahun We don't plan to merge this until after the 2.24 release goes out... If there is some sort of label pattern that should be used to indicate that, I'm happy to use it. Or should I just go ahead and add a Post-v2.24 label?

Never mind, I see the 2.25 label, so I added that...

@rtamalin rtamalin added the 2.25 label Nov 17, 2025
Enhance the SystemSerializer to take an optional serialized_profiles
set as an initialize() argument, defaulting to a new empty set if not
specified, and set up a serializer instance variable holding it.

This serialized_profiles instance variable set tracks profile ids
and is used to determine whether the serializer has previously
serialized a specific profile, with the first serialization including
the data field and subsequent serializations dropping it.

Update the send_bulk_system_update() request generation to set up a
new serialized_profiles set for each batch of systems being processed,
ensuring that only the first occurrence of a given profile includes
the data field.

Update tests to exercise the new SystemSerializer initialization and
optional system profiles data field inclusion, and verify that the
expected profiles are serialized by send_bulk_system_update().
Improve the debug messages logged by the announce_system and update
request handlers to report the profile types for problematic profiles
identified.

Additionally enhance the Profile.filter_profiles() method to return
hashes with symbolized keys to simplify determining which incomplete
profiles are unknown.

Only update the profiles associated with a system if valid complete
profiles were either provided or identified from incomplete profiles,
and add a test to ensure that existing profile associations are not
replaced if no valid complete profiles were provided in the update.
@rtamalin (Collaborator Author) commented:

@paragjain0910 @felixsch @mssola I've updated the PR with the implementation of the optimized serialization of systems in the send_bulk_system_update() request payload. I identified relatively minimal changes to the existing problematic test case implementation to get it passing again.

I also spotted and fixed a minor error in the update request handler that could wipe existing profile associations if profiles were provided to the update, but all were either unknown incomplete or invalid profiles. An additional test case has been added to cover this scenario.

@rtamalin rtamalin marked this pull request as draft November 19, 2025 22:25
@rtamalin (Collaborator Author) commented Nov 19, 2025

Found an issue when stress testing under heavy load, have identified a promising fix that I will try out tomorrow.
