Add asset_manager_id to file attachments #9641

Merged: 3 commits merged on May 7, 2025

Contributor

@richardTowers richardTowers commented Nov 21, 2024

This needs to follow on from alphagov/publishing-api#2994

As described there, the lack of asset_manager_id in the data for file asset attachments in content store makes it hard to request the attachments on the server side. Instead, we have to redirect users to the assets, which has led to ugly workarounds like CSV previews being rendered by frontend but served on assets.publishing.service.gov.uk.

Adding asset_manager_id should allow the frontends to request attachments directly, using the API client:

GdsApi.asset_manager.media(asset_manager_id, filename)

This should make it much more convenient to preview assets, or use them for other rendering purposes (e.g. showing a CSV as a line graph).
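
As a rough illustration of the server-side request described above, the sketch below uses a hypothetical `FakeAssetManager` stand-in so that it runs without network access or the gds-api-adapters gem. Only the `media(asset_manager_id, filename)` call shape comes from this description; the response object, the attachment hash and all values are assumptions made up for the example.

```ruby
# Stand-in for GdsApi.asset_manager so the sketch runs offline.
# Only the media(id, filename) call shape is taken from the PR description;
# the response object with a #body method is an assumption.
FakeResponse = Struct.new(:body)

class FakeAssetManager
  def initialize(files)
    @files = files # { [asset_manager_id, filename] => file contents }
  end

  def media(asset_manager_id, filename)
    FakeResponse.new(@files.fetch([asset_manager_id, filename]))
  end
end

# Hypothetical attachment data as a frontend might receive it.
attachment = { "asset_manager_id" => "abc123", "filename" => "data.csv" }

asset_manager = FakeAssetManager.new(
  { ["abc123", "data.csv"] => "year,value\n2023,10\n2024,12\n" },
)

response = asset_manager.media(attachment["asset_manager_id"], attachment["filename"])

# Naive comma split for illustration only; real CSV needs a proper parser.
preview_rows = response.body.lines.map { |line| line.chomp.split(",") }
```

With the ID available in the content item, the frontend could fetch and render the file itself instead of redirecting the user to the assets domain.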

EDIT: This description has been updated by @unoduetre to not include details which are no longer relevant.

⚠️ This repo is Continuously Deployed: make sure you follow the guidance ⚠️

def attachment_data_id
  return unless csv? && attachable.is_a?(Edition) && attachment_data.all_asset_variants_uploaded?

  attachment_data.id
end
Contributor

@catalinailie catalinailie Feb 7, 2025

I'm not sure attachment_data.id is the ID we want. GdsApi.asset_manager expects the asset's ID, and that one, in whitehall, is Asset.asset_manager_id.

I think this is the reason why we have this scenario with preview_url formed in whitehall using attachment_data.id and then overwritten in government-frontend with the URL that has this asset_manager_id in it (formed here I think, but I might be wrong).

Contributor

@unoduetre unoduetre Feb 18, 2025

Yes, you're right. It should be the asset_manager_id of the relevant asset (an object of the Asset class). The issue is that there can be more than one asset per attachment (an object of the Attachment class); the path to the assets is attachment.attachment_data.assets. The solution is to get the asset with the variant "original", which is the original file as uploaded. The combination of assetable_id, assetable_type and variant is unique (there is a validation for that). In this case assetable_type is "AttachmentData", assetable_id is the database id of the AttachmentData object, and variant should be specified as "original". In this way we can get the unique asset_manager_id.

There is a method called get_asset in AssetManagerStorage but it's currently private. But we can make it public. asset_manager_id can be retrieved this way: attachment.attachment_data.file.file.send(:get_asset).asset_manager_id
Of course in the final solution we should create a couple more methods, so there are no deep dependencies present in the final code.
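
The lookup described in this comment can be sketched in plain Ruby. `SketchAsset` and `SketchAttachmentData` are hypothetical stand-ins, not the real Whitehall models, and the IDs and filenames are invented; the only thing taken from the discussion is the idea of selecting the asset whose variant is "original".

```ruby
# Minimal stand-ins for the models discussed above (not the real classes).
SketchAsset = Struct.new(:asset_manager_id, :variant, :filename, keyword_init: true)

SketchAttachmentData = Struct.new(:id, :assets, keyword_init: true) do
  # Returns the asset_manager_id of the originally uploaded file,
  # or nil when no "original" asset exists yet.
  def original_asset_manager_id
    assets.find { |asset| asset.variant == "original" }&.asset_manager_id
  end
end

attachment_data = SketchAttachmentData.new(
  id: 1,
  assets: [
    SketchAsset.new(asset_manager_id: "thumb-id", variant: "thumbnail", filename: "thumbnail_report.pdf.png"),
    SketchAsset.new(asset_manager_id: "orig-id", variant: "original", filename: "report.pdf"),
  ],
)

original_id = attachment_data.original_asset_manager_id
```

Selecting by variant rather than by position means the result does not depend on the order in which the assets were stored.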

@unoduetre unoduetre changed the title from "Add attachment_data_id to file attachments" to "Add asset_manager_id to file attachments" Feb 24, 2025
@unoduetre unoduetre force-pushed the add-attachment-data-id branch 2 times, most recently from a04c520 to cfe3742 Compare February 24, 2025 17:17
Contributor

@catalinailie catalinailie left a comment

The changes look good, I only have one small suggestion.
And really nice that you covered the changes with tests ⭐

@unoduetre unoduetre force-pushed the add-attachment-data-id branch 3 times, most recently from 19b2eaf to 22f921d Compare March 6, 2025 11:17
@catalinailie catalinailie force-pushed the add-attachment-data-id branch from 22f921d to 095a005 Compare March 19, 2025 08:31
@catalinailie catalinailie force-pushed the add-attachment-data-id branch 2 times, most recently from 9618129 to 6fa9a48 Compare April 3, 2025 18:05
Contributor

minhngocd commented Apr 7, 2025

AttachmentData has assets (usually only one now, for file attachments), which also store asset_manager_id. Can we not make use of that instead of taking it from file, which I understand is more the actual file uploader from CarrierWave?
Are we looking to attach this ID to the content item when publishing it through to content store? If so, I think AttachmentData > Asset > Asset Manager ID might be more straightforward, and it is actually persisted in the database.

@catalinailie
Contributor

@minhngocd Ohh interesting, it makes sense. I don't know why we haven't thought of that.
Yes, we are looking to attach this ID to the content item when publishing.

@minhngocd
Contributor

Another thought I was also having is: what's the need for the asset manager ID? If you're getting it from asset manager via the GDS API, would that not be equivalent to using the attachment URL that comes through with the content item?

e.g. https://www.integration.publishing.service.gov.uk/api/content/government/publications/pension-credit-toolkit
File attachments currently come through with the content item as

{
        "accessible": false,
        "alternative_format_contact_email": "[email protected]",
        "attachment_type": "file",
        "command_paper_number": "",
        "content_type": "application/zip",
        "file_size": 465016,
        "filename": "pension-credit-customers-english.zip",
        "hoc_paper_number": "",
        "id": "8524176",
        "isbn": "",
        "locale": "en",
        "title": "Pension Credit social media images directed at potential customers",
        "unique_reference": "",
        "unnumbered_command_paper": false,
        "unnumbered_hoc_paper": false,
        "url": "https://assets.publishing.service.gov.uk/media/6756cb787ec19a3516f79a0e/pension-credit-customers-english.zip"
      },

The url is a direct link to the asset on asset manager. Can that not be used to render previews? Sorry if I'm not getting something - happy to jump on a call and discuss

@catalinailie
Contributor

> The url is a direct link to the asset on asset manager. Can that not be used to render previews?

This is the current behaviour. Government-frontend uses this asset URL for creating the /media/id/filename/preview URL.
But for our change, we need to extract the asset_manager_id and filename to create the new /csv-preview/id/filename preview URL, which will be served from the gov.uk domain. Parsing a URL to get an ID isn't the best approach. It would be nicer if that logic moved away from government-frontend and we just used the asset_manager_id from the content item.
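
To make the trade-off concrete, here is a small sketch contrasting the two approaches. The helper names, the example URL and the assumption that asset paths always look like `/media/<id>/<filename>` are all hypothetical; only the `/csv-preview/id/filename` shape comes from the comment above.

```ruby
require "uri"

# Approach 1: recover the ID by parsing the asset URL.
# Assumes the path always has the shape /media/<id>/<filename>;
# any other shape yields nil, which is why this approach is fragile.
def id_and_filename_from_url(url)
  segments = URI.parse(url).path.split("/").reject(&:empty?)
  return nil unless segments.length == 3 && segments.first == "media"

  segments[1..2]
end

# Approach 2: build the preview path from explicit values, for example an
# asset_manager_id field carried in the content item.
def csv_preview_path(asset_manager_id, filename)
  "/csv-preview/#{asset_manager_id}/#{filename}"
end

# Made-up URL for illustration.
url = "https://assets.publishing.service.gov.uk/media/abc123/data.csv"
id, filename = id_and_filename_from_url(url)
preview_path = csv_preview_path(id, filename)
```

With an explicit field, the first helper is not needed at all and the consumer no longer depends on the URL layout of the assets domain.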

@minhngocd
Contributor

Ahhh ok, thanks for explaining! Yes, in which case I think exposing it from AttachmentData > Assets through to the publishing_api_details on Attachment would be the most straightforward

@minhngocd
Contributor

Suggestion: when we expose it, maybe it can be an array of hash objects?
like

assets: [{
  filename: 
  asset_manager_id:
}]

Because the relationship between AttachmentData and Assets is one to many due to the multiple versions of image uploads. If we do an array of objects, we can return everything rather than arbitrarily selecting the first asset to send.
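
A minimal sketch of the suggested shape, assuming a hypothetical `SketchAsset` struct in place of the real Asset model and invented IDs and filenames. It maps every asset to a hash instead of picking one.

```ruby
# Stand-in for the Asset model; only the two fields from the suggestion are modelled.
SketchAsset = Struct.new(:asset_manager_id, :filename, keyword_init: true)

# Builds the proposed array of hashes, one entry per asset, preserving order.
def assets_payload(assets)
  assets.map do |asset|
    { asset_manager_id: asset.asset_manager_id, filename: asset.filename }
  end
end

payload = assets_payload(
  [
    SketchAsset.new(asset_manager_id: "id-1", filename: "chart.png"),
    SketchAsset.new(asset_manager_id: "id-2", filename: "thumbnail_chart.png"),
  ],
)
```

Returning every asset leaves the choice of which one to use to the consumer of the content item.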

@catalinailie
Contributor

We will be sending this asset_manager_id only for csv file attachments. I might be wrong, but I guess the AttachmentData will have only one Asset linked to it, right? So, in this case, the first Asset is actually the only one.

@catalinailie catalinailie force-pushed the add-attachment-data-id branch from 6fa9a48 to cf84b7a Compare April 8, 2025 10:36
@catalinailie catalinailie force-pushed the add-attachment-data-id branch from cf84b7a to 29b7840 Compare April 22, 2025 13:49
@catalinailie catalinailie requested a review from unoduetre April 22, 2025 15:30
@catalinailie
Contributor

Oops, I requested a review from the wrong user

@catalinailie catalinailie requested review from minhngocd and removed request for unoduetre April 22, 2025 15:35
def asset_manager_id
  return unless csv? && attachable.is_a?(Edition) && attachment_data.all_asset_variants_uploaded?

  attachment_data.assets.first.asset_manager_id
end
Contributor

@minhngocd minhngocd Apr 23, 2025

Personally, I feel that implementing this specifically for the CSV scenario and only returning the first asset manager ID is too narrow, tied to one use case. It is also not explicit in the naming: since the variable being exposed is asset_manager_id, I would expect it to return all asset manager IDs associated with the file, not just the first one for CSVs.

can we not return something like

assets: [{
  asset_manager_id: abc
  filename: abc_1.csv
}]

And then it's up to the consumer of the content item to decide to use the first one if it's a CSV based off the content_type of the attachment?

Contributor

Otherwise it feels like we're building consumer logic into the provider

Contributor

Yeah, I agree the name is not very explicit about what that value is used for.
The initial idea was to send this asset_manager_id (or a better name) only for csv attachments. That way, government-frontend knows when to add the preview URL to the page.
It would be nice not to have any logic in government-frontend for checking whether the file is a csv, and to keep this logic on the publishing side, since the check is already there.

Contributor

Also, will it be useful for other types of assets to have this information sent? I'm mindful there are quite a lot of assets and sending this information for all of them is not very useful since we need it only for csv files.

Contributor

@minhngocd minhngocd May 2, 2025

Discussed offline with @catalinailie. As an FYI update for folks reading the PR: we decided to return an array of all assets, as the file is not necessarily always the first one.

Asset.where(assetable_type: "AttachmentData").pluck(:assetable_id).group_by { |e| e }.select { |_k, v| v.size > 1 }.count
=> 782044

There are 780k+ attachment data records where there are multiple variants of asset on an AttachmentData (still roughly 400k, discounting the duplicates)
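
The counting logic of the console query above can be reproduced on a plain array, shown here with made-up IDs. `tally` is used as a shorter equivalent of `group_by { |e| e }` followed by a size check; this is an illustration of the technique, not the query that was actually run.

```ruby
# Made-up assetable_id values, one per asset row, as pluck(:assetable_id) would return them.
assetable_ids = [101, 101, 102, 103, 103, 103]

# Count how often each ID occurs, then count the IDs that occur more than once.
occurrences = assetable_ids.tally
with_multiple_assets = occurrences.count { |_id, n| n > 1 }
```

Here IDs 101 and 103 each have more than one asset, so `with_multiple_assets` is 2.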

example of a live published edition:

[#<Asset:0x0000ffff710ae9e0
  id: 3812782,
  asset_manager_id: "67598fff9f669f2e28ce2b01",
  assetable_type: "AttachmentData",
  assetable_id: 1286037,
  filename: "thumbnail_redacted..">,
 #<Asset:0x0000ffff710ae8a0
  id: 3812783,
  asset_manager_id: "67598fff3bb681ccb0d346b7",
  assetable_type: "AttachmentData",
  assetable_id: 1286037,
  filename: "redacted...">]

Which belongs to a published edition, still live - and the first entry seems to be the thumbnail
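
Given that the first asset can be a thumbnail, a consumer might select by filename instead of by position. The sketch below assumes a hypothetical attachment hash with an `assets` array in the shape suggested earlier, and assumes the main file's asset has the same filename as the attachment itself; neither assumption is confirmed by this thread.

```ruby
# Picks the asset whose filename matches the attachment's own filename.
# Returns nil when there is no assets array or no matching entry.
def asset_for_attachment(attachment)
  attachment.fetch("assets", []).find do |asset|
    asset["filename"] == attachment["filename"]
  end
end

# Invented data: the thumbnail comes first, as in the example above.
attachment = {
  "filename" => "report.csv",
  "assets" => [
    { "asset_manager_id" => "thumb-id", "filename" => "thumbnail_report.csv.png" },
    { "asset_manager_id" => "file-id", "filename" => "report.csv" },
  ],
}

chosen = asset_for_attachment(attachment)
```

Taking `assets.first` on this data would return the thumbnail, which is the problem the array-of-assets decision avoids.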

@catalinailie catalinailie requested a review from minhngocd May 2, 2025 08:39
@catalinailie catalinailie force-pushed the add-attachment-data-id branch 2 times, most recently from 22b7fe2 to ea4f0a9 Compare May 2, 2025 14:14
Contributor

@minhngocd minhngocd left a comment

Looks good! Thanks @catalinailie !
I guess the failing tests are related to publishing API changes to the schema going through.


nacnudus commented May 6, 2025

Just to note that this will be useful to data analysts, who have been asked by other departments to provide lists of attachments, when they were last updated, and what page they are attached to. Getting this information will require a join between the Publishing API and the Asset Manager databases, for all types of attachment (pdf, csv, etc.).

richardTowers and others added 3 commits May 7, 2025 08:47
This needs to follow on from alphagov/publishing-api#2994

As described there, the lack of asset_manager_id in the data for file asset attachments in content store makes it hard to request the attachments on the server side. Instead, we have to redirect users to the assets, which has led to ugly workarounds like CSV previews being rendered by frontend but served on assets.publishing.service.gov.uk.

Adding asset_manager_id should allow the frontends to request attachments directly, using the API client:

    GdsApi.asset_manager.media(asset_manager_id, filename)

This should make it much more convenient to preview assets.

When sending attachment information to publishing api,
use asset_manager_id instead of attachment_data_id,
as this is the id that the asset manager expects.

Initially we were sending only one asset_manager_id,
but a file attachment can have multiple assets linked to it.
Sending details for all assets through publishing-api means these
values might be useful in the future for other functionalities.
@catalinailie catalinailie force-pushed the add-attachment-data-id branch from ea4f0a9 to f71ae81 Compare May 7, 2025 07:47
@catalinailie catalinailie marked this pull request as ready for review May 7, 2025 07:55
@catalinailie catalinailie merged commit 0b1a531 into main May 7, 2025
22 checks passed
@catalinailie catalinailie deleted the add-attachment-data-id branch May 7, 2025 08:25