-
Notifications
You must be signed in to change notification settings - Fork 196
Add asset_manager_id to file attachments #9641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
app/models/file_attachment.rb
Outdated
def attachment_data_id | ||
return unless csv? && attachable.is_a?(Edition) && attachment_data.all_asset_variants_uploaded? | ||
|
||
attachment_data.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure attachment_data.id
is the ID we want. GdsApi.asset_manager
expects the asset's ID, and that one, in whitehall, is Asset.asset_manager_id
.
I think this is the reason why we have this scenario with preview_url
formed in whitehall using attachment_data.id
and then overwritten in government-frontend
with the URL that has this asset_manager_id
in it (formed here I think, but I might be wrong).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. It should be asset_manager_id of the relevant asset (an object of the Asset
class). The issue is that there can be more than one asset per attachment (an object of the Attachment
class). (The path to assets is attachment.attachment_data.assets
) But the solution is to get the asset with the variant "original"
, which means the original file as uploaded. The combination of assetable_id
, assetable_type
and variant
is unique (there is a validation for that). assetable_type
in this case is "AttachmentData"
, assetable_id
is the id of the object of the AttachmentData
class (the id from the database). And the variant should be specified as "original"
. In this way we can get the unique asset_manager_id
.
There is a method called get_asset
in AssetManagerStorage
but it's currently private. But we can make it public. asset_manager_id
can be retrieved this way: attachment.attachment_data.file.file.send(:get_asset).asset_manager_id
Of course in the final solution we should create a couple more methods, so there are no deep dependencies present in the final code.
a04c520
to
cfe3742
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good, I only have one small suggestion.
And really nice that you covered the changes with tests ⭐
19b2eaf
to
22f921d
Compare
22f921d
to
095a005
Compare
9618129
to
6fa9a48
Compare
AttachmentData has |
@minhngocd Ohh interesting, it make sense. I don't know why we haven't thought of that. |
Another thought i was also having is: what's the need for the asset manager ID? If you're getting it from asset manager via the GDS API, would that not be equivalent to using the attachment URL that comes through with the content item? e.g. https://www.integration.publishing.service.gov.uk/api/content/government/publications/pension-credit-toolkit
The url is a direct link to the asset on asset manager. can that not be used to render previews? Sorry if I'm not getting something - happy to jump on a call and discuss |
This is the current behaviour. Government-frontend uses this asset URL for creating the |
ahhh ok thanks for explaining! yes, in which case i think exposing it from AttachmentData > Assets through to the |
Suggestion, when we expose it, maybe it can be an array of hash object?
Because the relationship between AttachmentData and Assets is one to many due to the multiple versions of image uploads. If we do an array of objects, we can return everything rather than arbitrarily selecting the first asset to send. |
We will be sending this asset_manager_id only for csv file attachments. I might be wrong, but I guess the AttachmentData will have only one Asset linked to it, right? So, in this case, picking the first Asset is actually the only one. |
6fa9a48
to
cf84b7a
Compare
cf84b7a
to
29b7840
Compare
Ups, I requested a review from the wrong user |
app/models/file_attachment.rb
Outdated
def asset_manager_id | ||
return unless csv? && attachable.is_a?(Edition) && attachment_data.all_asset_variants_uploaded? | ||
|
||
attachment_data.assets.first.asset_manager_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I feel like implementing this specifically for the CSV scenario and only returning the first asset manager id feels too narrow to one use-case. This is then also not explicit in the naming as well since the variable being exposed is asset_manager_id
I would expect it to return all asset manager IDs associated with the file, not just the first CSV ones.
can we not return something like
assets: [{
asset_manager_id: abc
filename: abc_1.csv
}]
And then it's up to the consumer of the content item to decide to use the first one if it's a CSV based off the content_type of the attachment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise it feels like we're building consumer logic into the provider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree the name is not very explicit for what that value is used.
The initial idea was to send this asset_manager_id (or a better name) only for csv attachments. In this way, government-frontend knows when to add the preview URL to the page.
It would be nice not to have any logic in govenrment-frontend for checking if the file is a csv or not and to have this logic in the publishing side since this check is already there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, will it be useful for other types of assets to have this information sent? I'm mindful there are quite a lot of assets and sending this information for all of them is not very useful since we need it only for csv files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline with @catalinailie , As an FYI update for folks reading the PR. We decided we would require returning all assets array as the file is not necessarily always the first one.
Asset.where(assetable_type: "AttachmentData").pluck(:assetable_id).group_by{|e|e}.select{|k,v|v.size>1}.count
=> 782044
There are 780k+ attachment data records where there are multiple variants of asset on an AttachmentData (still roughly 400k, discounting the duplicates)
example of a live published edition:
[#<Asset:0x0000ffff710ae9e0
id: 3812782,
asset_manager_id: "67598fff9f669f2e28ce2b01",
assetable_type: "AttachmentData",
assetable_id: 1286037,
filename: "thumbnail_redacted..">,
#<Asset:0x0000ffff710ae8a0
id: 3812783,
asset_manager_id: "67598fff3bb681ccb0d346b7",
assetable_type: "AttachmentData",
assetable_id: 1286037,
filename: "redacted...">]
Which belongs to a published edition, still live - and the first entry seems to be the thumbnail
22b7fe2
to
ea4f0a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks @catalinailie !
I guess the failing tests are related to publishing API changes to the schema going through.
Just to note that this will be useful to data analysts, who have been asked by other departments to provide lists of attachments, when they were last updated, and what page they are attached to. Getting this information will require a join between the Publishing API and the Asset Manager databases, for all types of attachment (pdf, csv, etc.). |
This needs to follow on from alphagov/publishing-api#2994 As described there, the lack of asset_manager_id in the data for file asset attachments in content store makes it hard to request the attachments on the server side. Instead, we have to redirect users to the assets, which has led to ugly workarounds like CSV previews being rendered by frontend but served on assets.publishing.service.gov.uk. Adding asset_manager_id should allow the frontends to request attachments directly, using the API client: GdsApi.asset_manager.media(asset_manager_id, filename) This should make it much more convenient to preview assets.
When sending attachment information to publishing api, use asset_manager_id instead of attachment_data_id, as this is the id that the asset manager expects.
Initially we were sending only one asset_manager_id, but a file attachment can have multiple assets linked to it. Sending details for all assets through publishing-api means these values might be useful in the future for other functionalities.
ea4f0a9
to
f71ae81
Compare
This needs to follow on from alphagov/publishing-api#2994
As described there, the lack of asset_manager_id in the data for file asset attachments in content store makes it hard to request the attachments on the server side. Instead, we have to redirect users to the assets, which has led to ugly workarounds like CSV previews being rendered by frontend but served on assets.publishing.service.gov.uk.
Adding asset_manager_id should allow the frontends to request attachments directly, using the API client:
This should make it much more convenient to preview assets, or use them for other rendering purposes (e.g. showing a CSV as a line graph).
EDIT: This description has been updated by @unoduetre to not include details which are no longer relevant.