Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove encoded characters from file name #781

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

dionisislolas
Copy link
Contributor

What

Remove encoded characters from file name
Fix for the issue with the visualisations zip files that was containing files with spaces in the name

How to review

Changes are ok and test pass.

Run the static-files-with-auth stack.
Create a collection with a dataset that is going to the lockbox
Add to the same collection also a visualisation that contains a file with spaces in the visualisations zip file
Collection should get approved and published successfully

Who can review

Anyone

@dionisislolas dionisislolas requested a review from a team as a code owner November 15, 2024 11:56
@@ -363,11 +366,12 @@ protected void uploadWhitelistedFiles(Collection collection, CollectionReader co
for (String uri : collectionReader.getReviewed().listUris()) {
if (uri.endsWith(".csv") || uri.endsWith(".xlsx") || uri.endsWith(".xls") || uri.endsWith(".csdb")) {
String fileName = uri.substring(1);
Resource myFile = collectionReader.getResource(fileName);
String decodedFilename = removeEncoding(fileName);
Resource myFile = collectionReader.getResource(decodedFilename);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can "myFile" be renamed to something like "decodedFile" please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now addressed.

@@ -541,4 +545,38 @@ protected String findCorrectDatasetVersion(List<String> listOfUris) {
}
return datasetVersion;
}

protected String removeEncoding(String filename) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could "filename" be changed to lower camel case please e.g. "fileName"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now addressed.

@franmoore05
Copy link
Contributor

franmoore05 commented Nov 17, 2024

I think there are several things we could change in code to improve how lockbox 2.0 works. We are still sending all the files in the collection to be checked and processed which we don't need to do, and this is what is introducing the errors. Some suggestions to change the code are as follows:

Within our if statement to check if the upload new endpoint is enabled (https://github.com/ONSdigital/zebedee/blob/develop/zebedee-cms/src/main/java/com/github/onsdigital/zebedee/model/approval/ApproveTask.java#L167), we could do something like the following:

List<String> reviewedList = collection.getReviewed().uris().stream().filter(i -> i.contains("datasets")).collect(Collectors.toList());
       
String whitelist = getDatasetWhitelist();
List<String> allowed = Arrays.asList(whitelist.split(","));

reviewedList = reviewedList.stream().filter(j -> allowed.stream().anyMatch(j::contains)).collect(Collectors.toList()); 

This would allow us to send a reduced list through to the uploadNewEndpoint if we change the parameters to be a List rather than needing to send the whole collection. Also, the uploadNewEndpoint is not required as it doesn't do anything other than forward on to another function, so we could send that list straight to uploadWhitelistedFiles which would then have the signature uploadWhitelistedFiles(CollectionReader collectionReader, List urisToCheck, String description).

The description string parameter has also been added in place of using the collection to get the description to add to the file path. Sending through only the files required means we no longer need to handle any special characters or spaces within the filenames as we should be doing as little with those as possible.

Also, if we move the findCorrectDatasetVersion function call within this if statement https://github.com/ONSdigital/zebedee/blob/develop/zebedee-cms/src/main/java/com/github/onsdigital/zebedee/model/approval/ApproveTask.java#L367 and pass in the file name then it will resolve the bug with the incorrect version number of the non-timeseries datasets. findCorrectDatasetVersion will then have to be updated to include an additional check like this if (uri.contains("previous") && uri.contains(fileName)) to ensure the correct version is returned for the specific file, but it seems to work ok.

String whitelist = getDatasetWhitelist();
List<String> allowed = Arrays.asList(whitelist.split(","));
reviewedList = reviewedList.stream().filter(j -> allowed.stream().anyMatch(j::contains)).collect(Collectors.toList());
uploadWhitelistedFiles(collectionReader, reviewedList, collection.getDescription().getId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will probably need a null check on the reviewedList to ensure no null pointer exceptions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants