Skip to content

Conversation

@johanneswilm
Copy link

Remove slugify.

This PR builds on top of #180 and removes the slugification of answers to multiselect questions. Without this applied, it's not possible to dumpdata/loaddata of pre-existing user data.

initial = []
if answer.body == "[]":
pass
elif "[" in answer.body and "]" in answer.body:
Copy link
Author

@johanneswilm johanneswilm Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pierre-Sassoulas I don't fully understand what the idea was here. It looks to me like a "]" and a "[" anywhere in the answer.body would cause this to be initiated for example ("Subtitled [English]"). Maybe that's why you decided to slugify? To make sure there wouldn't be any "[" nor "]" in there?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, slugifying means that unique choices can be come none unique. If choices are something like "<40" and ">40" then both are translated into "40"

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, I remember this was a bit hacky, we're serializing / deserializing a list of answers here.

unformated_choices = answer.body[1:-1].strip()
for unformated_choice in unformated_choices.split(settings.CHOICES_SEPARATOR):
choice = unformated_choice.split("'")[1]
initial.append(slugify(choice))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I was wondering why the contents of answer.body are not just JSON or some other standardized method. That way one doesn't run risks of this mechanism breaking as one can fall back to existing json deserialization methods.

It's not quite json as of now because it's using single quotation marks rather than double quotation marks. But it's already quite close.

My code below doesn't change that but I think it's slightly more robust in some cases.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a small sql database, so I suppose I wanted to avoid dumping a json.

Copy link
Owner

@Pierre-Sassoulas Pierre-Sassoulas Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A standardized method (possibily json) would indeed be better than this custom parsing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pierre-Sassoulas Ok, but it wouldn't take up any more space in the database. Instead of

['one', 'two', 'three']

in JSON that would simple be:

["one", "two", "three"]

An advantage would be that it would automatically be able to read things like this as well:

["'one'", "t,w,o", "[three]"]

I think my code below can do most of that as well though. So maybe it's not needed.

I spent a day last week cleaning up a ~90k row dump turning everything back from the slugified version to the original options because loaddata wouldn't accept the data before realizing that I needed to override this code as well to actually make the original data work. That's the reason I raise this.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like it's going to be hard to make a migration from slugified to the new value (as there is less information in slugified answer than the actual answers) so putting that in a breaking change for version 2.0.0 and removing all the migrations files seems in order, what do you think ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe create a "best effort" migration file ?

Copy link
Author

@johanneswilm johanneswilm Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to make a data migration file in which one takes all the options, converts them using slugify and that way make a pretty good guess as to which of the original options is the right one. Two cases where I don't think that's possible:

  • If the options have changed after some respondents answered. The slugified version will either correspond to none of the current choices or may even correspond to an incorrect one.
  • Cases where two or more options turn into the same slugified value.

So yeah, probably major version change and some way to gracefully handle the two above situations telling users that "sorry, some data was lost and we can no longer figure out what your respondents actually answered".

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good !

@Pierre-Sassoulas Pierre-Sassoulas modified the milestones: 1.4.7, 1.4.8 Feb 23, 2024
@Pierre-Sassoulas Pierre-Sassoulas removed this from the 1.4.8 milestone Oct 31, 2024
@jonafato
Copy link
Collaborator

@Pierre-Sassoulas @johanneswilm I know this PR is a couple years old now, but I wanted to see if there was anything I could help with to get this resolved and planned for release. From the discussion here, it sounds like there are a couple of factors to consider:

  1. It sounds like this is already being considered a major version change, so perhaps we can set it for the 2.0.0 milestone and offer pre-release versions for testing purposes.
  2. Some best-effort migration process would be helpful, though it may not always be possible to cleanly migrate old data to the new format, and developers using this migration should be informed about its limitations. Would a (temporary) management command offer more flexibility than a migration file for this purpose? e.g. this might allow the process to be more interactive and / or signal cases where the migration script knows that it cannot be certain of the migration and allow someone to decide whether and how to proceed. (I suppose these cases are somewhat similar to resolving git merge conflicts, where a person may be able to make reasonable decisions that cannot be safely made automatically.)

@Pierre-Sassoulas
Copy link
Owner

Releasing a major version is not an issue, feel free to do It.b It's probably possible to match with high certainty what the unsluggified answer is with difflib.get_close_match and worse case a user input to choose so the migration is 'perfect'. But maybe it's more reasonable to make a best effort migration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants