Remove fullstop automatically when it finds the same description #10
Comments
But how useful is it to find and delete the exact description? It might be more effective if the script adds the last word, together with its full stop, to the CSV. With this in mind, the action could be folded into the first option ("Remove full stop"), which avoids creating another option. With the current approach, the script would take the following steps:
This way, the script won't remove the full stop if anything in the description differs from the one saved. But if we save the last word rather than the whole description, the script would do the following:
With this we save time because:
Thus, over time it would work with less and less human intervention. Of course, there are other things to keep in mind:
@davidabian, I know this whole issue is very long, especially this last comment, but I would like to know your opinion about reformulating the system to save time (and make CanaryBot more "intelligent" 🐦). Thanks in advance!
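The last-word variant proposed in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the `approved_endings` set are hypothetical, not part of CanaryBot, and in the real script the set would be loaded from the CSV. Matching on a single word is much broader than matching the whole description, so the operator would have to approve endings with care.

```python
def last_word_with_stop(description):
    """Return the final word including its full stop, or None if there is none."""
    text = description.rstrip()
    if not text.endswith("."):
        return None
    # A non-empty string ending in "." always yields at least one word.
    return text.split()[-1]


def strip_by_last_word(description, approved_endings):
    """Remove the trailing full stop when the last word was approved before.

    approved_endings is a hypothetical set such as {"Powys."}.
    """
    text = description.rstrip()
    ending = last_word_with_stop(text)
    if ending is not None and ending in approved_endings:
        return text[:-1]
    # Unknown ending (for example an abbreviation): leave it for the operator.
    return description
```

With `approved_endings = {"Powys."}`, any description ending in "Powys." loses its full stop without a prompt, while one ending in "Jr." is left untouched for the operator to decide.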
This is more of a linguistic issue, and I can't predict what the results will be in languages I don't know deeply. I guess in some languages this will cause too many false positives to be worth considering, while in Spanish or English it can work (but only if the bot is operated carefully, since a single mistake by the bot operator could be spread to several unrelated entities).
@davidabian, at this time every full stop has to be confirmed before it is removed from a description. The criterion for removing it is being sure that it isn't part of an abbreviation or something else that needs the full stop. In addition, there are the exception lists, with which the script skips descriptions that match the pattern of any exception. What kind of human mistakes are you referring to? Marking a full stop for removal when it is part of an abbreviation, so that the script would then remove it automatically? If that is the kind of mistake you mean, then of course the operator needs to be sure of what they are doing. For languages I follow the same rule as everywhere else: is the full stop necessary or not? If the operator has a doubt in any of the languages the script works with, there is the option "Add description to checklist". The operator can then review the checklist and ask on Wikidata which option is the right one or, if the full stop is part of an abbreviation, create another regex in the exception list.
The idea is to save the time spent clicking to remove exactly the same description that has already been approved for removal. For example:
But the script could save time if the action were:
This means that if the script finds "Grade II listed building in Powys." again, it won't ask which action to perform, nor offer to quit or skip; the full stop would be removed instantly.
This new action would save time and make the script more efficient, but it shouldn't be used for every description, because that would overload the system for nothing. It has to be reserved for a specific kind of description, like the one mentioned.
How would it work
The script would have a new action: Remove identical full stops automatically. This action adds the description to a CSV file, created beforehand with a single column named `sentence`. Before adding it, the script has to check whether the description is already in the CSV: if it is, it isn't added again; if it isn't, it is added. The descriptions saved in this CSV are useful on later runs, because the script reads the file, which stores both the old descriptions marked for automatic removal and the new ones.
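A minimal sketch of that flow, assuming a plain UTF-8 CSV with the single `sentence` column described above. The function names and the file path passed in are hypothetical and are not taken from the CanaryBot code base.

```python
import csv
from pathlib import Path

FIELD = "sentence"  # column name taken from the issue text


def load_sentences(csv_path):
    """Return the set of descriptions already approved for automatic removal."""
    path = Path(csv_path)
    if not path.exists():
        return set()
    with path.open(newline="", encoding="utf-8") as handle:
        return {row[FIELD] for row in csv.DictReader(handle) if row.get(FIELD)}


def remember_sentence(csv_path, description, known):
    """Append the description to the CSV unless it is already stored."""
    if description in known:
        return False
    path = Path(csv_path)
    # Write the header only when the file is new or empty.
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=[FIELD])
        if write_header:
            writer.writeheader()
        writer.writerow({FIELD: description})
    known.add(description)
    return True


def strip_if_known(description, known):
    """Drop the trailing full stop only when the exact description is stored."""
    if description in known and description.endswith("."):
        return description[:-1]
    return description
```

Loading the file once into a set keeps the "is it already in the CSV?" check cheap on every description, and an exact match means a description that differs in any character still goes through the normal prompt.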
Tasks