Remove fullstop automatically when it finds the same description #10

ivanhercaz · 2019-01-15T03:14:40Z

The idea is to save time by not having to click again to remove the full stop from a description that is identical to one already approved for removal. For example:

"Grade II listed building in Powys." [action: Remove full stop]
But the script could save time if the action were:
"Grade II listed building in Powys." [action: Remove identical full stops automatically]
This means that if the script finds "Grade II listed building in Powys." again, it will not ask which action to perform (nor offer to quit or skip); the full stop will be removed immediately.

This new action would save time and make the script more efficient, but it should not be used for every description, because that would overload the system for no benefit. It should be reserved for a specific kind of description, like the one mentioned above.

How would it work

The script would have a new action: Remove identical full stops automatically. This action adds the description to a CSV file, created beforehand and loaded by the script, with a single column named sentence. Before adding it, the script has to check whether the description is already in the CSV: if it is, it is not added again; if it is not, it is added.

The descriptions saved in this CSV would be useful on later runs, because the script would read this file, which would store both the old descriptions marked for automatic removal and the new ones.
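
The check-then-add step described above could look roughly like the following. This is only a minimal sketch using Python's standard csv module, not code from the script: the function names load_sentences and add_sentence and the file name are hypothetical, and only the column name sentence comes from the proposal.

```python
import csv
import os

FIELDNAME = "sentence"  # column name taken from the proposal; everything else is assumed


def load_sentences(path):
    """Return the set of descriptions stored in the CSV (empty set if there is no file)."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {row[FIELDNAME] for row in csv.DictReader(f)}


def add_sentence(path, sentence):
    """Append `sentence` unless it is already stored; return True if it was added."""
    if sentence in load_sentences(path):
        return False
    # Write the header only when the file is missing or still empty.
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[FIELDNAME])
        if is_new:
            writer.writeheader()
        writer.writerow({FIELDNAME: sentence})
    return True
```

Calling add_sentence("duplicates.csv", "Grade II listed building in Powys.") twice would add the row the first time and leave the file unchanged the second time. Re-reading the file on every call keeps the sketch short; a real run would probably load the set once at start-up.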

Tasks

Develop the action and the requirements for it to work.
Add a system to avoid adding duplicate descriptions to the CSV.
Make it easy to generate the CSV file, in line with what was discussed in Make easier to reuse the code #5.
Test several times.

ivanhercaz · 2019-02-01T00:48:01Z

But... how useful is it to find and delete only the exact description? It might be more effective if the system adds the last word, with its full stop, to the CSV. With this in mind, the behaviour could be built into the first option ("Remove full stop"), which avoids creating another option. With the current approach, the script would take the following steps for this description:

Grade II listed building in Newport City. Located approximately 40 metres SW of Pound-wern Cottage. Bridge carries footpath connecting Ridgeway with the canal towpath.

Remove duplicates automatically.
Delete the full stop in the current item and add the description to the CSV file.
If it finds the exact description again, it will delete the full stop.

This way, the script won't delete the full stop if anything differs from the stored description. But if we save the last word rather than the whole description, the script would do the following:

Remove full stop.
Delete the full stop in the current item and add the last word, towpath, to the CSV file.
If the script finds any other description that ends with towpath., it will delete the full stop.
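
As a rough illustration of the last-word variant, here is a minimal Python sketch. The function names are hypothetical, and it assumes the stored entries keep their full stop (for example towpath.), which is only one possible reading of the proposal.

```python
def last_word_with_stop(description):
    """Return the final whitespace-separated token if it ends with a full stop, else None."""
    tokens = description.split()
    if tokens and tokens[-1].endswith("."):
        return tokens[-1]
    return None


def strip_known_full_stop(description, known_last_words):
    """Remove the trailing full stop only when the last word was approved before."""
    token = last_word_with_stop(description)
    if token is not None and token in known_last_words:
        # Only the final full stop is dropped; full stops inside the text are kept.
        return description.rstrip()[:-1]
    return description
```

With known_last_words = {"towpath."}, any description ending in towpath. loses its final full stop, while a description ending in Powys. is returned unchanged and would still go through the usual prompt.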

With this we save time because:

We don't have to think about more options (remove, checklist, edit, skip). The script would add the last word as part of the basic instruction to remove the full stop.
The script would become more "intelligent" the more we use it, because it would accumulate all the last words whose full stops we consider necessary to remove. In addition, the script will skip, as it does now, the kinds of words/full stops covered by the exceptions list.

Thus, over time it would work with less and less human intervention.

Of course, there are other things to keep in mind:

Should the last words be organized in separate files by language (last_words_en.csv, last_words_es.csv)? Or should all the words go in a single multilingual file?
...

@davidabian, I know this issue is very long, especially this last comment, but I would like to know your opinion on this reformulation of the system to save time (and make CanaryBot more "intelligent" 🐦 ). Thanks in advance!

davidabian · 2019-02-02T12:20:50Z

This is more of a linguistic issue, and I can't predict what the results will be in languages I don't know deeply. I guess in some languages this will cause too many false positives to be worth considering, while in Spanish or English it can work (but only if the bot is operated carefully, since a single mistake by the bot operator could spread to several unrelated entities).

ivanhercaz · 2019-02-02T14:49:04Z

@davidabian, at the moment every full stop has to be confirmed before it is removed from the description. The criterion for removing it is being sure that it isn't part of an abbreviation or something else that needs the full stop. In addition, there is the exceptions list: the script skips descriptions that match the pattern of any exception.

What kind of human mistakes do you mean? Marking a full stop for removal when it is part of an abbreviation, so that the script would then remove it automatically elsewhere? If that is the kind of mistake you mean, then of course the operator needs to be sure of what they are doing.

As for languages, I follow the same rule as everywhere else: is the full stop necessary or not? Of course, if the operator is in doubt in any of the languages the script works with, the option "Add description to checklist" is available. The operator can then review the checklist and ask on Wikidata which option is best or, if the full stop is part of an abbreviation, add another regex to the exceptions list.
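
To make the exceptions check concrete, here is a minimal sketch of how descriptions could be tested against a list of regexes. The two patterns and the function name is_exception are purely illustrative assumptions; they are not the patterns the script actually uses.

```python
import re

# Hypothetical exception patterns (assumed for illustration only).
EXCEPTIONS = [
    re.compile(r"\b(?:Inc|Ltd|Co)\.$"),  # common company abbreviations at the end
    re.compile(r"\b(?:[A-Z]\.)+$"),      # initialisms such as U.S.A.
]


def is_exception(description):
    """Return True if the description ends in a pattern that must keep its full stop."""
    return any(pattern.search(description) for pattern in EXCEPTIONS)
```

A description for which is_exception returns True would be skipped before any automatic removal is attempted, so a stored last word could never override an exception.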

ivanhercaz added the enhancement New feature or request label Jan 15, 2019

ivanhercaz added a commit that referenced this issue Jan 15, 2019

task to generate the CSV for duplicated descriptions (#10)

67f4895

ivanhercaz added a commit that referenced this issue Jan 16, 2019

first outline for the 'remove duplicated description' (#10)

5c54b9c

ivanhercaz mentioned this issue Feb 1, 2019

Delete duplicated description #14

Merged

ivanhercaz mentioned this issue Feb 2, 2019

"Removes the duplicates automatically" enter in a loop of actions #15

Open
