I am implementing a Python version of this library for my own use case: https://github.com/anirudhgangwal/ukpostcodes. It mimics the functionality available here, including lookup in the ONS database (but instead of a DB or the postcodes.io API, I just use a set of ~1.8M postcodes).
We parse postcodes from OCR output, and "O" and "I" misreads account for almost all of our errors. The fix implemented here helped reduce our errors significantly. However, I want to understand whether there was a reason not to expand this auto-correct further.
Let's take the example of a 3-character outcode. It can take the following forms:
A9A 9AA
A99 9AA
AA9 9AA
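Since the second and third characters can each be either a letter or a digit, this library currently coerces a 3-character outcode only against "L??": the first character is forced to a letter and the other two are left unchanged.
I think there is a possibility to add a new function, or a parameter to an existing function, that returns a list of candidate fixes. A quick Python implementation looked like this: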
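To make the "L??" behaviour concrete, here is a minimal sketch of a pattern-driven coercion helper. It is an illustration under stated assumptions, not the library's code: the `TO_LETTER`/`TO_NUMBER` maps (only the O/0 and I/1 confusions discussed here) and this exact `coerce` signature are hypothetical. It shows why "L??" cannot repair an OCR misread such as W10 read as "W1O": the third character is never touched, while the three explicit shapes each produce a different reading.

```python
# Hypothetical O/0 and I/1 confusion maps (assumed, mirroring the OCR errors above).
TO_LETTER = {"0": "O", "1": "I"}
TO_NUMBER = {"O": "0", "I": "1"}


def coerce(pattern: str, value: str) -> str:
    """Coerce each character of `value` according to `pattern`.

    "L" forces a letter, "N" forces a digit, "?" leaves the character unchanged.
    Assumes `pattern` and `value` have the same length.
    """
    out = []
    for target, char in zip(pattern, value):
        if target == "L":
            out.append(TO_LETTER.get(char, char))
        elif target == "N":
            out.append(TO_NUMBER.get(char, char))
        else:
            out.append(char)
    return "".join(out)


if __name__ == "__main__":
    misread = "W1O"  # W10 as an OCR engine might read it
    for pattern in ("L??", "LNL", "LNN", "LLN"):
        print(pattern, coerce(pattern, misread))
```

Only the "LNN" (A99) reading recovers "W10"; the single "L??" pass returns the input unchanged because it already looks like an A9A outcode.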
import re
from typing import List

# FIXABLE_REGEX, coerce, coerce_incode, coerce_outcode and is_valid_outcode
# are assumed to be defined elsewhere in ukpostcodes.
# The walrus operator (:=) requires Python 3.8+.


def fix_with_options(s: str) -> List[str]:
    """Attempts to fix a given postcode, covering all options.

    Args:
        s (str): The postcode to fix

    Returns:
        List[str]: The candidate fixed postcodes
    """
    if not FIXABLE_REGEX.match(s):
        return [s]  # keep the return type a list
    # str.replace() does not interpret regexes, so use re.sub to drop whitespace.
    s = re.sub(r"\s+", "", s.upper().strip())
    inward = s[-3:]
    outward = s[:-3]
    outcode_options = coerce_outcode_with_options(outward)
    return [
        f"{coerce_outcode(option)} {coerce_incode(inward)}" for option in outcode_options
    ]


def coerce_outcode_with_options(i: str) -> List[str]:
    """Coerce outcode, but cover all possibilities."""
    if len(i) == 2:
        return [coerce("LN", i)]  # A9
    elif len(i) == 3:
        outcodes = []
        if is_valid_outcode(outcode := coerce("LNL", i)):  # A9A
            outcodes.append(outcode)
        if is_valid_outcode(outcode := coerce("LNN", i)):  # A99
            outcodes.append(outcode)
        if is_valid_outcode(outcode := coerce("LLN", i)):  # AA9
            outcodes.append(outcode)
        return list(dict.fromkeys(outcodes))  # de-duplicate, keep order
    elif len(i) == 4:
        outcodes = []
        if is_valid_outcode(outcode := coerce("LLNL", i)):  # AA9A
            outcodes.append(outcode)
        if is_valid_outcode(outcode := coerce("LLNN", i)):  # AA99
            outcodes.append(outcode)
        return list(dict.fromkeys(outcodes))
    else:
        return [i]
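For anyone who wants to try the idea without the rest of the library, here is a self-contained sketch of the same candidate generation. It is an approximation, not the ukpostcodes implementation: `TO_LETTER`, `TO_NUMBER`, `OUTCODE_PATTERNS` and `fix_candidates` are hypothetical names, the validity check is passed in as a callable (for example, membership in a set of known outcodes), and unlike the version above it returns an empty list for outcode lengths it does not recognise.

```python
from typing import Callable, Dict, List

# Assumed O/0 and I/1 confusion maps (hypothetical).
TO_LETTER: Dict[str, str] = {"0": "O", "1": "I"}
TO_NUMBER: Dict[str, str] = {"O": "0", "I": "1"}

# Outcode shapes per length, as L (letter) / N (digit) patterns.
OUTCODE_PATTERNS: Dict[int, List[str]] = {
    2: ["LN"],                 # A9
    3: ["LNL", "LNN", "LLN"],  # A9A, A99, AA9
    4: ["LLNL", "LLNN"],       # AA9A, AA99
}


def coerce(pattern: str, value: str) -> str:
    """Force each character to a letter (L) or digit (N); '?' leaves it unchanged."""
    out = []
    for target, char in zip(pattern, value):
        if target == "L":
            out.append(TO_LETTER.get(char, char))
        elif target == "N":
            out.append(TO_NUMBER.get(char, char))
        else:
            out.append(char)
    return "".join(out)


def fix_candidates(raw: str, is_valid_outcode: Callable[[str], bool]) -> List[str]:
    """Return every coerced reading of `raw` whose outcode passes `is_valid_outcode`."""
    s = "".join(raw.upper().split())  # drop all whitespace
    outward, inward = s[:-3], s[-3:]
    incode = coerce("NLL", inward)  # the inward code is always 9AA
    patterns = OUTCODE_PATTERNS.get(len(outward), [])
    # dict.fromkeys de-duplicates while keeping pattern order stable.
    outcodes = dict.fromkeys(coerce(p, outward) for p in patterns)
    return [f"{o} {incode}" for o in outcodes if is_valid_outcode(o)]


if __name__ == "__main__":
    known_outcodes = {"W10"}  # toy stand-in for a real outcode list
    print(fix_candidates("W1O 5AA", known_outcodes.__contains__))
```

Passing the validity check in as a function keeps the candidate generation independent of where the reference data lives (an in-memory set, a DB, or an API).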
This reduced our error rate further, and significantly, as most of our errors came from misread 0s. Note that this made sense for our use case: after checking the candidates against the ONS directory, there were negligible false positives.
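One way to do the directory check described above is a plain set lookup over the candidates. The sketch below is a hedged illustration: the issue does not say how ties were handled, so the "accept only when exactly one candidate is a known postcode" rule and the `resolve` name are assumptions. A Python `set` gives average O(1) membership tests, which is cheap even for ~1.8M postcodes.

```python
from typing import Iterable, Optional, Set


def resolve(candidates: Iterable[str], known_postcodes: Set[str]) -> Optional[str]:
    """Return the single candidate found in `known_postcodes`, else None.

    Requiring exactly one match is an assumed policy for keeping false
    positives low: with several matches there is no basis to choose.
    """
    matches = [c for c in dict.fromkeys(candidates) if c in known_postcodes]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    known = {"W10 5AA"}  # toy data, not real ONS records
    print(resolve(["W1O 5AA", "W10 5AA", "WI0 5AA"], known))
```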
Hey @anirudhgangwal, it is a nice approach, but to implement it in our lib we would need to break our interface pattern and return an array of possible fixes, which this simple lib is not intended to do. We see possible use cases for an array, but this lib is intended just to fix numeric mistakes and return a generally valid postcode.
A9A 9AA
A99 9AA
AA9 9AA
All of those are valid postcode constructions. Our lib only tries to fix input that doesn't match any of them, so the pattern "L??" is sufficient to cover all three. If you intend to check the result against a DB afterwards, your version will give you fewer errors and additional candidate fixes, which is great!
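The maintainer's point can be checked directly: once the first character is a letter, every coerced reading of a 3-character outcode already fits one of the three shapes, so a format-only check has no way to prefer one reading over another. The sketch below uses loose, shape-only regexes (an assumption for illustration; real postcodes restrict some letter positions further) and a hypothetical `matching_shapes` helper.

```python
import re
from typing import Dict, List

# Loose, shape-only patterns for 3-character outcodes (A = letter, 9 = digit).
SHAPES: Dict[str, str] = {
    "A9A": r"[A-Z][0-9][A-Z]",
    "A99": r"[A-Z][0-9]{2}",
    "AA9": r"[A-Z]{2}[0-9]",
}


def matching_shapes(outcode: str) -> List[str]:
    """Return the names of the shapes that `outcode` fits."""
    return [name for name, rx in SHAPES.items() if re.fullmatch(rx, outcode)]


if __name__ == "__main__":
    for reading in ("W1O", "W10", "WI0"):
        print(reading, matching_shapes(reading))
```

Each reading fits exactly one shape, so all three pass a format check; only a lookup against real data, as in the issue, can pick between them.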