The comparison counts generated for different blocking rules are really useful. For example, from the tutorial: `blocking_rule_1 = "substr(l.first_name,1,1) = substr(r.first_name,1,1) and l.surname = r.surname"`. Is there an easy way to get block size statistics for a rule like this (i.e. min block size, max block size, average block size, the distribution of block sizes, etc.)?
Replies: 1 comment
Hi @illeamb, we don't have anything that gives you summary stats for blocking out of the box, unfortunately. There is a function that works in the background of the blocking rules chart that may be of use. Using the data from that tutorial:
Gives
Which gets you part of the way there. It does feel like this could be a useful addition, so I have raised issue #1106. I can't guarantee we will get round to it anytime soon, so if you make any progress with it please let us know. It would be great to include anything you create in splink!
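In the meantime, block size statistics can be approximated outside splink. The following is a minimal sketch in plain Python, not part of the splink API: `block_key` and `block_size_stats` are hypothetical helper names, the records are made-up toy data, and the key simply mirrors `blocking_rule_1` (first initial plus surname). The comparison count assumes a deduplication job on a single dataset.

```python
from collections import Counter
from statistics import mean, median


def block_key(record):
    """Hypothetical key mirroring blocking_rule_1: first initial plus surname."""
    first_name, surname = record.get("first_name"), record.get("surname")
    # SQL equality never matches NULLs, so a record with a missing field joins no block.
    if first_name is None or surname is None:
        return None
    return (first_name[:1], surname)


def block_size_stats(records):
    """Summarise how many records share each blocking key."""
    keys = (block_key(r) for r in records)
    counts = list(Counter(k for k in keys if k is not None).values())
    if not counts:
        return {"n_blocks": 0}
    return {
        "n_blocks": len(counts),
        "min": min(counts),
        "max": max(counts),
        "mean": mean(counts),
        "median": median(counts),
        # A block of n records yields n * (n - 1) / 2 pairs when deduplicating.
        "comparisons": sum(n * (n - 1) // 2 for n in counts),
        # Distribution: block size -> number of blocks of that size.
        "distribution": dict(sorted(Counter(counts).items())),
    }


# Made-up toy records, for illustration only.
records = [
    {"first_name": "Robert", "surname": "Alan"},
    {"first_name": "Rob", "surname": "Alan"},
    {"first_name": "Rachel", "surname": "Alan"},
    {"first_name": "Grace", "surname": "Kelly"},
    {"first_name": "Gina", "surname": "Kelly"},
    {"first_name": "Sam", "surname": None},
    {"first_name": "Tom", "surname": "Reed"},
]

stats = block_size_stats(records)
print(stats)
```

If the data lives in a pandas DataFrame, `df.to_dict("records")` produces a list of dicts in the shape this sketch expects. For a different blocking rule, only `block_key` needs to change.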