Block size, block size statistics and distributions #1103

Answered by RossKen
illeamb asked this question in Q&A
Hi @illeamb, we don't have anything that gives you summary stats for blocking out of the box, unfortunately.

There is a function that works in the background of the blocking rules chart that may be of use. Using the data from that tutorial:

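from splink.analyse_blocking import cumulative_comparisons_generated_by_blocking_rules

# linker, blocking_rule_1 and blocking_rule_2 are the objects defined in the tutorial;
# each blocking rule is a SQL condition string
blocking_rules = [blocking_rule_1, blocking_rule_2]

cumulative_comparisons_generated_by_blocking_rules(linker, blocking_rules)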

This returns a list with one dictionary per blocking rule:

[{'row_count': 473,
  'rule': 'substr(l.first_name,1,1) = substr(r.first_name,1,1) and l.surname = r.surname',
  'cumulative_rows': 473,
  'cartesian': 499500,
  'reduction_ratio': 'The rolling reduction ratio with your given blocking…
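
If you want block size statistics rather than comparison counts, one option is to compute them yourself from the input records. Below is a minimal sketch in plain Python, not a Splink API: `block_size_summary` and `initial_and_surname` are hypothetical helpers, the sample records are made up, and the key function is only my approximation of the first blocking rule shown above (first initial plus surname). The `comparisons` figure assumes a dedupe-only job, where a block of n records produces n * (n - 1) / 2 pairs.

```python
from collections import Counter
from statistics import mean, median


def block_size_summary(records, key_func):
    """Summarise block sizes for one blocking key (hypothetical helper, not part of Splink)."""
    sizes = Counter()
    for rec in records:
        key = key_func(rec)
        # Records without a usable key are left out of every block
        if key is not None:
            sizes[key] += 1

    counts = list(sizes.values())
    if not counts:
        return {"n_blocks": 0, "comparisons": 0}

    return {
        "n_blocks": len(counts),
        "min": min(counts),
        "median": median(counts),
        "mean": mean(counts),
        "max": max(counts),
        # Within-block pairs, assuming a dedupe-only job: n * (n - 1) / 2 per block
        "comparisons": sum(n * (n - 1) // 2 for n in counts),
    }


def initial_and_surname(rec):
    """Approximate key for: first initial matches and surname matches."""
    first, surname = rec.get("first_name"), rec.get("surname")
    # Assumption: missing or empty values never form a block
    if not first or not surname:
        return None
    return (first[0], surname)


# Made-up records, for illustration only
people = [
    {"first_name": "Robert", "surname": "Allen"},
    {"first_name": "Rob", "surname": "Allen"},
    {"first_name": "Ruth", "surname": "Allen"},
    {"first_name": "Grace", "surname": "Kelly"},
    {"first_name": "Gary", "surname": "Kelly"},
    {"first_name": None, "surname": "Kelly"},
    {"first_name": "Sam", "surname": "Jones"},
]

summary = block_size_summary(people, initial_and_surname)
print(summary)
```

For a real dataset you would pass your own records and a key function per blocking rule. A handful of very large blocks (a high `max` relative to the `median`) is usually what drives the comparison count up.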

Answer selected by RossKen