Open
Hi, I am trying to understand what the alternative rules for resolving orientation conflicts in the PC algorithm do and how the procedure is justified (utils -> PCUtils).
Here are my questions:
- Why are we sorting in ascending order? Is there a paper / source / documentation where these rules and their justifications are presented?
- Is there a typo in p_{xy|not y}? Looking at the implementation, shouldn't it be the p-value (now in descending order) of the CI test of X and Z given Y?
I think the author of the method is @jdramsey, thank you so much for taking a look!
In particular, I am looking at rules 3 and 4 (prioritizing stronger colliders):
if priority == 3:  # 3. Order colliders by p_{xz|y} in ascending order
    for (x, y, z) in R0:
        cond = cg_new.find_cond_sets_with_mid(x, z, y)
        UC_dict[(x, y, z)] = max([cg_new.ci_test(x, z, S) for S in cond])
    UC_dict = sort_dict_ascending(UC_dict)
else:  # 4. Order colliders by p_{xy|not y} in descending order
    for (x, y, z) in R0:
        cond = cg_new.find_cond_sets_without_mid(x, z, y)
        UC_dict[(x, y, z)] = max([cg_new.ci_test(x, z, S) for S in cond])
    UC_dict = sort_dict_ascending(UC_dict, descending=True)
Here is my understanding of what is being implemented. I am happy to be corrected!
- The description of find_cond_sets_with_mid(self, i: int, j: int, k: int) -> List[Tuple[int]] says it "return[s] the list of conditioning sets of the neighbors of i or j in adjmat which contains k", so we are finding subsets of neighbors of x and z which contain y.
- We then build a dictionary of the CI test results for these subsets and take the maximum p-value over the different conditioning sets. A large p-value means that we do not reject the null hypothesis of conditional independence, so by taking the largest p-value we rank each triple by how independent we think x and z are given a conditioning set that contains y.
- Last, we sort the triples by p-value in ascending order. Later in the function, we iterate through this dictionary and orient edges only if they have not already been oriented. As I understand it, this means we prioritize low p-values, which seems inconsistent with maximizing over p-values in the previous step and with the idea that we want to orient colliders if x and z are independent given y (i.e. a large p-value for the test, not a small one).
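To make the concern concrete, here is a minimal sketch with made-up p-values. It is not the library's code: `sort_by_value` is a hypothetical stand-in for `sort_dict_ascending`, `orient_in_order` is a simplified assumption about the "orient only if not yet oriented" loop, and the two triples are chosen so that they compete for the edge between nodes 1 and 2.

```python
from typing import Dict, List, Tuple

Triple = Tuple[int, int, int]


def sort_by_value(d: Dict[Triple, float], descending: bool = False) -> Dict[Triple, float]:
    """Hypothetical stand-in for sort_dict_ascending: order a dict by its values."""
    return dict(sorted(d.items(), key=lambda kv: kv[1], reverse=descending))


# Made-up p-values for CI tests of x and z, one per conditioning set containing y.
p_values_with_mid: Dict[Triple, List[float]] = {
    (0, 1, 2): [0.40, 0.75],
    (1, 2, 3): [0.01, 0.03],
}

# Score each triple by its largest p-value, mirroring the max(...) in rule 3.
scores = {triple: max(ps) for triple, ps in p_values_with_mid.items()}

ascending = sort_by_value(scores)
descending = sort_by_value(scores, descending=True)


def orient_in_order(ordered: Dict[Triple, float]) -> List[Triple]:
    """Accept colliders x -> y <- z in iteration order (simplified assumption).

    A triple is skipped if either of its edges was already oriented
    by a triple that came earlier in the ordering.
    """
    oriented_edges = set()
    accepted = []
    for (x, y, z) in ordered:
        edges = {frozenset((x, y)), frozenset((z, y))}
        if edges & oriented_edges:
            continue  # conflict: an earlier triple already claimed one of these edges
        oriented_edges |= edges
        accepted.append((x, y, z))
    return accepted


print("ascending order wins: ", orient_in_order(ascending))
print("descending order wins:", orient_in_order(descending))
```

With these toy numbers, ascending order processes (1, 2, 3) first, so the triple with the smaller maximum p-value claims the shared edge, while descending order lets (0, 1, 2) win instead.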