Chi-Square Identified Significant Groups, Now Which One is More Significant? #404

@turkalpmd

Dear @raphaelvallat,

In my thesis work, I examined the relationships between two categorical variables, each with several levels, and reached a point where the analyses I conducted started to feel meaningless. For example, I investigated the association between four age groups and eleven diagnostic groups, using `pd.get_dummies` to encode them, and found many significant results. However, a Bonferroni adjustment seemed too simplistic for the number and complexity of the categorical tests I was dealing with. I therefore looked for a post-hoc analysis for categorical tests and came across this wonderful article, which solved my problem.

Ransacking is a post-hoc technique used after a significant Chi-Square test to pick out specific 2x2 tables of interest within a larger r x c contingency table and to evaluate the statistical significance of each of them. Each 2x2 table is built by selecting two rows and two columns of the full table, so it isolates one specific relationship within the larger table. This is useful for large tables because a significant overall Chi-Square result can stem from several different category combinations, and it is hard to tell from the overall test alone which combinations contribute most.

Ransacking begins with the creation of a relevant 2x2 table. The odds within this table are then compared by calculating an odds ratio or log odds ratio, which measures the strength and direction of the relationship between the selected cells. The log odds ratio is then used to decide whether the null hypothesis of independence (that is, no relationship between the two variables within this 2x2 table) is rejected. This makes the method particularly valuable for understanding specific interactions between categories and how they contribute to the overall result of the Chi-Square test.
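The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not Pingouin's API and not necessarily the exact implementation proposed here: the function name `ransack_2x2` and the counts are hypothetical, and the sketch assumes the usual formulation in which the log odds ratio is divided by its standard error (the square root of the summed reciprocal cell counts) and the resulting z value is compared with the square root of the Chi-Square critical value for the degrees of freedom of the full table.

```python
import numpy as np
from scipy.stats import chi2


def ransack_2x2(table, rows, cols, alpha=0.05):
    """Test one 2x2 subtable of an r x c contingency table (hypothetical helper).

    rows, cols: pairs of row / column indices selecting the 2x2 subtable.
    """
    table = np.asarray(table, dtype=float)
    n_rows, n_cols = table.shape

    # Step 1: build the 2x2 table of interest.
    sub = table[np.ix_(rows, cols)]
    if sub.shape != (2, 2):
        raise ValueError("rows and cols must each select exactly two indices")
    if (sub <= 0).any():
        raise ValueError("all four cells must be positive to take logs")

    # Step 2: log odds ratio of the subtable.
    log_or = (np.log(sub[0, 0]) + np.log(sub[1, 1])
              - np.log(sub[0, 1]) - np.log(sub[1, 0]))

    # Step 3: standard error and z statistic (assumed formulation).
    se = np.sqrt((1.0 / sub).sum())
    z = log_or / se

    # Step 4: critical value based on the df of the FULL table (assumption),
    # which is what protects against testing many subtables.
    dof = (n_rows - 1) * (n_cols - 1)
    z_crit = np.sqrt(chi2.ppf(1 - alpha, dof))

    return {
        "log_odds_ratio": log_or,
        "odds_ratio": np.exp(log_or),
        "se": se,
        "z": z,
        "z_critical": z_crit,
        "reject_independence": bool(abs(z) > z_crit),
    }


# Made-up 3 x 3 counts, for illustration only.
counts = [[10, 20, 30],
          [20, 20, 10],
          [30, 10, 10]]
result = ransack_2x2(counts, rows=(0, 1), cols=(0, 1))
print(result)
```

Because the critical value comes from the full table rather than from a single 2x2 test, it is stricter than the usual 1.96, so a subtable has to show a fairly strong association before independence is rejected. Sorting the subtables by the absolute z value would be one simple way to rank them.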

In conclusion, traditional 2x2 contingency tests for Chi-Square analyses do not fully meet my needs, and logistic regression is not entirely suitable for my problem. This is why I find post-hoc methods like ransacking important, especially their ability to rank relationships between groups. I particularly appreciate the ransacking method, and I have also implemented the other adjustment methods described in the article in Python. I would like to add these post-hoc tests to the Pingouin library. This would extend the library's ability to perform more detailed analyses after Chi-Square tests and allow users to discover more specific relationships.
