[Proposal] Automatic Algorithm Recommendation #16

Open
kunwuz opened this issue Apr 6, 2025 · 0 comments
Labels: enhancement (New feature or request)

kunwuz commented Apr 6, 2025

Candidate 1: Automatically recommend methods based on the data analysis.

To avoid additional effort/choices on the user side, we may consider just the following factors:

Data types

  • Continuous
  • Discrete
  • Mixed

Whether a unique DAG is required, or some undetermined edge directions are acceptable

  • If some undirected edges are acceptable, suggest PC, FCI, or GES.

Missing values

  • If there are missing values, use MV-PC

Sample size & number of variables

This is mainly for KCI and Generalized Score

  • If < 3000 samples, use KCI and Generalized Score
  • If >= 3000 samples, use fastKCI or RCIT (PR link) to replace KCI
    • For Generalized Score, currently there is no scaled-up version, so we could just suggest GRaSP or BOSS
  • In general, if we have very large datasets, say >100 variables, suggest GRaSP or BOSS
    • They are both score-based methods, which can be combined with different score functions
    • They are scalable
  • We could always suggest GRaSP or BOSS as a faster-running option.

Whether the data are IID

  • If not, use VAR-LiNGAM, CD-NOD, or Granger causality
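The factor-to-algorithm mapping above could be sketched as a simple rule function. This is a minimal sketch only: the function name, signature, and return format are hypothetical, while the thresholds (3000 samples, 100 variables) and algorithm names come from the bullets above. The data-type factor is omitted because the bullets do not yet map it to specific algorithms.

```python
def recommend_algorithms(allow_undirected, has_missing,
                         n_samples, n_vars, is_iid):
    """Hypothetical rule-based recommender following the factors above."""
    if not is_iid:
        # Non-IID data: time-series / heterogeneous-data methods.
        return ["VAR-LiNGAM", "CD-NOD", "Granger causality"]
    if has_missing:
        return ["MV-PC"]

    suggestions = []
    if allow_undirected:
        # Undetermined edge directions are acceptable.
        suggestions += ["PC", "FCI", "GES"]
    if n_samples < 3000:
        suggestions += ["KCI", "Generalized Score"]
    else:
        # Scaled-up replacements for KCI; no scaled-up Generalized
        # Score exists, so fall back to GRaSP or BOSS.
        suggestions += ["fastKCI", "RCIT", "GRaSP", "BOSS"]
    if n_vars > 100:
        # Score-based and scalable.
        suggestions += ["GRaSP", "BOSS"]
    # Deduplicate while preserving order.
    return list(dict.fromkeys(suggestions))
```

A real implementation would infer `n_samples`, `n_vars`, data types, and missingness directly from the loaded dataset rather than taking them as arguments.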

We do not consider assumptions on the data distribution for this tab, since doing so may raise unnecessary concerns.

For example, we may recommend PC with FisherZ as the top choice by default, given that linear methods are usually good at balancing between accuracy and complexity.

We do not want users to feel that parametric methods are unreliable. If we explicitly require the algorithm to match the distribution, users may always choose nonparametric methods, such as PC with KCI or GES with the Generalized Score, which usually scale poorly.


Candidate 2: Recommendation based on questions

The recommendation is based on the two flowcharts (attached below), turned into three questions asked in order:

  1. Are there hidden variables?
  2. Can we treat discrete variables as continuous?
    • If not, recommend methods that work only for discrete data
  3. What do you believe the data distribution is?
    • Follow the flowcharts and recommend based on the answer (e.g., linear Gaussian, linear non-Gaussian, etc.)

Also, give our recommendation based on LLM analysis of the data:

“It seems that you are working on … data. Together with your previous answers, we recommend …”

Give a list of algorithms, and add notes on them

  • E.g., KCI tests may take a long time, consider fastKCI or RCIT if needed...
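The question flow plus per-algorithm notes could be sketched as below. The answer-to-algorithm mapping here is illustrative only (the actual flowcharts would define the real mapping); the KCI note is the example given above, and the function name and signature are hypothetical.

```python
# Advisory notes attached to recommendations; only the KCI example
# from above is filled in, others would be added similarly.
NOTES = {
    "PC with KCI": "KCI tests may take a long time; "
                   "consider fastKCI or RCIT if needed.",
}

def recommend_by_questions(hidden_vars, treat_discrete_as_continuous, belief):
    """Hypothetical recommender for the three questions asked in order."""
    if hidden_vars:
        # FCI is designed for settings with latent confounders.
        algos = ["FCI"]
    elif not treat_discrete_as_continuous:
        # Placeholder for methods that work only on discrete data.
        algos = ["discrete-data methods"]
    elif belief == "linear non-Gaussian":
        algos = ["LiNGAM-family methods"]
    else:
        # Default top choice per the proposal text.
        algos = ["PC with FisherZ"]
    # Pair each algorithm with its advisory note, if any.
    return [(a, NOTES.get(a, "")) for a in algos]
```

The final output to the user would interleave these pairs with the LLM-generated analysis of the data described above.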

(Two flowchart images attached.)

@v-shaal v-shaal self-assigned this Apr 6, 2025
@MantejGill MantejGill added the enhancement New feature or request label Apr 6, 2025