Candidate 1: Automatically recommend methods based on the data analysis.
To avoid asking users for extra effort or choices, we could consider only the following factors:
Data types
Continuous
Discrete
Mixed
Whether the user needs a unique DAG, or can accept some edge directions being left undetermined
If some undirected edges are acceptable, recommend PC, FCI, or GES.
Missing values
If the data contain missing values, use MV-PC
Sample size & number of variables
This is mainly for KCI and Generalized Score
If < 3000 samples, use KCI and Generalized Score
If >= 3000 samples, use fastKCI or RCIT (PR link) to replace KCI
For Generalized Score, currently there is no scaled-up version, so we could just suggest GRaSP or BOSS
In general, if we have very large datasets, say >100 variables, suggest GRaSP or BOSS
They are both score-based methods, which can be combined with different score functions
They are scalable
We could always offer GRaSP or BOSS as an option for a faster run.
Whether the data are i.i.d.
If not, use VAR-LiNGAM, CD-NOD, or Granger causality
We do not consider assumptions about the data distribution for this tab, since doing so would raise unnecessary concerns.
For example, we may recommend PC with FisherZ as the default top choice, given that linear methods usually strike a good balance between accuracy and complexity.
We do not want users to feel that parametric methods are unreliable. If we explicitly required the algorithm to match the distribution, users might always choose nonparametric methods, such as PC with KCI or GES with the generalized score, which usually scale poorly.
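The rules in Candidate 1 could be sketched as a small rule table. This is only an illustration of the proposal, not an existing implementation: `DataProfile`, `recommend`, and the two constants are hypothetical names, the data profile is assumed to be computed elsewhere, and the choice to return every matching rule (rather than a single winner) and the order of the rules are assumptions. The data-type factor is left out because no rule is given for it above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Thresholds taken from the factor list above.
KCI_SAMPLE_LIMIT = 3000
LARGE_VARIABLE_COUNT = 100


@dataclass
class DataProfile:
    """Hypothetical summary of a dataset, assumed to be computed elsewhere."""
    n_samples: int
    n_variables: int
    has_missing: bool = False
    is_iid: bool = True
    allow_undirected: bool = True


def recommend(profile: DataProfile) -> List[Tuple[str, str]]:
    """Return (suggestion, reason) pairs for every rule that applies.

    Rule order is an assumption; the proposal does not define priorities.
    """
    recs = []
    if not profile.is_iid:
        recs.append(("VAR-LiNGAM, CD-NOD, or Granger causality",
                     "data are not i.i.d."))
    if profile.has_missing:
        recs.append(("MV-PC", "data contain missing values"))
    if profile.allow_undirected:
        recs.append(("PC (FisherZ by default), FCI, or GES",
                     "undirected edges are acceptable"))
    if profile.n_samples < KCI_SAMPLE_LIMIT:
        recs.append(("KCI or Generalized Score", "fewer than 3000 samples"))
    else:
        recs.append(("fastKCI or RCIT instead of KCI",
                     "3000 or more samples"))
        recs.append(("GRaSP or BOSS instead of Generalized Score",
                     "no scaled-up Generalized Score is available"))
    if profile.n_variables > LARGE_VARIABLE_COUNT:
        recs.append(("GRaSP or BOSS", "more than 100 variables"))
    return recs
```

For instance, `recommend(DataProfile(n_samples=500, n_variables=10))` yields the PC/FCI/GES suggestion plus KCI or Generalized Score, each paired with the reason it was triggered, which could be shown to the user alongside the recommendation.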
Candidate 2: Recommendation based on questions
The recommendation is based on the two flowcharts (see below). These could be three questions (in order):
Are there hidden variables?
Can we treat discrete variables as continuous?
If not, recommend only the methods designed for discrete data
What kind of data do you believe this is?
Follow the flowcharts and recommend based on the answer (e.g., linear Gaussian, linear non-Gaussian, etc.)
Also, give our recommendation based on LLM analysis of the data:
“It seems that you are working on data. Together with your previous answers, we recommend ...”
Give a list of algorithms, and add notes on them
E.g., KCI tests may take a long time, consider fastKCI or RCIT if needed...
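The "list of algorithms with notes" step could look roughly like the sketch below. `QUESTIONS`, `NOTES`, and `annotate` are hypothetical names, and the single note is the one given as an example above. The mapping from answers to algorithms comes from the flowcharts and is deliberately not reproduced here; `annotate` just takes whatever list that step produces.

```python
# The three questions from Candidate 2, in the order they would be asked.
QUESTIONS = [
    "Are there hidden variables?",
    "Can we treat discrete variables as continuous?",
    "What do you believe the data should be?",
]

# Notes keyed by a token that appears in the algorithm name (assumed scheme).
NOTES = {
    "KCI": "KCI tests may take a long time; consider fastKCI or RCIT if needed.",
}


def annotate(algorithms):
    """Pair each recommended algorithm with any notes that apply to it."""
    annotated = []
    for name in algorithms:
        tokens = name.split()
        # Match on whole tokens so that "fastKCI" does not pick up the KCI note.
        notes = [note for key, note in NOTES.items() if key in tokens]
        annotated.append((name, notes))
    return annotated
```

Calling `annotate(["PC with FisherZ", "PC with KCI"])` attaches the runtime note only to the KCI variant, so the note appears next to the method it concerns rather than as a general warning.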