
Feature dimensions #41

Closed
hgandhi2411 opened this issue May 24, 2021 · 6 comments

@hgandhi2411

Dr. Ouyang,

Can you please explain how feature dimensions work? In the input script, suppose I have dimclass=(1:2)(3:4)(5:5). Does this assign the dimensions (mass, length, time), or does it just group features and assume that the features in a group have the same dimensions?

My data looks as follows and I have also added their units here. How would I make sure that equations given by SISSO are dimensionally consistent? I'm currently using dimclass=(1:2)(3:3) and the equations generated don't seem right.

Materials  Del_P      Pipe_D (m)  Inlet_V (m/s)  angle (deg)
sample1    24.937540  0.005       0.020          1.0
sample2    23.688087  0.005       0.019          2.0
sample3    22.438908  0.015       0.018          1.0
sample4    21.190007  0.025       0.017          4.0
sample5    19.941388  0.007       0.016          5.0
...        ...        ...         ...            ...
@rouyang2017
Owner

rouyang2017 commented May 25, 2021

If you set dimclass=(1:2)(3:4)(5:5), it means that in your train.dat file the 1st through 2nd features have the same unit and can therefore be linearly combined. Likewise, (3:4) means the 3rd and 4th features share a unit, and (5:5) means the 5th feature has yet another unit. This grouping excludes unreasonable linear combinations, such as mass + length, and the check is applied throughout the construction of the feature space. Note that in the final output model y = sum(c*x), we assume the coefficients c carry units so that every term c*x has the same unit as the target y; thus features x with any units can appear in the linear model.
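To make the grouping rule concrete, here is a minimal Python sketch that reads a dimclass string as unit classes. It only illustrates the rule described above and is not SISSO's own Fortran parser; the names parse_dimclass and can_add are hypothetical, and it assumes, following the advice in this thread, that a feature left out of every bracket is treated as dimensionless (class 0).

```python
import re
from typing import List

def parse_dimclass(dimclass: str, n_features: int) -> List[int]:
    """Map each feature (1-based column of train.dat) to a unit-class id.

    Features in the same bracket share a class id (1, 2, ...); features not
    listed in any bracket get id 0, i.e. they are treated as dimensionless.
    """
    classes = [0] * n_features
    ranges = re.findall(r"\((\d+):(\d+)\)", dimclass)
    for class_id, (start, end) in enumerate(ranges, start=1):
        first, last = int(start), int(end)
        if not 1 <= first <= last <= n_features:
            raise ValueError(f"bad range ({first}:{last}) for {n_features} features")
        for i in range(first, last + 1):
            classes[i - 1] = class_id
    return classes

def can_add(classes: List[int], i: int, j: int) -> bool:
    """True if features i and j (1-based) share a unit class, so x_i + x_j is allowed."""
    return classes[i - 1] == classes[j - 1]

classes = parse_dimclass("(1:2)(3:4)(5:5)", 5)
print(classes)                 # [1, 1, 2, 2, 3]
print(can_add(classes, 1, 2))  # True: features 1 and 2 share a unit
print(can_add(classes, 2, 3))  # False: e.g. mass + length is excluded
```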

In your example, dimclass=(1:2)(3:3) means that Pipe_D and Inlet_V have the same unit (which is not correct) and that angle has another unit. You should instead set dimclass=(1:1)(2:2)(3:3). If you want Inlet_V to be treated as dimensionless, set dimclass=(1:1)(3:3); that is, simply leave it out of every round bracket.
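The same reasoning can be run in reverse: given one unit label per feature column, generate the dimclass string. Below is a hypothetical Python helper (build_dimclass is not part of SISSO). It assumes that features sharing a unit sit in adjacent columns, since each (a:b) bracket names a contiguous range, and it uses None for a feature you want treated as dimensionless.

```python
from typing import List, Optional

def build_dimclass(units: List[Optional[str]]) -> str:
    """Build a dimclass string from one unit label per feature column.

    A None label marks a dimensionless feature and is left out of every
    bracket. Features sharing a unit must occupy adjacent columns, because
    each bracket (a:b) can only describe a contiguous range.
    """
    groups = []  # each entry: [unit, first_column, last_column], 1-based
    seen = set()
    for col, unit in enumerate(units, start=1):
        if unit is None:
            continue
        if groups and groups[-1][0] == unit and groups[-1][2] == col - 1:
            groups[-1][2] = col  # extend the current run of identical units
        elif unit in seen:
            raise ValueError(f"features with unit {unit!r} are not in adjacent columns")
        else:
            groups.append([unit, col, col])
            seen.add(unit)
    return "".join(f"({first}:{last})" for _, first, last in groups)

# Feature columns from the question: Pipe_D, Inlet_V, angle
print(build_dimclass(["m", "m/s", "deg"]))  # (1:1)(2:2)(3:3)
print(build_dimclass(["m", None, "deg"]))   # (1:1)(3:3), Inlet_V dimensionless
```

Raising an error for non-adjacent same-unit features is deliberate: splitting them into two brackets would silently put them in different classes and forbid adding them.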

@hgandhi2411
Author

hgandhi2411 commented Jun 8, 2021

Dr. Ouyang, is it possible to express derived units using the dimclass variable? For example, if in my train.dat I have feature columns for mass, length and density, can density's units be expressed as mass/(length)^3? What's the best way of going about this?

@rouyang2017
Owner

That is not implemented in the current code, but you can modify the code. Assuming your train.dat file has three features arranged as feature1 (unit: mass), feature2 (unit: length) and feature3 (unit: mass/(length)^3), insert the following lines in the file SISSO.f90 right after the line "call read_para_b":

pfdim(:,1)=(/1.0, 0.0/)   ! unit vector for the 1st feature (assuming it is mass)
pfdim(:,2)=(/0.0, 1.0/)   ! unit vector for the 2nd feature (assuming it is length)
pfdim(:,3)=(/1.0, -3.0/)  ! unit vector for the 3rd feature (mass/(length)^3)

Recompile the code and it should work. Please check the output in SISSO.out to confirm this.
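The pfdim lines encode each feature's unit as a vector of exponents over base units (here mass and length). Multiplying features adds the vectors, dividing subtracts them, and two terms can only be summed when their vectors match. A minimal Python sketch of that bookkeeping, handy for checking by hand which combinations are dimensionally consistent; it mirrors the idea, not SISSO's internal Fortran routines, and all names are hypothetical:

```python
from typing import Tuple

# Exponent vectors over the basis (mass, length), as in the pfdim lines.
Unit = Tuple[float, ...]

MASS: Unit = (1.0, 0.0)
LENGTH: Unit = (0.0, 1.0)
DENSITY: Unit = (1.0, -3.0)

def mul(a: Unit, b: Unit) -> Unit:
    """Unit of x*y: exponents add."""
    return tuple(p + q for p, q in zip(a, b))

def div(a: Unit, b: Unit) -> Unit:
    """Unit of x/y: exponents subtract."""
    return tuple(p - q for p, q in zip(a, b))

def power(a: Unit, n: float) -> Unit:
    """Unit of x**n: exponents scale by n."""
    return tuple(n * p for p in a)

def same_unit(a: Unit, b: Unit) -> bool:
    """x + y is allowed only if both operands carry the same unit vector."""
    return all(abs(p - q) < 1e-12 for p, q in zip(a, b))

rho_l3 = mul(DENSITY, power(LENGTH, 3))
print(rho_l3)                   # (1.0, 0.0): density * length^3 has mass units
print(same_unit(rho_l3, MASS))  # True, so density*length^3 + mass is allowed
print(same_unit(MASS, LENGTH))  # False, so mass + length is excluded
```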

@hgandhi2411
Author

hgandhi2411 commented Aug 5, 2021

Prof. Ouyang, your suggestion worked well for my project. However, it is hard-coded, so I was wondering what the easiest way would be to make it a user input in Fortran, i.e., to read the pfdim matrix directly so that users can group features as they wish.

@rouyang2017
Owner

Thanks. Will make this happen.

@pmiam

pmiam commented Dec 20, 2022

For anyone else trying to use the feature_units file to designate the derived units of predictor variables as described in this thread: note that the funit string must contain at least as many opening parentheses "(" as there are basis units (columns) in your feature_units file.

For example, here is the head of a feature_units file describing three features (one row per feature, one exponent per basis-unit column): one with length units, one unitless, and one with density units.

1 0 0 0 0 0
0 0 0 0 0 0 
0 1 -1 0 0 0

Then, in SISSO.in, set funit with one bracketed label per basis-unit column:

funit=(L)(m)(V)(E)(mol)(T)
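A small pre-flight check along these lines can catch the mismatch described above before running SISSO. This is a hypothetical Python sketch, not part of SISSO. It assumes the feature_units layout shown in the example (one whitespace-separated row per feature, one exponent per basis unit) and that the funit labels map to the columns in order, which is how the density row 0 1 -1 reads as m/V.

```python
import re
from typing import List

def read_feature_units(text: str) -> List[List[float]]:
    """Parse a feature_units block: one row per feature, one exponent per basis unit."""
    return [[float(tok) for tok in line.split()] for line in text.splitlines() if line.strip()]

def check_funit(funit: str, rows: List[List[float]]) -> None:
    """Raise ValueError if funit has fewer '(' than basis-unit columns."""
    if not rows:
        raise ValueError("feature_units block is empty")
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("every feature row must list the same number of exponents")
    n_open = funit.count("(")
    if n_open < n_cols:
        raise ValueError(f"funit has {n_open} '(' but feature_units has {n_cols} columns")

def unit_labels(funit: str) -> List[str]:
    """Extract the bracketed labels, e.g. '(L)(m)' -> ['L', 'm']."""
    return re.findall(r"\(([^()]*)\)", funit)

def describe(row: List[float], labels: List[str]) -> str:
    """Render one feature's exponent row as a readable unit string."""
    parts = [f"{label}^{exp:g}" for label, exp in zip(labels, row) if exp != 0]
    return " ".join(parts) if parts else "dimensionless"

FEATURE_UNITS = """\
1 0 0 0 0 0
0 0 0 0 0 0
0 1 -1 0 0 0
"""
FUNIT = "(L)(m)(V)(E)(mol)(T)"

rows = read_feature_units(FEATURE_UNITS)
check_funit(FUNIT, rows)  # passes: 6 '(' for 6 columns
for row in rows:
    print(describe(row, unit_labels(FUNIT)))
# L^1
# dimensionless
# m^1 V^-1
```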
