Non-physical features with feature_unit #52

Closed
ajsummers opened this issue Jun 22, 2023 · 4 comments

ajsummers commented Jun 22, 2023

Hi Dr. Ouyang,

I want to first thank you for this great work - this is a very powerful idea and code, and I'm excited to see its full potential in the future. I've run into a small issue with the code, and I'm hoping you can help me figure out how to fix this. Thank you in advance for your help.


NOTE: I updated to the most recent version as of this date (ea96f46).

I've been trying some sample datasets with SISSO, and I think I've run into a problem where SISSO seems to suggest features that don't have physical meaning - the units do not match.

The dataset comes from UCI-Airfoil_Self-Noise. The features are below:

  • Frequency (Hz) - "freq"
  • Angle of attack (deg) - "angle"
  • Chord length (m) - "chord"
  • Free-stream velocity (m/s) - "fsv"
  • Suction side displacement thickness (m) - "ssdt"
  • Scaled sound pressure level (dB) - "sound" (output)

Here are the first few lines of the "train.dat" file:

samples sound freq angle chord fsv ssdt
0 126.201 800 0.0 0.3048 71.3 0.00266337
1 125.201 1000 0.0 0.3048 71.3 0.00266337
2 125.951 1250 0.0 0.3048 71.3 0.00266337
3 127.591 1600 0.0 0.3048 71.3 0.00266337

And here is the feature_unit file I've been using (columns correspond to m, s, deg):

0 -1 0
0 0 1
1 0 0
1 -1 0
1 0 0

I've formatted these in markdown for this post, but of course they are tab-delimited.

Looking at the "Uspace.expressions" file, I see some features that do not seem to have physical meaning (I have the first few lines below):

  • (sqrt(ssdt)*(freq*chord)) corr= 0.7020
  • ((freq*ssdt)*(chord-ssdt)) corr= 0.6955
  • ((freq/fsv)*sqrt(ssdt)) corr= 0.6829
  • ((freq-fsv)*(ssdt*chord)) corr= 0.6809
  • ((angle-freq)*(ssdt*chord)) corr= 0.6795

For example, freq [1/s] - fsv [m/s] does not make physical sense, and neither does angle [deg] - freq [1/s]. Am I doing this wrong, or is this a unique case?
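To make the mismatch concrete, here is a minimal sketch of the dimensional check I would expect, using the unit vectors from my feature_unit file above. The helper names (`can_add_or_subtract`, `product_unit`) are my own illustration and are not taken from the SISSO code.

```python
# Unit vectors copied from the feature_unit file above; columns are (m, s, deg).
UNITS = {
    "freq":  (0, -1, 0),   # 1/s
    "angle": (0, 0, 1),    # deg
    "chord": (1, 0, 0),    # m
    "fsv":   (1, -1, 0),   # m/s
    "ssdt":  (1, 0, 0),    # m
}


def can_add_or_subtract(a, b, units=UNITS):
    """Addition and subtraction only make sense for identical unit vectors."""
    return units[a] == units[b]


def product_unit(a, b, units=UNITS):
    """Multiplying two quantities adds their unit exponents."""
    return tuple(x + y for x, y in zip(units[a], units[b]))


print(can_add_or_subtract("freq", "fsv"))    # False: 1/s vs m/s
print(can_add_or_subtract("angle", "freq"))  # False: deg vs 1/s
print(can_add_or_subtract("chord", "ssdt"))  # True: both in m
print(product_unit("freq", "chord"))         # (1, -1, 0), i.e. m/s
```

By this check, (chord-ssdt) is fine, while (freq-fsv) and (angle-freq) should have been rejected.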


Please let me know if you need any more information. I look forward to hearing back from you, and thank you in advance for your help.

@ajsummers
Author

I may have found the solution by poking around a bit. After seeing this comment on a previous issue, I renamed the feature_unit file to feature_units and added the following line to my SISSO.in file:
funit=(m)(s)(deg)

This seems to solve feature unit issues with other datasets as well.


I want to suggest the following additions/changes to the SISSO_Guide.pdf (pages 8-9):

  • Change "feature_unit" to "feature_units"
  • Explain that when using the feature_units file, one must define the funit parameter so that the number of parenthetical arguments is equal to the number of columns in the feature_units file
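
As a rough sketch of how the two pieces can be kept in sync, the snippet below builds the tab-delimited feature_units content and a matching funit line from one list of base units. The function names are hypothetical, and the funit form simply mirrors the line that worked for me above; I have not confirmed it against the guide.

```python
# Base units for the columns of feature_units, in column order.
BASE_UNITS = ["m", "s", "deg"]

# One row per input feature, in the same order as the columns of train.dat.
ROWS = [
    [0, -1, 0],  # freq
    [0, 0, 1],   # angle
    [1, 0, 0],   # chord
    [1, -1, 0],  # fsv
    [1, 0, 0],   # ssdt
]


def feature_units_text(rows, n_base):
    """Return tab-delimited file content, checking every row has n_base columns."""
    for i, row in enumerate(rows, start=1):
        if len(row) != n_base:
            raise ValueError(f"row {i} has {len(row)} columns, expected {n_base}")
    return "\n".join("\t".join(str(v) for v in row) for row in rows) + "\n"


def funit_line(base_units):
    """Build a funit line with one parenthetical argument per unit column."""
    return "funit=" + "".join(f"({u})" for u in base_units)


print(feature_units_text(ROWS, len(BASE_UNITS)), end="")
print(funit_line(BASE_UNITS))  # funit=(m)(s)(deg)
```

The first output would be saved as feature_units and the second pasted into SISSO.in.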

Waiting for @rouyang2017 to confirm that this is correct before closing issue.

@rouyang2017
Owner

rouyang2017 commented Jun 23, 2023

Thank you Alex Summers.

The code does check whether a file named 'feature_units' exists. I am sorry for the missing 's' in the User Guide. I will correct the error and explain this file more clearly.

So, I guess your unit information in the 'feature_unit' was not successfully read in, which may be the reason for the non-physical features.

If the keyword 'funit' is not given a string, all the input features are treated as dimensionless quantities. If the file "feature_units" exists, the unit information for all the input features is read in and overwrites 'funit'. You can check SISSO.out to confirm that the unit information was read in correctly.

@ajsummers
Author

Dr. Ouyang,

I'm currently using funit=(m)(s)(deg) for the dataset described above. If I remove this line and keep feature_units the same, I get the following output in SISSO.out:

...
Maximal feature complexity (number of operators in a feature):        3
Units of the input primary features (each represented by a vector):





The feature will be discarded if the minimum of the maximal abs. value in it <    0.10000E-02
The faature will be discarded if the maximum of the maximal abs. value in it >     0.10000E+06
...

Also, the units in Uspace.expressions are non-physical. If I add the funit line back to the SISSO.in file, I get the following output in SISSO.out:

...
Maximal feature complexity (number of operators in a feature):        3
Units of the input primary features (each represented by a vector):
  0.00 -1.00  0.00
  0.00  0.00  1.00
  1.00  0.00  0.00
  1.00 -1.00  0.00
  1.00  0.00  0.00
The feature will be discarded if the minimum of the maximal abs. value in it <    0.10000E-02
The faature will be discarded if the maximum of the maximal abs. value in it >     0.10000E+06
...

And the features in Uspace.expressions make physical sense. I have tried this workflow for a dataset with ~80 initial features, and when I remove funit and keep feature_units for this dataset, about 80 blank lines appear in place of the unit vectors.
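
For anyone who wants to check this quickly, here is a small sketch that pulls the unit vectors out of the SISSO.out text. It assumes the block looks like the excerpts above (a header line followed by one row of numbers per feature); `read_unit_rows` is my own helper name, not part of SISSO.

```python
HEADER = "Units of the input primary features"


def read_unit_rows(sisso_out_text):
    """Return the unit vectors listed after the header line, skipping blank lines."""
    rows = []
    in_block = False
    for line in sisso_out_text.splitlines():
        if not in_block:
            in_block = HEADER in line
            continue
        fields = line.split()
        if not fields:
            continue  # blank line: no unit vector was printed here
        try:
            rows.append([float(x) for x in fields])
        except ValueError:
            break  # first non-numeric line ends the block
    return rows


missing = """Units of the input primary features (each represented by a vector):



The feature will be discarded if the minimum of the maximal abs. value in it <    0.10000E-02
"""

present = """Units of the input primary features (each represented by a vector):
  0.00 -1.00  0.00
  0.00  0.00  1.00
  1.00  0.00  0.00
  1.00 -1.00  0.00
  1.00  0.00  0.00
The feature will be discarded if the minimum of the maximal abs. value in it <    0.10000E-02
"""

print(len(read_unit_rows(missing)))  # 0
print(len(read_unit_rows(present)))  # 5
```

If the count is zero or differs from the number of input features, the unit information was probably not read in.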

Are you able to reproduce this, or is this just occurring on my end?

@rouyang2017
Owner

rouyang2017 commented Jun 24, 2023

I see the problem, thanks. You are right, I will adopt your suggestions:

  • Change "feature_unit" to "feature_units"
  • Explain that when using the feature_units file, one must define the funit parameter so that the number of parenthetical arguments is equal to the number of columns in the feature_units file
