Documentation or tutorials? #40

Closed
hgandhi2411 opened this issue May 9, 2021 · 11 comments

@hgandhi2411

hgandhi2411 commented May 9, 2021

Dr. Ouyang,

Would it be possible to create documentation, or perhaps a tutorial, explaining the parameters that can be used to run the SISSO code for different tasks? I'm a new FORTRAN user and have been trying to reproduce your regression example. I eventually got it to work, but with difficulty. An end-to-end tutorial would help new users like me and would encourage a larger community to use this excellent code.

@rouyang2017
Owner

Thanks hgandhi2411. Yes, I also feel it is increasingly necessary to create a user's guide. Will do it this year.

@hgandhi2411
Author

hgandhi2411 commented May 10, 2021

In the meantime, can you please help me understand what's wrong with this input script? I get a segmentation fault. I'm attaching SISSO.in and train.dat. I have also tried dimclass=(1:3) and still get a segfault.

[Images: screenshots of SISSO.in and train.dat]

Error:

>>> mpirun -n 1 SISSO > log

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
SISSO              000000000049002A  Unknown               Unknown  Unknown
libpthread-2.17.s  00007F50C8C8C630  Unknown               Unknown  Unknown
SISSO              000000000047DA08  Unknown               Unknown  Unknown
SISSO              0000000000404EE2  Unknown               Unknown  Unknown
libc-2.17.so       00007F50C85CF555  __libc_start_main     Unknown  Unknown
SISSO              0000000000404DE9  Unknown               Unknown  Unknown

@rouyang2017
Owner

rouyang2017 commented May 10, 2021

Could you please remove the two operators (sinh)(tanh) from SISSO.in and try again? They are not currently implemented in the code.
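
A quick way to check an opset line for the two operators named above is sketched below. It assumes, based on the comment style used in SISSO.in, that everything after `!` on a line is a comment. The function names and the `UNIMPLEMENTED` set are illustrative only and cover just the two operators mentioned here, not the full list of operators SISSO supports.

```python
import re

# Only the two operators named in this thread; not an authoritative list.
UNIMPLEMENTED = {"sinh", "tanh"}


def active_operators(line):
    """Return the operators in an opset line, ignoring the `!` comment part."""
    setting = line.split("!", 1)[0]          # assumed: `!` starts a comment
    return re.findall(r"\(([^()]+)\)", setting)


def unimplemented_in(line):
    """Return the operators in the line that this thread reports as unimplemented."""
    return [op for op in active_operators(line) if op in UNIMPLEMENTED]
```

An operator that appears only after the `!` marker is treated as commented out, so it is not reported.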

@hgandhi2411
Author

hgandhi2411 commented May 10, 2021

I tried removing (sinh)(tanh) as you suggested and still get a seg fault as above. It happens almost instantly.

@rouyang2017
Owner

rouyang2017 commented May 10, 2021

OK, let me give it a try. Could you post your input as text instead of images, so that I can copy your data?

@hgandhi2411
Author

hgandhi2411 commented May 10, 2021

Here is the text version of my files:
SISSO.in

!_________________________________________________________________
! keywords for the target properties                               
!_________________________________________________________________
ptype=1                               
ntask=1                               
nsample=25                            ! number of samples for each task
task_weighting=1
desc_dim=4                            ! dimension of the descriptor  
restart=.false.                       ! set .true. to continue a job that was stopped but not yet finished 
!_________________________________________________________________
!keywords for feature construction and sure independence screening 
!_________________________________________________________________
nsf=3                                 ! number of scalar features (one feature is one number for each material)
rung=2                                ! rung (<=3) of the feature space to be constructed (times of applying the opset recursively)
opset='(+)(-)(*)(/)(exp)(^-1)(sin)(cos)'      ! (sinh)(tanh)'
maxcomplexity=10                      ! max feature complexity (number of operators in a feature)
dimclass=(1:1)(2:2)(3:3)                    ! group features according to their dimension/unit; those not in any () are dimensionless
maxfval_lb=1e-3                       ! features having the max. abs. data value < maxfval_lb will not be selected 
maxfval_ub=1e5                        ! features having the max. abs. data value > maxfval_ub will not be selected
subs_sis=100                          ! size of the SIS-selected (single) subspace for each descriptor dimension
!_________________________________________________________________
!keywords for descriptor identification via a sparsifying operator
!_________________________________________________________________
method='L0'                           ! sparsification operator: 'L1L0' or 'L0'; L0 is recommended!
fit_intercept=.false.                 ! fit to a nonzero intercept (.true.) or force the intercept to zero (.false.)
metric='RMSE'                         ! for regression only, the metric for model selection: RMSE,MaxAE
nm_output=50                          ! number of the best models to output

train.dat

materials	del_P_by_L	pipe_D	elbow_angle	inlet_v
sample1	0.77962398	0.021	1.0	0.011
sample2	0.12539223	0.055	9.0	0.011
sample3	0.31596343	0.049	13.0	0.02
sample4	1.548262	0.022	20.0	0.02
sample5	0.22665419	0.084	47.0	0.02
sample6	2.1042648	0.01	48.0	0.006
sample7	0.049643078	0.079	49.0	0.006
sample8	1.0513706	0.013	50.0	0.005
sample9	1.5713076	0.032	53.0	0.025
sample10	0.20413389	0.053	57.0	0.01
sample11	0.092338423	0.1	71.0	0.011
sample12	0.62038283	0.054	83.0	0.021
sample13	1.2336624	0.014	84.0	0.006
sample14	0.054440061	0.083	84.0	0.006
sample15	0.091553684	0.069	92.0	0.007
sample16	3.1905766	0.014	93.0	0.013
sample17	0.065200976	0.075	99.0	0.006
sample18	0.56094855	0.067	106.0	0.025
sample19	0.56709911	0.038	111.0	0.013
sample20	0.36515745	0.075	112.0	0.021
sample21	1.5852126	0.027	129.0	0.018
sample22	0.9067224	0.021	131.0	0.008
sample23	0.50116911	0.067	153.0	0.022
sample24	0.13620762	0.064	159.0	0.008
sample25	0.11003728	0.097	180.0	0.011

@rouyang2017
Owner

rouyang2017 commented May 11, 2021

I see the problem. Your train.dat file contains many Tab characters, which the SISSO code cannot parse. It works when I replace every Tab with a space.
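
The fix described above, replacing Tabs with spaces, can be sketched in a few lines of Python. This is only an illustration: the helper names `tabs_to_spaces` and `convert_file` are hypothetical and are not part of SISSO.

```python
from pathlib import Path


def tabs_to_spaces(text):
    """Replace every Tab with a single space; all other characters are kept."""
    return text.replace("\t", " ")


def convert_file(path):
    """Rewrite the file in place and return the number of Tabs replaced."""
    p = Path(path)
    original = p.read_text()
    n_tabs = original.count("\t")
    if n_tabs:
        p.write_text(tabs_to_spaces(original))
    return n_tabs
```

Calling `convert_file("train.dat")` in the job directory rewrites the file and reports how many Tabs were found; a result of 0 means the file was left untouched.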

@hgandhi2411
Author

That worked! Thank you, Dr. Ouyang.
This may be a naive question, but can you tell me how to interpret the output below? I understand that in the final model each coefficient is multiplied by the corresponding descriptor. However, how do I interpret [(exp()/(pipe_D*elbow_angle inlet_v ))], for example? What does exp() mean here, and why is there no operator between elbow_angle and inlet_v?
I suspect the columns may not have been read correctly, but I'm not sure.

iteration:   4
--------------------------------------------------------------------------------
FC starts ...
File containing the features to be rejected: feature_space/Uspace.name
Total number of features in the space phi00:              3
Total number of features in the space phi01:             23
Total number of features in the space phi02:            863
Size of the SIS-selected subspace from phi02:        100
Wall-clock time (second) for this FC:            0.01
FC done!

DI starts ...
total number of SIS-selected features from all iterations:        400
L0 starts ...

Final model/descriptor to report
================================================================================
  4D descriptor (model):
Total RMSE,MaxAE:   0.043247  0.117872
@@@descriptor:
                      2:[((/pipe_D)/pipe_D)]
                     15:[(exp()*(/pipe_D))]
                    228:[(exp()/(pipe_D*elbow_angle     inlet_v ))]
                    295:[((pipe_D/elbow_angle   inlet_v )/(*elbow_angle inlet_v ))]
       coefficients_001:     0.3839017522E-01    0.1011939345E+01   -0.3901951490E+00    0.9466491711E+01
          Intercept_001:     0.0000000000E+00
         RMSE,MaxAE_001:     0.4324744081E-01    0.1178723437E+00
================================================================================

@rouyang2017
Owner

That looks like a code bug. Are you using an earlier version of the code?

With version 3.0.2, I get normal results:

  4D descriptor (model):
Total RMSE,MaxAE:   0.043247  0.117872
@@@descriptor:
                      2:[((inlet_v/pipe_D)/pipe_D)]
                     15:[(exp(inlet_v)*(inlet_v/pipe_D))]
                    228:[(exp(inlet_v)/(pipe_D*elbow_angle))]
                    295:[((pipe_D/elbow_angle)/(inlet_v*elbow_angle))]
       coefficients_001:     0.3839017519E-01    0.1011939347E+01   -0.3901951487E+00    0.9466491704E+01
          Intercept_001:     0.0000000000E+00
         RMSE,MaxAE_001:     0.4324744046E-01    0.1178723436E+00
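
To make the interpretation concrete, here is a minimal Python sketch of how the reported model can be evaluated: each coefficient multiplies its descriptor and the intercept is added. It assumes features 15 and 228 read (exp(inlet_v)*(inlet_v/pipe_D)) and (exp(inlet_v)/(pipe_D*elbow_angle)). The function names `descriptors` and `predict` are illustrative and not part of SISSO.

```python
import math

# Values copied from the coefficients_001 and Intercept_001 lines above.
COEFFICIENTS = (0.3839017519e-01, 0.1011939347e+01,
                -0.3901951487e+00, 0.9466491704e+01)
INTERCEPT = 0.0


def descriptors(pipe_D, elbow_angle, inlet_v):
    """Evaluate the four descriptors (features 2, 15, 228, 295) for one sample."""
    return (
        (inlet_v / pipe_D) / pipe_D,
        math.exp(inlet_v) * (inlet_v / pipe_D),
        math.exp(inlet_v) / (pipe_D * elbow_angle),
        (pipe_D / elbow_angle) / (inlet_v * elbow_angle),
    )


def predict(pipe_D, elbow_angle, inlet_v):
    """Predicted del_P_by_L: intercept plus the sum of coefficient * descriptor."""
    values = descriptors(pipe_D, elbow_angle, inlet_v)
    return INTERCEPT + sum(c * d for c, d in zip(COEFFICIENTS, values))
```

For sample1 in train.dat, `predict(0.021, 1.0, 0.011)` lands close to the tabulated del_P_by_L of 0.77962398, well within the reported MaxAE.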

@hgandhi2411
Author

I am also using version 3.0.2 (June 2020). I found that there was more than one space between my columns: I had replaced all the tabs with spaces but kept extra spaces to align the columns, and I guess the program doesn't like that. After I removed all the extra spaces, I get the same equation as you! Thank you so much for being patient with my questions!!

@rouyang2017
Owner

The number of spaces between columns does not matter, so the cause was probably something else.
Anyway, good to know it works now.
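
For anyone who runs into similar parsing problems, a pre-flight check of train.dat against the values in SISSO.in may help narrow things down. The sketch below is not part of SISSO. It assumes the layout used in this thread (one header line, then one row per sample containing a name, the target property and nsf feature values), and the function name `check_train_dat` is hypothetical.

```python
def check_train_dat(text, nsample, nsf):
    """Return a list of problems found in train.dat text; empty means none found."""
    problems = []
    if "\t" in text:
        problems.append("file contains Tab characters")

    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return problems + ["file is empty"]

    expected_cols = nsf + 2  # assumed layout: name + target + nsf features
    header, rows = lines[0], lines[1:]
    if len(header.split()) != expected_cols:
        problems.append(f"header has {len(header.split())} columns, "
                        f"expected {expected_cols}")
    if len(rows) != nsample:
        problems.append(f"found {len(rows)} data rows, expected {nsample}")

    for i, row in enumerate(rows, start=1):
        fields = row.split()  # splits on any run of whitespace
        if len(fields) != expected_cols:
            problems.append(f"data row {i} has {len(fields)} columns, "
                            f"expected {expected_cols}")
            continue
        for value in fields[1:]:
            try:
                float(value)
            except ValueError:
                problems.append(f"data row {i}: {value!r} is not a number")
    return problems
```

For the files in this thread the call would be `check_train_dat(open("train.dat").read(), nsample=25, nsf=3)`. Python's `split()` treats Tabs and repeated spaces alike, which is why Tabs are flagged separately.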
