BTCPrediction contains scripts and data for BTC price prediction. Currently, we are experimenting with FFX and MOSES.
The FFX director contains a python script called ruffx.py and FFX style formatted training and test data. The script runs FFX and saves best models in a csv file called ffx_results.csv and the plots for each predictions vs actual in png format.
The MOSES directory contains MOSES style training data generation script and analysis script which runs MOSES with a certain parameter setup and evaluation and scoring script for evaluating the MOSES generated Model. The MOSES data generator script is called transformer.py which requires a certain arguments including a config file. The default one is called experiment-config.json. It is not adviced to run this script multiple times with the configuration depicted in experiment-config.json as it may require lots of CPU time to generate all sorts of combination of bins. Instead you may run it only once and manually invoke run_anal.sh.
After manually tuning important parameter, I fixated on using the following settings for the entire experiment.
moses -H pre -q 0.3 -w 0.8 -u $OUT -W1 -i $train_file
-m 30000 \
--fs-score pre \
--enable-fs=1 --fs-target-size=12 --fs-algo simple \
--complexity-ratio=50 \
- For the binarization I used the following configuration to generate 243 different binarizations.
{
"inputFeatures":[
{"name": "MarketCap", "bins":[2,3,5],"binarizer":"QuantileBinarizer"},
{"name": "BlockSize", "bins":[10],"binarizer":"QuantileBinarizer"},
{"name": "GoldPrice", "bins":[2,5,7],"binarizer":"QuantileBinarizer"},
{"name": "GTrends", "bins":[2,5,7],"binarizer":"QuantileBinarizer"},
{"name": "HashRate", "bins":[10],"binarizer":"QuantileBinarizer"},
{"name": "MempoolCount", "bins":[10],"binarizer": "QuantileBinarizer"},
{"name": "MempoolSize", "bins":[10],"binarizer": "QuantileBinarizer"},
{"name": "MinningDifficulty", "bins":[10],"binarizer": "QuantileBinarizer"},
{"name": "TransactionPerDay", "bins":[10],"binarizer": "QuantileBinarizer"},
{"name": "TransactionFee", "bins":[2,5,7],"binarizer": "QuantileBinarizer"},
{"name": "TransactionVolume_USD", "bins":[2,5,7], "binarizer": "QuantileBinarizer"}
],
"targetFeature": "Price",
"targetBinarization":"TopNvsOthers",
"LA_days": [2]
}
the bins key refer to the list of bins experimented with.
-
Using the MOSES data and parameter settings from the above two steps, I used
run_anal.sh
to run moses. The training and test size is fixed to 839,91 respectively. This data corresponds to 2016-2018 inclusively. The remaining data which accounts for 2019-2020 is kept as a virgin data for evaluating performance of the best models later. -
Following the above MOSES runs, the top N MOSES models based on training precision and accuracy are picked and stored in files which are found directly under
LargeExperiment/LA_*/
directories (LA referes to the number of look ahead days).rank_results.py
does exactly this job; taking root directory of the experiment and N value as arguments and savestop_N_prec_metrics.csv
andtop_N_acc_metrics.csv
files under eachLargeExperiment/LA_*/
directories. -
vote.py
does simple voting by ensembling the top N models. It takes the root directory wheretop_N_acc_metrics.csv
andtop_N_prec_metrics.csv
are found, N, and the thhreshold value for voting ((0,1]) as arguments to perform voting. It also valuates performance and accuracy on training and test inputs with the ensemble model and saves the result invoting_results.csv
file.