|
473 | 473 | "source": [
|
474 | 474 | "In the Atari Space_Invaders environment, the agent learns to control a laser cannon to fire at descending aliens. The goal is to defeat all the aliens while avoiding being destroyed by them.\n",
|
475 | 475 | "\n",
|
476 |
| - "* Observation has a shape of (210, 160, 3), which stands for 210 pixels long, 160 pixels wide, and RGB color chanels. Each chanel is an integer value from 0 to 255. For example, a black pixel is (0,0,0), a white pixel is (255, 255, 255).\n", |
| 476 | + "* Observation has a shape of (210, 160, 3), which stands for 210 pixels long, 160 pixels wide, and RGB color channels. Each channel is an integer value from 0 to 255. For example, a black pixel is (0, 0, 0) and a white pixel is (255, 255, 255).\n", |
477 | 477 | "* Reward is a scalar float value. The total reward is the game score. No discount is applied.\n",
|
478 | 478 | "* Action is a scalar integer with six possible values:\n",
|
479 | 479 | " * 0 — Stand still\n",
|
|
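The observation space described above can be illustrated with a quick sketch (plain NumPy here; the notebook itself obtains these frames from the Gym environment):

```python
import numpy as np

# A Space Invaders observation: a 210 x 160 RGB frame of uint8 values in [0, 255].
obs = np.zeros((210, 160, 3), dtype=np.uint8)  # an all-black screen
obs[0, 0] = (255, 255, 255)                    # paint the top-left pixel white

print(obs.shape, obs.dtype)  # (210, 160, 3) uint8
```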
647 | 647 | "id": "S7W1VfUHqjgz"
|
648 | 648 | },
|
649 | 649 | "source": [
|
650 |
| - "CNN is widely used in image recognition. Here the game screen has a shape of (210, 160, 3), which stands for 210 pixels long, 160 pixels wide, and RGB color chanels. DQN downsamples the pixel to 84 x 84 and convert the RGB colors to grayscale. Then it stacks 4 frames together in order to tell the direction and velocity of moving objects. Therefore the input is 84x84x4." |
| 650 | + "CNNs are widely used in image recognition. Here the game screen has a shape of (210, 160, 3), which stands for 210 pixels long, 160 pixels wide, and RGB color channels. DQN downsamples each frame to 84 x 84 and converts the RGB colors to grayscale. It then stacks 4 consecutive frames so the network can infer the direction and velocity of moving objects. Therefore the input shape is 84x84x4." |
651 | 651 | ]
|
652 | 652 | },
|
653 | 653 | {
|
|
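The downsampling, grayscale conversion, and frame stacking described above can be sketched as follows (a minimal TensorFlow sketch on dummy frames; in the notebook the Atari/TF-Agents environment wrappers perform this preprocessing):

```python
import numpy as np
import tensorflow as tf

def preprocess_frame(frame):
    # frame: a (210, 160, 3) uint8 RGB game screen
    gray = tf.image.rgb_to_grayscale(frame)   # (210, 160, 1)
    small = tf.image.resize(gray, (84, 84))   # (84, 84, 1), float32
    return tf.squeeze(small, axis=-1)         # (84, 84)

# Stack 4 consecutive (here: dummy all-black) frames along the channel axis.
frames = [preprocess_frame(np.zeros((210, 160, 3), dtype=np.uint8)) for _ in range(4)]
stacked = tf.stack(frames, axis=-1)

print(stacked.shape)  # (84, 84, 4)
```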
688 | 688 | "num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1\n",
|
689 | 689 | "observation_spec = tensor_spec.from_spec(train_env.observation_spec()) # (84, 84, 4) four gray frames stacking \n",
|
690 | 690 | "\n",
|
691 |
| - "# proprocessing from uint8 color code between 0 and 255 to a float32 between 0 and 1.\n", |
| 691 | + "# preprocessing: rescale uint8 color values in [0, 255] to float32 values in [0, 1].\n", |
692 | 692 | "layer0 = tf.keras.layers.Lambda(lambda obs: tf.cast(obs, np.float32) / 255.)\n",
|
693 | 693 | "layer1 = tf.keras.layers.Conv2D(filters=32, kernel_size=(8, 8), strides = (4, 4), activation='relu')\n",
|
694 | 694 | "layer2 = tf.keras.layers.Conv2D(filters=64, kernel_size=(4, 4), strides = (2, 2), activation='relu')\n",
|
|
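The layers shown above can be assembled into a complete Q-network. This sketch assumes the remaining layers follow the standard DQN architecture (a third 3x3 convolution and a 512-unit dense layer) and uses `num_actions = 6` from the Space Invaders action space described earlier; the notebook builds the network from the environment's specs instead:

```python
import numpy as np
import tensorflow as tf

num_actions = 6  # Space Invaders has six possible actions

q_net = tf.keras.Sequential([
    # rescale uint8 pixels in [0, 255] to float32 in [0, 1]
    tf.keras.layers.Lambda(lambda obs: tf.cast(obs, np.float32) / 255.),
    tf.keras.layers.Conv2D(32, (8, 8), strides=(4, 4), activation='relu'),
    tf.keras.layers.Conv2D(64, (4, 4), strides=(2, 2), activation='relu'),
    tf.keras.layers.Conv2D(64, (3, 3), strides=(1, 1), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(num_actions),  # one Q-value per action
])

# A batch of one stacked-frame observation yields one Q-value per action.
q_values = q_net(np.zeros((1, 84, 84, 4), dtype=np.uint8))
print(q_values.shape)  # (1, 6)
```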
893 | 893 | "1. driver -- explores the environment using a collect policy\n",
|
894 | 894 | "2. collect policy -- used by the driver to interact with the environment\n",
|
895 | 895 | "3. observer -- receives trajectories (experiences) from the driver and saves them to the replay buffer\n",
|
896 |
| - "4. agent -- randomly pull experience from replay buffer and use the exprience to train\n", |
| 896 | + "4. agent -- randomly pulls experience from the replay buffer and uses it to train\n", |
897 | 897 | "\n"
|
898 | 898 | ]
|
899 | 899 | },
|
|
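The driver / observer / replay-buffer / agent loop listed above can be sketched with plain-Python stand-ins (the environment transition and action count here are dummy values; the notebook uses the corresponding TF-Agents components):

```python
import random
from collections import deque

replay_buffer = deque(maxlen=1000)  # observer saves experience here

def driver_step(state):
    # collect policy: pick a random action out of six (dummy stand-in)
    action = random.randrange(6)
    next_state, reward = state + 1, 1.0  # dummy environment transition
    # observer: save the experience to the replay buffer
    replay_buffer.append((state, action, reward, next_state))
    return next_state

# driver: explore the environment for 100 steps
state = 0
for _ in range(100):
    state = driver_step(state)

# agent: randomly pull a batch of experience from the replay buffer to train on
batch = random.sample(replay_buffer, 32)
print(len(batch))  # 32
```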
957 | 957 | "id": "2iY2lGo3EaSr"
|
958 | 958 | },
|
959 | 959 | "source": [
|
960 |
| - "For information purpusoe, sample a batch of trajectories from the replay buffer and examine how they look like.\n", |
| 960 | + "For information purposes, sample a batch of trajectories from the replay buffer and examine what they look like.\n", |
961 | 961 | "\n",
|
962 | 962 | "Below it samples 2 trajectories; each has three steps, and each step contains an 84x84x4 observation."
|
963 | 963 | ]
|
|
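The shape of such a sample can be checked with a quick sketch (plain NumPy with zero-filled dummy data; the notebook draws real trajectories from the replay buffer):

```python
import numpy as np

# 2 trajectories x 3 steps, each step holding one 84x84x4 stacked-frame observation
sample_batch_size, num_steps = 2, 3
observations = np.zeros((sample_batch_size, num_steps, 84, 84, 4), dtype=np.float32)

print(observations.shape)  # (2, 3, 84, 84, 4)
```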