|
473 | 473 | "source": [
|
474 | 474 | "In the Atari Space_Invaders environment, the agent learns to control a laser cannon to fire at descending aliens. The goal is to defeat all the aliens while avoiding being destroyed by them.\n",
|
475 | 475 | "\n",
|
476 |
| - "* Observation has a shape of (210, 160, 3), which stands for 210 pixels long, 160 pixels wide, and RGB color chanels. Each chanel is an integer value from 0 to 255. For example, a black pixel is (0,0,0), a white pixel is (255, 255, 255).\n", |
| 476 | + "* Observation has a shape of (210, 160, 3), which stands for 210 pixels long, 160 pixels wide, and RGB color channels. Each channel is an integer value from 0 to 255. For example, a black pixel is (0, 0, 0) and a white pixel is (255, 255, 255).\n", |
477 | 477 | "* Reward is a scalar float value. The total reward is the game score. No discount is applied.\n",
|
478 | 478 | "* Action is a scalar integer with six possible values:\n",
|
479 | 479 | " * 0 — Stand still\n",
|
|
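The observation space described above can be illustrated with a quick sketch (plain NumPy here; the notebook itself obtains these frames from the Gym environment):

```python
import numpy as np

# A Space Invaders observation: a 210 x 160 RGB frame of uint8 values in [0, 255].
obs = np.zeros((210, 160, 3), dtype=np.uint8)  # an all-black screen
obs[0, 0] = (255, 255, 255)                    # paint the top-left pixel white

print(obs.shape, obs.dtype)  # (210, 160, 3) uint8
```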
647 | 647 | "id": "S7W1VfUHqjgz"
|
648 | 648 | },
|
649 | 649 | "source": [
|
650 |
| - "CNN is widely used in image recognition. Here the game screen has a shape of (210, 160, 3), which stands for 210 pixels long, 160 pixels wide, and RGB color chanels. DQN downsamples the pixel to 84 x 84 and convert the RGB colors to grayscale. Then it stacks 4 frames together in order to tell the direction and velocity of moving objects. Therefore the input is 84x84x4." |
| 650 | + "CNNs are widely used in image recognition. Here the game screen has a shape of (210, 160, 3), which stands for 210 pixels long, 160 pixels wide, and RGB color channels. DQN downsamples each frame to 84 x 84 and converts the RGB colors to grayscale. It then stacks 4 consecutive frames so the network can infer the direction and velocity of moving objects. Therefore the input shape is 84x84x4." |
651 | 651 | ]
|
652 | 652 | },
|
653 | 653 | {
|
|
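The downsampling, grayscale conversion, and frame stacking described above can be sketched as follows (a minimal TensorFlow sketch on dummy frames; in the notebook the Atari/TF-Agents environment wrappers perform this preprocessing):

```python
import numpy as np
import tensorflow as tf

def preprocess_frame(frame):
    # frame: a (210, 160, 3) uint8 RGB game screen
    gray = tf.image.rgb_to_grayscale(frame)   # (210, 160, 1)
    small = tf.image.resize(gray, (84, 84))   # (84, 84, 1), float32
    return tf.squeeze(small, axis=-1)         # (84, 84)

# Stack 4 consecutive (here: dummy all-black) frames along the channel axis.
frames = [preprocess_frame(np.zeros((210, 160, 3), dtype=np.uint8)) for _ in range(4)]
stacked = tf.stack(frames, axis=-1)

print(stacked.shape)  # (84, 84, 4)
```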
688 | 688 | "num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1\n",
|
689 | 689 | "observation_spec = tensor_spec.from_spec(train_env.observation_spec()) # (84, 84, 4) four gray frames stacking \n",
|
690 | 690 | "\n",
|
691 |
| - "# proprocessing from uint8 color code between 0 and 255 to a float32 between 0 and 1.\n", |
| 691 | + "# preprocessing: rescale uint8 color values in [0, 255] to float32 values in [0, 1].\n", |
692 | 692 | "layer0 = tf.keras.layers.Lambda(lambda obs: tf.cast(obs, np.float32) / 255.)\n",
|
693 | 693 | "layer1 = tf.keras.layers.Conv2D(filters=32, kernel_size=(8, 8), strides = (4, 4), activation='relu')\n",
|
694 | 694 | "layer2 = tf.keras.layers.Conv2D(filters=64, kernel_size=(4, 4), strides = (2, 2), activation='relu')\n",
|
|
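The layers shown above can be assembled into a complete Q-network. This sketch assumes the remaining layers follow the standard DQN architecture (a third 3x3 convolution and a 512-unit dense layer) and uses `num_actions = 6` from the Space Invaders action space described earlier; the notebook builds the network from the environment's specs instead:

```python
import numpy as np
import tensorflow as tf

num_actions = 6  # Space Invaders has six possible actions

q_net = tf.keras.Sequential([
    # rescale uint8 pixels in [0, 255] to float32 in [0, 1]
    tf.keras.layers.Lambda(lambda obs: tf.cast(obs, np.float32) / 255.),
    tf.keras.layers.Conv2D(32, (8, 8), strides=(4, 4), activation='relu'),
    tf.keras.layers.Conv2D(64, (4, 4), strides=(2, 2), activation='relu'),
    tf.keras.layers.Conv2D(64, (3, 3), strides=(1, 1), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(num_actions),  # one Q-value per action
])

# A batch of one stacked-frame observation yields one Q-value per action.
q_values = q_net(np.zeros((1, 84, 84, 4), dtype=np.uint8))
print(q_values.shape)  # (1, 6)
```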
893 | 893 | "1. driver -- explores the environment using a collect policy\n",
|
894 | 894 | "2. collect policy -- used by the driver to interact with the environment\n",
|
895 | 895 | "3. observer -- receives trajectories (experiences) from the driver and saves them to the replay buffer\n",
|
896 |
| - "4. agent -- randomly pull experience from replay buffer and use the exprience to train\n", |
| 896 | + "4. agent -- randomly pulls experience from the replay buffer and uses it to train\n", |
897 | 897 | "\n"
|
898 | 898 | ]
|
899 | 899 | },
|
|
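The driver / observer / replay-buffer / agent loop listed above can be sketched with plain-Python stand-ins (the environment transition and action count here are dummy values; the notebook uses the corresponding TF-Agents components):

```python
import random
from collections import deque

replay_buffer = deque(maxlen=1000)  # observer saves experience here

def driver_step(state):
    # collect policy: pick a random action out of six (dummy stand-in)
    action = random.randrange(6)
    next_state, reward = state + 1, 1.0  # dummy environment transition
    # observer: save the experience to the replay buffer
    replay_buffer.append((state, action, reward, next_state))
    return next_state

# driver: explore the environment for 100 steps
state = 0
for _ in range(100):
    state = driver_step(state)

# agent: randomly pull a batch of experience from the replay buffer to train on
batch = random.sample(replay_buffer, 32)
print(len(batch))  # 32
```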
957 | 957 | "id": "2iY2lGo3EaSr"
|
958 | 958 | },
|
959 | 959 | "source": [
|
960 |
| - "For information purpusoe, sample a batch of trajectories from the replay buffer and examine how they look like.\n", |
| 960 | + "For information purposes, sample a batch of trajectories from the replay buffer and examine what they look like.\n", |
961 | 961 | "\n",
|
962 | 962 | "Below it samples 2 trajectories; each has three steps, and each step contains an 84x84x4 observation."
|
963 | 963 | ]
|
|
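The shape of such a sample can be checked with a quick sketch (plain NumPy with zero-filled dummy data; the notebook draws real trajectories from the replay buffer):

```python
import numpy as np

# 2 trajectories x 3 steps, each step holding one 84x84x4 stacked-frame observation
sample_batch_size, num_steps = 2, 3
observations = np.zeros((sample_batch_size, num_steps, 84, 84, 4), dtype=np.float32)

print(observations.shape)  # (2, 3, 84, 84, 4)
```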