|
1 | 1 | {
|
2 | 2 | "metadata": {
|
3 | 3 | "name": "",
|
4 |
| - "signature": "sha256:a1ff8ec8ebc0312f22bc9c580940eafd0be4d19b198403b4e1f62ebfc5a9d4cf" |
| 4 | + "signature": "sha256:9c4f189328f7bcdfb6a6712ea0a60f3d59f38125773549703574e6ecc776eb9e" |
5 | 5 | },
|
6 | 6 | "nbformat": 3,
|
7 | 7 | "nbformat_minor": 0,
|
|
753 | 753 | "\n",
|
754 | 754 | "What kind of features can we use for authorship attribution? Words, defined as everything surrounded by spaces, are generally considered good features. The same holds for bigrams of words and character $n$-grams: [there ain't no such thing as a free lunch](https://en.wikipedia.org/wiki/No_free_lunch_theorem). Therefore, let's not restrict ourselves to a single feature representation but experiment with a number of different representations and see what works best.\n",
|
755 | 755 | "\n",
|
756 |
| - "In the folder `supervized-learning/data/novels` you will find 26 famous British novels downloaded from [Project Gutenberg](http://www.gutenberg.org/wiki/Main_Page). This is a small toy dataset that we will use in our experiments. First we will create a simple representation of a document. I choose to represent each document as a tuple of an author, a title and the actual text. Instead of ordinary tuples we will use the `namedtuple` from the [collections](https://docs.python.org/3.4/library/collections.html#collections.namedtuple) module in Python's standard library. A namedtuple can be constructed as follows:" |
| 756 | + "In the folder `data/british-novels` you will find 26 famous British novels downloaded from [Project Gutenberg](http://www.gutenberg.org/wiki/Main_Page). This is a small toy dataset that we will use in our experiments. First we will create a simple representation of a document. I choose to represent each document as a tuple of an author, a title and the actual text. Instead of ordinary tuples we will use the `namedtuple` from the [collections](https://docs.python.org/3.4/library/collections.html#collections.namedtuple) module in Python's standard library. A namedtuple can be constructed as follows:" |
757 | 757 | ]
|
758 | 758 | },
|
759 | 759 | {
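To recap how `namedtuple` works before the notebook's own definition, here is a minimal sketch. The `Document` name and its three fields follow the description above; the example novel and field values are purely illustrative:

```python
from collections import namedtuple

# A Document is a tuple of an author, a title and the actual text;
# namedtuple lets us access each field by name as well as by position.
Document = namedtuple('Document', ['author', 'title', 'text'])

doc = Document(author='Jane Austen', title='Emma',
               text='Emma Woodhouse, handsome, clever, and rich...')
print(doc.author)  # access by field name
print(doc[1])      # access by position, like an ordinary tuple
```

A namedtuple behaves exactly like a regular tuple (it unpacks, iterates and compares the same way) but makes the code self-documenting, since `doc.author` is clearer than `doc[0]`.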
|
|
896 | 896 | "cell_type": "markdown",
|
897 | 897 | "metadata": {},
|
898 | 898 | "source": [
|
899 |
| - "We will write a function `make_document` that takes as argument a filename and returns an instance of our named tuple `Document`. Each filename in `supervized-learning/data/british-novels` consist of the author and the title separated by an underscore. This allows us to use the filenames to easily extract the title and author. The function `make_document` takes as argument a filename, an $n$-gram range, an argument that states whether to lowercase the text the type of $n$-grams (either word or char) and how large the sample of each text should be:" |
| 899 | + "We will write a function `make_document` that takes as argument a filename and returns an instance of our named tuple `Document`. Each filename in `data/british-novels` consists of the author and the title separated by an underscore. This allows us to use the filenames to easily extract the title and author. The function `make_document` takes as arguments a filename, an $n$-gram range, an argument that states whether to lowercase the text, the type of $n$-grams (either word or char), and how large the sample of each text should be:" |
900 | 900 | ]
|
901 | 901 | },
|
902 | 902 | {
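A minimal sketch of such a function is shown below. The signature and default values are assumptions (the notebook's actual implementation follows in a later cell), and the $n$-gram extraction itself is omitted here, since `ngram_range` and `ngram_type` would typically be applied during vectorization:

```python
import os
from collections import namedtuple

Document = namedtuple('Document', ['author', 'title', 'text'])

def make_document(filename, ngram_range=(1, 1), lowercase=True,
                  ngram_type='word', sample=5000):
    # The filename stem has the form 'Author_Title', so splitting on
    # the first underscore recovers both fields.
    stem = os.path.splitext(os.path.basename(filename))[0]
    author, title = stem.split('_', 1)
    with open(filename, encoding='utf-8') as infile:
        text = infile.read()
    if lowercase:
        text = text.lower()
    # Keep only a sample from the start of the text; the sampling unit
    # depends on the n-gram type (words versus characters).
    if ngram_type == 'word':
        text = ' '.join(text.split()[:sample])
    else:
        text = text[:sample]
    return Document(author, title, text)
```

Called as `make_document('data/british-novels/Austen_Emma.txt')`, this sketch would return a `Document` whose `author` is `'Austen'` and whose `title` is `'Emma'`.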
|
|
1382 | 1382 | "input": [
|
1383 | 1383 | "from glob import glob\n",
|
1384 | 1384 | "\n",
|
1385 |
| - "documents = [make_document(f) for f in glob('supervized-learning/data/british-novels/*.txt')]" |
| 1385 | + "documents = [make_document(f) for f in glob('data/british-novels/*.txt')]" |
1386 | 1386 | ],
|
1387 | 1387 | "language": "python",
|
1388 | 1388 | "metadata": {},
|
|
1445 | 1445 | "scores = {}\n",
|
1446 | 1446 | "# insert your code here\n",
|
1447 | 1447 | "for sample in range(100, 5000, 500):\n",
|
1448 |
| - " documents = [make_document(f, sample=sample) for f in glob(\n", |
1449 |
| - " 'supervized-learning/data/british-novels/*.txt')]\n", |
| 1448 | + " documents = [make_document(f, sample=sample) for f in glob('data/british-novels/*.txt')]\n", |
1450 | 1449 | " authors, titles, texts = zip(*documents)\n",
|
1451 | 1450 | " scores[sample] = cross_validate(AuthorshipLearner(), texts, authors, k=None, score_fn=f_score)"
|
1452 | 1451 | ],
|
|
1488 | 1487 | "scores = {}\n",
|
1489 | 1488 | "# insert your code here\n",
|
1490 | 1489 | "for n_most_frequent in range(50, 500, 100):\n",
|
1491 |
| - " documents = [make_document(f) for f in glob(\n", |
1492 |
| - " 'supervized-learning/data/british-novels/*.txt')]\n", |
| 1490 | + " documents = [make_document(f) for f in glob('data/british-novels/*.txt')]\n", |
1493 | 1491 | " authors, titles, texts = zip(*documents)\n",
|
1494 | 1492 | " scores[n_most_frequent] = cross_validate(AuthorshipLearner(n_most_frequent=n_most_frequent), \n",
|
1495 | 1493 | " texts, authors, k=None, score_fn=f_score)"
|
|
1594 | 1592 | "cell_type": "code",
|
1595 | 1593 | "collapsed": false,
|
1596 | 1594 | "input": [
|
1597 |
| - "grid_search(AuthorshipLearner(), \n", |
1598 |
| - " 'supervized-learning/data/british-novels/', \n", |
| 1595 | + "grid_search(AuthorshipLearner(), 'data/british-novels/', \n", |
1599 | 1596 | " params=params, n_folds=None, score_fn=f_score, verbose=1)"
|
1600 | 1597 | ],
|
1601 | 1598 | "language": "python",
|
|