This is a Perl port of scy's levitation. It reads MediaWiki dump files
revision by revision and writes a data stream to stdout suitable for
git fast-import.

The first 1000 pages of the German Wikipedia and all their revisions
(about 390,000) can be processed in about 15 minutes on moderate
hardware.

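git fast-import reads a line-based stream of commands on its standard input. The shell sketch below shows roughly what one commit in such a stream can look like when a wiki page revision is stored as a file. It is a hypothetical illustration of the fast-import protocol, not the stream levitation-perl actually writes: the function names, the one-file-per-page layout, the refs/heads/master branch and the example.invalid e-mail addresses are all assumptions.

```shell
#!/bin/sh
# Hypothetical illustration of the git fast-import input format; this is
# NOT the exact stream levitation-perl produces.

# Print "data <byte count>", the raw payload, and the optional trailing LF.
emit_data() {
    n=$(( $(printf '%s' "$1" | wc -c) ))
    printf 'data %d\n%s\n' "$n" "$1"
}

# emit_revision PATH USER EPOCH MESSAGE CONTENT
# Emits one commit on refs/heads/master that sets file PATH to CONTENT.
# In a fresh repository, omitting a "from" line makes the first commit a
# root commit; later commits in the same stream build on the branch tip.
emit_revision() {
    printf 'commit refs/heads/master\n'
    printf 'committer %s <%s@example.invalid> %s +0000\n' "$2" "$2" "$3"
    emit_data "$4"
    printf 'M 100644 inline %s\n' "$1"
    emit_data "$5"
}

# Usage inside an initialized repository (example values are made up):
#   { emit_revision Main_Page alice 1262304000 "create page" "Hello"
#     emit_revision Main_Page bob 1262390400 "fix typo" "Hello, world"
#   } | git fast-import
```

The "data" header carries an exact byte count, which is why the payload length is measured with wc -c rather than assumed.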
Dependencies
------------
You need at least Perl 5.10, and the Perl interpreter must be compiled
with thread support.

You also need a working C compiler for the inline SHA1 C function.
Currently this _must_ be gcc 4.3, callable as 'gcc-4.3'. This will be
fixed soon.

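The requirements above can be checked before running anything. The helper below is a hypothetical sketch, not part of levitation-perl: it asks perl to load "use 5.010", reads the useithreads entry of perl's standard Config module (set when perl was built with threads), and looks for a command named gcc-4.3 on the PATH.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not shipped with levitation-perl).
# Prints one line per unmet requirement; returns 1 if any is missing.
check_toolchain() {
    status=0
    if ! perl -e 'use 5.010; 1' >/dev/null 2>&1; then
        echo "perl >= 5.10 not found"
        status=1
    elif ! perl -MConfig -e 'exit($Config{useithreads} ? 0 : 1)'; then
        echo "perl is not built with thread support"
        status=1
    fi
    if ! command -v gcc-4.3 >/dev/null 2>&1; then
        echo "gcc-4.3 not found on PATH"
        status=1
    fi
    return "$status"
}

check_toolchain || echo "fix the problems above before levitating" >&2
```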
You need the following modules and their dependencies from CPAN:
- Regexp::Common
- Inline
- JSON::XS
- Compress::Raw::Zlib
- Carp::Assert
- CDB_File
- XML::Bare >= 0.44
- Deep::Hash::Utils

Some Linux distributions already package the first five of these modules.
Under Debian / Ubuntu, the following command should set you up with them;
install the remaining ones from CPAN:

    sudo apt-get install libregexp-common-perl \
        libinline-perl libjson-xs-perl \
        libcompress-raw-zlib-perl libcarp-assert-perl

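Once everything is installed, you can confirm that perl can load each module. The helper below is a hypothetical sketch, not something shipped with levitation-perl: it runs perl -M for each name and prints the ones that fail to load. Passing "XML::Bare 0.44" makes perl execute "use XML::Bare 0.44;", which also enforces the minimum version from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print every Perl module that cannot be loaded.
# An argument like "XML::Bare 0.44" becomes "use XML::Bare 0.44;", so a
# too-old version is reported as missing as well.
missing_modules() {
    for m in "$@"; do
        perl -M"$m" -e 1 >/dev/null 2>&1 || printf '%s\n' "$m"
    done
}

# The modules from the list above; no output means all of them loaded.
missing_modules Regexp::Common Inline JSON::XS Compress::Raw::Zlib \
    Carp::Assert CDB_File "XML::Bare 0.44" Deep::Hash::Utils
```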
Usage
-----
First, initialize a git repository:

    cd /tmp
    mkdir blawiki
    cd blawiki
    git init

Then "levitate" the dump. This is a three-step process:

    cat /path/to/blawiki-dump.xml | /path/to/levitation-perl/step1.pl
    LC_ALL=C sort rev-table.txt > rev-sorted.txt
    /path/to/levitation-perl/step2.pl | /path/to/levitation-perl/gfi.pl

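The README does not say why the sort must run with LC_ALL=C. A plausible reason (an assumption, not documented here) is that step2.pl expects rev-sorted.txt in plain byte order: in other locales, sort may apply language-specific collation, for example interleaving upper- and lower-case letters, while the C locale compares raw bytes. The small demonstration below shows C-locale ordering; byte_sort is a made-up helper name.

```shell
#!/bin/sh
# byte_sort is a hypothetical helper: sort lines by raw byte value,
# independent of the user's locale settings.
byte_sort() {
    LC_ALL=C sort "$@"
}

# In byte order, upper-case letters (0x41-0x5A) sort before lower-case
# ones (0x61-0x7A), so this prints "B 1", then "a 3", then "b 2".
printf '%s\n' 'b 2' 'B 1' 'a 3' | byte_sort
```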
Alternatively, you can just change to an empty directory and call the
"levitate" helper script with the path to a dump as its argument (the
dump may be 7z, bz2, gz or plain XML):

    mkdir /tmp/blawiki
    cd /tmp/blawiki
    /path/to/levitation-perl/levitate /path/to/blawiki-dump...

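One way a wrapper like levitate could pick a decompressor from the dump's file suffix is sketched below. This is a hypothetical illustration, not the contents of the real levitate script, and it assumes the 7z, bzip2 and gzip command-line tools are installed.

```shell
#!/bin/sh
# Hypothetical sketch: print a command that writes the dump's XML to
# stdout, chosen by file suffix. Not taken from the real "levitate".
decompress_cmd() {
    case "$1" in
        *.7z)  echo "7z x -so" ;;
        *.bz2) echo "bzip2 -dc" ;;
        *.gz)  echo "gzip -dc" ;;
        *.xml) echo "cat" ;;
        *)     echo "unsupported dump format: $1" >&2; return 1 ;;
    esac
}

# Usage sketch (hypothetical file name; the command substitution is left
# unquoted on purpose so "7z x -so" splits into separate words):
#   dump=example-dump.xml.bz2
#   $(decompress_cmd "$dump") "$dump" | /path/to/levitation-perl/step1.pl
```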
Lots of progress information is printed to standard error, so it is
best to redirect it to a file (for example with 2>levitate.log).

Have fun.