Commit ff4deec

committed Dec 14, 2020
README
1 parent 350c069 commit ff4deec

5 files changed

+86
-1
lines changed

 

‎LICENSE

100755 → 100644
File mode changed.

‎README.md

100755 → 100644
+86 -1
@@ -1 +1,86 @@
1-
# stackexchange-xml-to-csv
1+
# stackexchange-xml-to-csv
2+
3+
**stackexchange-xml-to-csv** is a CLI tool that converts [Stack Exchange data dumps](https://archive.org/download/stackexchange) from `XML` to `CSV`, a format that is easier to import into most databases.
4+
5+
Table of contents.
6+
=================
7+
* [Getting started](#get_start)
8+
* [Download database dump](#download-dump)
9+
* [Extract archive(s)](#extract)
10+
* [stackexchange-xml-to-csv building](#stackexchange-xml-to-csv-build)
11+
* [XML to CSV conversion](#xml-to-csv)
12+
* [RDBMS schema examples](#examples)
13+
* [PostgreSQL](#pg)
14+
* [MySQL](#mysql)
15+
* [License](#license)
16+
17+
18+
Getting started.
19+
================
20+
Before you begin, make sure you have a working [Go environment](https://golang.org/doc/install) with Go version 1.14 or later. Run `go version` in the console; it should print the installed Go version.
21+
22+
23+
## 1. Download database dump.
25+
26+
Choose and download the [database dump](https://archive.org/download/stackexchange) that you are going to convert.
27+
28+
**Important: the Stack Overflow dump is split into 8 separate 7z archives:**
29+
30+
* [stackoverflow.com-Badges.7z](https://archive.org/download/stackexchange/stackoverflow.com-Badges.7z)
31+
* [stackoverflow.com-Comments.7z](https://archive.org/download/stackexchange/stackoverflow.com-Comments.7z)
32+
* [stackoverflow.com-PostHistory.7z](https://archive.org/download/stackexchange/stackoverflow.com-PostHistory.7z)
33+
* [stackoverflow.com-PostLinks.7z](https://archive.org/download/stackexchange/stackoverflow.com-PostLinks.7z)
34+
* [stackoverflow.com-Posts.7z](https://archive.org/download/stackexchange/stackoverflow.com-Posts.7z)
35+
* [stackoverflow.com-Tags.7z](https://archive.org/download/stackexchange/stackoverflow.com-Tags.7z)
36+
* [stackoverflow.com-Users.7z](https://archive.org/download/stackexchange/stackoverflow.com-Users.7z)
37+
* [stackoverflow.com-Votes.7z](https://archive.org/download/stackexchange/stackoverflow.com-Votes.7z)
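
If you want to fetch all eight archives in one go, the URLs can be generated rather than copied by hand. The Go sketch below only prints the links listed above. The names `baseURL`, `stackOverflowTables`, and `stackOverflowArchiveURLs` are hypothetical helpers made up for this example and are not part of stackexchange-xml-to-csv.

```go
package main

import "fmt"

// baseURL is the archive.org directory that hosts the dumps.
const baseURL = "https://archive.org/download/stackexchange/"

// stackOverflowTables lists the eight archives the Stack Overflow dump is split into.
var stackOverflowTables = []string{
	"Badges", "Comments", "PostHistory", "PostLinks",
	"Posts", "Tags", "Users", "Votes",
}

// stackOverflowArchiveURLs builds one download URL per table.
// Hypothetical helper for illustration only.
func stackOverflowArchiveURLs() []string {
	urls := make([]string, 0, len(stackOverflowTables))
	for _, table := range stackOverflowTables {
		urls = append(urls, baseURL+"stackoverflow.com-"+table+".7z")
	}
	return urls
}

func main() {
	for _, u := range stackOverflowArchiveURLs() {
		fmt.Println(u)
	}
}
```

The printed URLs can be piped to a downloader of your choice, for example `go run urls.go | xargs -n 1 curl -L -O` (the file name `urls.go` is an assumption).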
38+
39+
## 2. Extract archive(s).
41+
42+
Using [7z](https://www.7-zip.org/) or another archiver, extract the contents of the archive(s) into the directory you will convert the files from.
43+
44+
Example with the [academia.stackexchange.com.7z](https://archive.org/download/stackexchange/academia.stackexchange.com.7z) dump:
45+
```shell
46+
$ mkdir xml csv
47+
$ 7z e academia.stackexchange.com.7z -oxml
48+
$ ls xml/
49+
Badges.xml Comments.xml PostHistory.xml PostLinks.xml Posts.xml Tags.xml Users.xml Votes.xml
50+
```
51+
52+
## 3. Building stackexchange-xml-to-csv.
54+
55+
Clone and build the `stackexchange-xml-to-csv` converter:
56+
57+
```shell
58+
$ git clone https://github.com/SkobelevIgor/stackexchange-xml-to-csv
59+
$ cd stackexchange-xml-to-csv/
60+
$ go build
61+
```
62+
63+
## 4. XML to CSV conversion.
65+
66+
You now have the `stackexchange-xml-to-csv` executable. Use it to convert the XML files:
67+
```
68+
./stackexchange-xml-to-csv --source-path=../xml --store-to-dir=../csv
69+
```
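
To give a sense of what the conversion involves: each dump file holds a flat list of `<row ... />` elements whose attributes carry the column values. The Go sketch below streams such rows and writes them as CSV records. It illustrates the general technique only and is not the converter's actual source; the function names `rowsToCSV` and `convert`, the sample input, and the column list are assumptions made for the example.

```go
package main

import (
	"encoding/csv"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// rowsToCSV reads <row .../> elements from r token by token and writes a
// header plus one CSV record per row to w. Only the attributes named in
// columns are emitted, in that order; missing attributes become empty fields.
func rowsToCSV(r io.Reader, w io.Writer, columns []string) error {
	dec := xml.NewDecoder(r)
	out := csv.NewWriter(w)
	if err := out.Write(columns); err != nil {
		return err
	}
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "row" {
			continue // skip the root element, end tags, and whitespace
		}
		attrs := make(map[string]string, len(start.Attr))
		for _, a := range start.Attr {
			attrs[a.Name.Local] = a.Value
		}
		record := make([]string, len(columns))
		for i, c := range columns {
			record[i] = attrs[c]
		}
		if err := out.Write(record); err != nil {
			return err
		}
	}
	out.Flush()
	return out.Error()
}

// convert is a small string-in, string-out wrapper around rowsToCSV.
func convert(xmlInput string, columns []string) (string, error) {
	var sb strings.Builder
	err := rowsToCSV(strings.NewReader(xmlInput), &sb, columns)
	return sb.String(), err
}

func main() {
	// Made-up sample in the shape of a Tags.xml file.
	sample := `<tags><row Id="1" TagName="go" Count="5" /></tags>`
	csvText, err := convert(sample, []string{"Id", "TagName", "Count"})
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Print(csvText)
}
```

Reading token by token keeps memory use flat, which matters for files the size of `Posts.xml`.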
70+
### List of possible flags:
71+
72+
* `source-path` (**Required**) Absolute or relative path to a directory containing XML files, or to a single XML file.
73+
* `store-to-dir` (**Optional**) Absolute or relative path to the directory where the resulting CSV files are stored.
74+
* `skip-html-decoding` (**Optional**) Some files (e.g., Posts.xml) contain escaped HTML, which the converter decodes by default. Use this flag to disable the decoding.
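
As an illustration of what HTML decoding means here, the sketch below uses Go's standard `html.UnescapeString` to turn entity-escaped markup back into plain HTML. Whether the converter uses this exact function is an assumption; the sample string and the `decodeHTML` helper are made up for the example.

```go
package main

import (
	"fmt"
	"html"
)

// decodeHTML replaces HTML entities such as &lt; and &amp; with the
// characters they stand for. Hypothetical helper, not taken from the tool.
func decodeHTML(s string) string {
	return html.UnescapeString(s)
}

func main() {
	escaped := "&lt;p&gt;Tom &amp; Jerry&lt;/p&gt;"
	fmt.Println(decodeHTML(escaped)) // prints: <p>Tom & Jerry</p>
}
```

Skipping the decoding leaves such values as they appear in the source file, which may be preferable if a later import step does its own unescaping.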
75+
76+
77+
Schema examples.
78+
================
79+
Here you can find examples of the schema for different databases:
80+
* [PostgreSQL](example/postgres_ddl.sql)
81+
82+
83+
License
84+
=======
85+
86+
[MIT License](LICENSE)
File renamed without changes.

‎go.mod

100755 → 100644
File mode changed.

‎go.sum

100755 → 100644
File mode changed.
