datasurvey

Crawl a directory of files and generate a summary of what is available. Builds on libmagic, and uses Magic database to identify files. Is therefore similar in function to file(1), except:

Configurable output reporters:
- CSV list of all files
- List of files with mime types from a 'bad match' list
- Fancy table output
- Aggregate reports showing number of each type of file
Recursion into archives such as Zip and RAR.

Installation

You can install Datasurvey on your system by running the following as root:

$ pip install -r requirements.txt
$ python setup.py install

Running with Docker

Sometimes you want to avoid running Datasurvey on bare metal, which involves installing dependencies and what not. Docker is the solution!

You can build a Datasurvey Docker image with:

$ docker build -t datasurvey /path/to/datasurvey

Then you can run it with:

$ docker run -t datasurvey

It'll drop you into a bash shell where you can run Datasurvey with:

# datasurvey [OPTIONS] PATH

The clever way to do this is to mount a volume you wish to use:

$ docker run -t datasurvey -v /path/to/data:/data:ro
# datasurvey [OPTIONS] /data

Known bugs

Zip and Rar files don't clearly indicate filename encoding; guessing is hard.
Encountering archives inside archives may cause crashes

TODO

Needs a lot more testing.
Support tar, tar.gz, tar.bz2, tar.z7, etc.

Contact

If there are any questions, create a Github issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

datasurvey

Installation

Running with Docker

Known bugs

TODO

Contact

Files

README.md

Latest commit

History

README.md

File metadata and controls

datasurvey

Installation

Running with Docker

Known bugs

TODO

Contact