Satsuki Ueno edited this page Apr 17, 2017 · 6 revisions

This wiki explains how to use each script to add new data to exploreapollo. Some basic background on exploreapollo's storage system is outlined below. To begin, set up the configuration file, then see the page for the type of data you want to add.

Requirements

Python 3

Python libraries - boto3 and requests. Both can be installed via pip.

ffmpeg - This is required for audio uploads, as Python's built-in audio functionality can fail for some WAV formats. ffmpeg must be runnable from whichever shell you are using.
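The requirements above can be checked before running any script. The following is a minimal sketch, not part of the provided scripts; the function name `check_requirements` is hypothetical, and it only tests that the libraries are importable and that ffmpeg is on the PATH.

```python
import importlib.util
import shutil


def check_requirements(modules=("boto3", "requests"), tools=("ffmpeg",)):
    """Return a list of problems found; an empty list means everything is available."""
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append("missing Python library: {0} (try: pip install {0})".format(name))
    for tool in tools:
        # shutil.which searches the PATH of the current shell environment
        if shutil.which(tool) is None:
            problems.append("{0} is not runnable from this shell (not on PATH)".format(tool))
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Running the file directly prints one line per missing requirement and nothing if all of them are present.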

Background

Exploreapollo’s storage system consists of two entities: an Amazon S3 cloud storage unit and a SQL database. The S3 unit stores raw data files, which include audio, image, and text files. The database organizes the data from these raw files so that it is quickly accessible to the non-storage components of exploreapollo, such as the front-end. The database also stores the logical associations between data: stories, moments, and which media files are pertinent to each moment.

As a result of this division of responsibility, data is stored in the system in one of three ways. First, the data may be stored in its entirety in both S3 and the database. This is the case for transcripts and metrics. Second, the data may be stored in its entirety in S3, with a URL reference to that data stored in the database. This is the case for files such as audio and images, which are too large to place in the database. Finally, associative data, such as moments and the associations between the various types of information, is stored only in the database. This is summarized in the table below.

| Data type | Contents in S3 | Contents in database |
| --- | --- | --- |
| Transcripts and Metrics | Text files containing data | Database entries for each item found in the files |
| Audio, Images | Full audio and media files (.wav files, etc.) | Database entries containing URL references to each resource in S3 |
| Stories, Moments, Media Attachables | Nothing | Database entries |
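
The table above can also be written as a small lookup, which may help when deciding where a new piece of data belongs. This is an illustrative sketch only: the `Storage` enum, the dictionary keys, and `uses_s3` are hypothetical names and are not identifiers used by the scripts.

```python
from enum import Enum


class Storage(Enum):
    S3_AND_DATABASE = "full contents in S3 and in the database"
    S3_WITH_DB_REFERENCE = "file in S3, URL reference in the database"
    DATABASE_ONLY = "database entries only"


# Keys are illustrative labels for the rows of the table, not real script names.
STORAGE_BY_TYPE = {
    "transcript": Storage.S3_AND_DATABASE,
    "metric": Storage.S3_AND_DATABASE,
    "audio": Storage.S3_WITH_DB_REFERENCE,
    "image": Storage.S3_WITH_DB_REFERENCE,
    "story": Storage.DATABASE_ONLY,
    "moment": Storage.DATABASE_ONLY,
    "media_attachable": Storage.DATABASE_ONLY,
}


def uses_s3(data_type):
    """Return True if this kind of data has any contents in S3."""
    return STORAGE_BY_TYPE[data_type] is not Storage.DATABASE_ONLY
```

For example, `uses_s3("audio")` is true, while `uses_s3("moment")` is false because moments exist only as database entries.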

Each of the provided scripts performs one of three tasks, illustrated in the figure below. First, a script may upload data to both S3 and the database, either from a local machine (1) or from an external source (1a). Second, a script may transfer data already in S3 over to the database (2). This is necessary when some other method was used to put data into S3 only. Third, a script may upload data to just the database (3). This is for stories, moments, and media attachments only.
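
As a rough illustration of task (1) for a file that is kept in S3 with a URL reference in the database, the flow might look like the sketch below. Several things here are assumptions rather than the scripts' actual behavior: that the database is reached through an HTTP API, the `api_url` endpoint, the `{"url": ...}` payload shape, and the function name `upload_with_reference`. The only library calls used are `upload_file` on a boto3 S3 client and `post` from requests, and both objects are passed in as arguments.

```python
def upload_with_reference(s3_client, http, path, bucket, key, api_url):
    """Upload a local file to S3, then register its URL with the database API.

    s3_client: a boto3 S3 client, e.g. boto3.client("s3")
    http:      the requests module or a requests.Session
    api_url:   hypothetical endpoint that creates the database entry
    """
    # Step 1: store the raw file in S3 (boto3 S3 client method).
    s3_client.upload_file(path, bucket, key)

    # Step 2: build a virtual-hosted-style URL for the uploaded object.
    s3_url = "https://{}.s3.amazonaws.com/{}".format(bucket, key)

    # Step 3: store only the reference in the database.
    # The payload shape is an assumption made for this sketch.
    response = http.post(api_url, json={"url": s3_url})
    response.raise_for_status()
    return s3_url
```

Passing the clients in as arguments keeps the sketch free of credentials and makes it possible to try the flow with stand-in objects before touching real storage.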
