Home

This wiki explains how to use each script to add new data to exploreapollo. Some basic background on exploreapollo's storage system is outlined below. To begin, set up the configuration file, then see the page for the type of data you want to add.

Requirements

Python 3

Python libraries - boto3, requests. These can be installed via pip.

ffmpeg - Required for audio uploads, because Python's built-in audio functionality can fail for some WAV formats. ffmpeg must be runnable from whichever shell you are using.
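
Before running any of the scripts, it can help to confirm that the requirements above are in place. The following is a minimal sketch, not part of the provided scripts; the function name find_missing is hypothetical. It checks that boto3 and requests can be located by Python and that ffmpeg is on the PATH of the current shell.

```python
import importlib.util
import shutil


def find_missing(modules=("boto3", "requests"), executables=("ffmpeg",)):
    """Return descriptions of any requirements that cannot be found.

    This is an illustrative helper (hypothetical name), not one of the
    exploreapollo scripts.
    """
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            missing.append(f"Python library: {name}")
    for name in executables:
        # shutil.which searches PATH the same way the shell would.
        if shutil.which(name) is None:
            missing.append(f"executable: {name}")
    return missing


if __name__ == "__main__":
    problems = find_missing()
    if problems:
        print("Missing requirements:")
        for item in problems:
            print("  -", item)
    else:
        print("All requirements found.")
```

An empty list means everything was found. A missing library can usually be installed via pip; a missing ffmpeg usually means it is not installed or not on PATH.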

Background

Exploreapollo’s storage system consists of two entities: an Amazon S3 cloud storage unit and a SQL database. The S3 unit stores the raw data files, which include audio, image, and text files. The database organizes the data in these raw files so that the non-storage components of exploreapollo, such as the front end, can access it quickly. The database also stores the logical associations between data: it holds stories and moments, and records which media files are pertinent to each moment.

As a result of this division of responsibility, data is stored in the system in one of three ways. First, the data may be stored in its entirety in both S3 and the database. This is the case for transcripts and metrics. Second, the data may be stored in its entirety in S3, with a URL reference to that data stored in the database. This applies to files such as audio and images, which are too large to place in the database. Finally, associative data, such as moments and the associations between the various types of information, is stored only in the database. This is summarized in the table below.

Data type	Contents in S3	Contents in Database
Transcripts and Metrics	Text files containing data	Database entries for each item found in files
Audio, Images	Full audio and media files (.wav files, etc.)	Database entries containing URL references to each resource in S3
Stories, Moments, Media Attachables	Nothing	Database entries
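
The second row of the table (file in S3, URL reference in the database) can be sketched as follows. This is an illustration under stated assumptions, not the code used by the provided scripts: the function name upload_and_reference is hypothetical, the URL uses the virtual-hosted-style S3 address form, and it assumes the object is readable at that address. How the reference is then written to the database is not shown, since that depends on the configuration described elsewhere in this wiki.

```python
from urllib.parse import quote


def upload_and_reference(s3_client, local_path, bucket, key):
    """Upload a file to S3 and return a URL that could be stored in the database.

    s3_client is expected to behave like a boto3 S3 client, i.e. to provide
    upload_file(Filename, Bucket, Key). The function name and the URL layout
    are assumptions made for this sketch.
    """
    # Store the complete file in S3.
    s3_client.upload_file(local_path, bucket, key)
    # Build the reference; quote() escapes spaces and other unsafe
    # characters in the key while keeping "/" separators intact.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
```

With boto3 installed and credentials configured, a client can be created with boto3.client("s3") and passed as the first argument. The bucket and key names are whatever your configuration uses; none are assumed here.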

Each of the provided scripts performs one of three tasks, illustrated in the figure below. First, a script may upload data to both S3 and the database, either from a local machine (1) or from an external source (1a). Second, a script may transfer data already in S3 over to the database (2). This is necessary if some other method was used to put data in S3 only. Third, a script may upload data to the database only (3). This applies to stories, moments, and media attachments only.
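
The choice between the three tasks can be summarized in a few lines of code. This is a sketch for orientation only: the names Task and choose_task and the data type strings are hypothetical and do not correspond to the actual script names.

```python
from enum import Enum


class Task(Enum):
    UPLOAD_TO_BOTH = 1   # local or external source -> S3 and database
    S3_TO_DATABASE = 2   # data already in S3 -> database
    DATABASE_ONLY = 3    # data that lives in the database only


# Data types as grouped in the table above (labels are illustrative).
FILE_BACKED_TYPES = {"transcript", "metric", "audio", "image"}
DATABASE_ONLY_TYPES = {"story", "moment", "media_attachment"}


def choose_task(data_type, already_in_s3=False):
    """Return the task that applies to a given kind of data."""
    if data_type in DATABASE_ONLY_TYPES:
        return Task.DATABASE_ONLY
    if data_type in FILE_BACKED_TYPES:
        # Files already placed in S3 by some other method only need
        # their database entries created.
        return Task.S3_TO_DATABASE if already_in_s3 else Task.UPLOAD_TO_BOTH
    raise ValueError(f"unknown data type: {data_type!r}")
```

For example, choose_task("audio") selects the upload to both S3 and the database, while choose_task("moment") selects the database-only task.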
