
Architecture

Daniel Schultz edited this page Aug 25, 2017 · 1 revision

(architecture diagram)

This is a first pass at representing the components of ContextuBot. It is high level, but it attempts to carve out the idea of having a few separate services behind the public-facing API.

Some notes to go along with the diagram as it stands:

  1. Input sources are intended here as examples. We will start with just one (direct upload) and will likely build out Twitter or Reddit as a second.

  2. The Social Media Daemon will not be built in the initial MVP, but is intended to be a bot that watches a particular medium for video clips and automatically ingests them into the system.

  3. "Video extraction" for the MVP will simply mean file download by URL. However, it may be useful to think of this as an abstract function that will eventually need to retrieve video from various types of servers (e.g. YouTube or Twitter).

  4. The video is passed to the contextualization service, which currently has only one source of context: the audio fingerprinting service. This diagram does not capture what happens within the fingerprinting service to produce matches. Ultimately the contextualization service should be given a list of video hits, which it uses to generate a context.json file.

  5. The Internet Archive is not going to be providing us with video files and probably will not provide us with audio files either. However, they will be able to provide us with audio fingerprint files associated with an A/V item. (Side note: this is part of why I would like to make sure we build the fingerprinting service as something completely decoupled from the contextualization service, so that the Archive can install it as well and replace the Duplitron 5k. I propose we call our fingerprinting service the Duplitron 6k.)

  6. The Cache Killer is intended to be a service that applies some heuristic to re-run the contextualization service on certain videos in order to keep the context up to date. Ideally we would archive past context.json files for a given video.

  7. The process will take time, so there must be support for polling against the context.json service. Media files whose context records are actively being generated will need the status of that generation stored in the context.json service; this will allow our context view to reflect the current state.
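
The abstract video-extraction function described above could be sketched roughly as follows. This is a hypothetical shape, not the actual implementation: the class and function names (`VideoExtractor`, `DirectDownloadExtractor`, `extract_video`) are invented for illustration, with the MVP direct-download case as the only concrete extractor.

```python
from abc import ABC, abstractmethod
from urllib.request import urlretrieve


class VideoExtractor(ABC):
    """Abstract interface for retrieving a video file from some source."""

    @abstractmethod
    def can_handle(self, url: str) -> bool:
        """Return True if this extractor knows how to fetch the given URL."""

    @abstractmethod
    def extract(self, url: str, dest: str) -> str:
        """Download the video behind `url` to `dest`; return the local path."""


class DirectDownloadExtractor(VideoExtractor):
    """MVP behavior: treat the URL as a direct link to a video file."""

    def can_handle(self, url: str) -> bool:
        return url.startswith(("http://", "https://"))

    def extract(self, url: str, dest: str) -> str:
        path, _headers = urlretrieve(url, dest)
        return path


def extract_video(url: str, dest: str, extractors: list) -> str:
    """Dispatch to the first extractor that claims the URL."""
    for extractor in extractors:
        if extractor.can_handle(url):
            return extractor.extract(url, dest)
    raise ValueError(f"No extractor available for {url}")
```

A YouTube or Twitter extractor would later slot in as another `VideoExtractor` subclass without touching the ingestion code that calls `extract_video`.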
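
To make the contextualization step concrete, here is one possible way it could combine fingerprint hits into a context.json record. The schema (field names, the shape of a "hit", the `status` values) is entirely an assumption for illustration; no context.json format has been specified yet.

```python
import json
from datetime import datetime, timezone


def build_context(video_id: str, hits: list) -> dict:
    """Combine fingerprint hits into a context record for one video.

    Each hit is assumed (hypothetically) to look like:
      {"source": "...", "match_start": 12.0, "match_end": 45.5, "score": 0.93}
    """
    return {
        "video_id": video_id,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "status": "complete",  # assumed value; "pending" while matching runs
        "matches": sorted(hits, key=lambda h: h["score"], reverse=True),
    }


# Example usage with made-up data:
context = build_context("abc123", [
    {"source": "archive.org/av-item-1", "match_start": 12.0,
     "match_end": 45.5, "score": 0.93},
])
print(json.dumps(context, indent=2))
```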
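
The Cache Killer's heuristic is deliberately unspecified above; as one sketch of what "some heuristic" might mean, a time-to-live that shortens for recently ingested videos (fresh clips tend to gather new context fastest). The thresholds here are arbitrary placeholders, not a proposal.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(context_generated_at: datetime,
                  video_ingested_at: datetime,
                  now: datetime = None) -> bool:
    """One possible heuristic: refresh young videos often, old ones rarely.

    All thresholds below are illustrative assumptions.
    """
    now = now or datetime.now(timezone.utc)
    video_age = now - video_ingested_at
    if video_age < timedelta(days=7):
        ttl = timedelta(hours=6)
    elif video_age < timedelta(days=30):
        ttl = timedelta(days=1)
    else:
        ttl = timedelta(weeks=1)
    return now - context_generated_at > ttl
```

If past context.json files are archived as suggested, the superseded record would simply be copied aside before each re-run.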
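
The polling flow for in-progress context records could look something like this. The `fetch` callable stands in for whatever HTTP call the context.json service eventually exposes (no such endpoint exists yet), and the `status` values are the same assumed ones as above.

```python
import time


def poll_context(fetch, interval: float = 5.0, max_attempts: int = 60) -> dict:
    """Poll until the context record reaches a terminal status.

    `fetch` is any zero-argument callable returning the current context
    record as a dict (e.g. an HTTP GET against the context.json service).
    """
    for attempt in range(max_attempts):
        record = fetch()
        if record.get("status") in ("complete", "failed"):
            return record
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError("context record never reached a terminal status")
```

Because generation status lives in the context.json service itself, the context view can render a "still processing" state from the same record it will later render matches from.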
