This repository has been archived by the owner on May 1, 2020. It is now read-only.

time-interleaved files with same monoBN causes earlier raw data records to be ignored #435

Open
jbrzusto opened this issue Jan 29, 2019 · 4 comments

Comments

@jbrzusto
Owner

reified from MotusDev/Motus-TO-DO#434
Somewhat like #320 and #407.
In this case, there are detections in files from the original boot session 3, but because this is
a beaglebone white SG that was redeployed with a fresh SD card, and which had a bug whereby
boot numbers did not increase, there are several distinct boot sessions 3.
And unfortunately, there are files from a later boot session 3 which have earlier pre-GPS timestamps
than some such files from an earlier boot session 3.
These later files are read early and bump the tag finder's clock forward before any of the
post-GPS timestamped files from the truly earlier boot session 3 can be processed. When
the latter are seen, their records are ignored because they contain time reversals.

This whole situation needs a rethink, as further elaborated in the issues linked above.

@jbrzusto
Owner Author

This problem requires a dive into the deep end of sensorgnome / motus
design and implementation.

Here are notes that sketch out enough (hopefully) background to
guide a solution.

Data Flow

  • a sensorgnome (SG) writes pulse detection data to a sequence of files

  • an SG begins a new file every hour, or every megabyte of uncompressed
    data, whichever comes first; compressed and uncompressed files are written
    in tandem, with the uncompressed file deleted upon switching to a new file

  • filenames include the SG serial number, timestamp, and boot session count
    (the latter is supposed to increase by one each time the SG reboots, but
    this isn't always the case)

  • when users download files from an SG, they might get a partial copy of
    the last file (i.e. the file transfer process is not sync'd with file writing)

  • generally, batches of files from an SG reach the motus server in increasing
    temporal order, but not always (sometimes, files are located later, as some
    SGs have more than one onboard storage location, which users are not always
    aware of; or apparently corrupt SD cards are later scanned for data)

  • pulses from data files must be run against a full database of active
    tags and their pulse patterns in order to assemble them into tag
    detections; a pulse is deemed to belong to at most one tag

  • the tag database exists only on the motus server

  • the interpretation of an individual pulse depends on context:

    • what pulses are nearby in time
    • what tags are known to be active at the time

  • the tag finder (find_tags_motus) uses a "greedy" approach to
    extract tag detections from pulse data in a single pass. ("greedy" means
    that the first confirmed tag detection sequence that is compatible
    with a pulse gets to claim it).

  • it's not feasible to re-run the tag finder on the entire pulse dataset for an
    SG every time we receive new data from it; this is especially true for networked
    receivers, from which we sync data hourly: the cumulative time spent processing data
    from each receiver would grow quadratically over time if we reprocessed from the
    beginning with each new batch of files.

  • instead, we split the sequence of files from an SG into time periods, and when new
    data arrive from an SG, we only re-run those time periods for which there are new files.

  • the time periods we chose are "boot sessions" (i.e. the maximal
    period of time during which a receiver ran without a reboot).
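A rough illustration of the quadratic growth mentioned above, assuming (for simplicity) that every sync delivers one equally sized batch of files. The function names are made up for this sketch and are not part of any motus code:

```python
def cost_full_rerun(n_syncs: int) -> int:
    """Total batches processed if each sync re-runs everything received so far."""
    # Sync number k re-processes k batches, so the total is 1 + 2 + ... + n.
    return n_syncs * (n_syncs + 1) // 2


def cost_incremental(n_syncs: int) -> int:
    """Total batches processed if each sync only handles its own new batch."""
    return n_syncs


if __name__ == "__main__":
    hourly_syncs_per_year = 24 * 365
    print(cost_full_rerun(hourly_syncs_per_year))   # 38373180
    print(cost_incremental(hourly_syncs_per_year))  # 8760
```

Splitting into time periods sits between these two extremes: only the periods touched by new files are re-run.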

Here are the different ways the tag finder can be called to process some files:

  1. old files: all files from a boot session are re-run in temporal sequence.

  2. new files in a new boot session: when new files arrive, they are grouped by boot
    session, and files in each are processed in a single run of the tag finder (i.e. one run
    per boot session)

  3. new files in an existing boot session: as an optimization, the tag finder always saves
    its internal state at the end of a run, so that new files for an existing boot session can
    be processed incrementally. This is how we avoid quadratic growth in processing time.
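The three cases above can be sketched roughly as follows. This is only an outline of the control flow: `FinderState`, `run_tag_finder` and `process_new_files` are hypothetical names, and the real tag finder state is far richer than a list of file names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FinderState:
    """Hypothetical stand-in for the tag finder's saved internal state."""
    files_done: list = field(default_factory=list)


def run_tag_finder(files, state=None):
    """Process (timestamp, name) pairs in temporal order, resuming from `state` if given."""
    if state is None:
        state = FinderState()  # cases 1 and 2: start from scratch
    for _ts, name in sorted(files):
        state.files_done.append(name)  # placeholder for real pulse processing
    return state  # saved so that a later run can resume (case 3)


def process_new_files(new_files, saved_states):
    """Group (boot_session, timestamp, name) tuples by boot session; one run per session."""
    by_session = defaultdict(list)
    for boot, ts, name in new_files:
        by_session[boot].append((ts, name))
    for boot, files in by_session.items():
        saved_states[boot] = run_tag_finder(files, saved_states.get(boot))
    return saved_states
```

Note that the resume in case 3 only works cleanly if the new files really are later than everything already processed in that boot session.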

So a single run of the tag finder handles files from a single boot session (and not necessarily
all of those files). This single run produces output called a batch, which consists of
individual tag detections (hits) grouped into runs (which are on the same antenna).

The problem: boot sessions aren't monotonic

The decision to use boot sessions to organize data was made when almost all SG data were
coming from beaglebone-black (BBBK) sensorgnomes, which have internal flash memory where we can
store the boot count. This works, but:

  • beaglebone-white (BBW) sensorgnomes (the original model, of which there are still maybe a dozen
    gathering data) and raspberry-pi sensorgnomes (most new SGs in the past couple of years) do
    not have this internal persistent storage, and as users run through different SD cards
    in the same unit, boot counts get reset or mixed up between receivers

  • there was a bug in incrementing the boot count (I know; pathetic; how do you fail to
    implement ++x?) in at least one version of SG software, even on BBBK SGs.

  • some users appear to have customized their SG's software in ways that mess with the boot count

So overall, the fact that N > M does not necessarily mean that a file (labelled as being) from
boot session 'N' was really written later than a file from boot session 'M'.

The consequences of non-monotonic boot sessions

  • the first few files recorded by an SG after it boots often have incorrect timestamps: the
    SG boots thinking it is the year 2000, but real SG timestamps only begin in 2010 or later.
    Eventually, the GPS sets the system clock, and a correct timestamp is written, so the tagfinder
    uses this to back-correct those pre-2010 timestamps.

  • so if the system boots at different times but with the same boot number, there will be multiple
    files labelled with pre-GPS timestamps and the same boot number. Within each real boot, a file
    eventually has a valid timestamp, and the tag finder will use that to back-correct the preceding timestamps.
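A minimal sketch of the back-correction idea, not the tag finder's actual code. It assumes timestamps are epoch seconds, that anything before 2010 is a pre-GPS timestamp, and that the time elapsed between the last pre-GPS record and the first valid record is negligible; `back_correct` is a name invented here.

```python
from datetime import datetime, timezone

# Assumption: real SG timestamps are never earlier than 2010-01-01 UTC.
VALID_AFTER = datetime(2010, 1, 1, tzinfo=timezone.utc).timestamp()


def back_correct(timestamps):
    """Shift the leading pre-GPS timestamps using the first valid timestamp."""
    for i, ts in enumerate(timestamps):
        if ts >= VALID_AFTER:
            if i == 0:
                return list(timestamps)  # nothing to correct
            # Offset between the SG's boot-time clock and real time.
            offset = ts - timestamps[i - 1]
            return [t + offset for t in timestamps[:i]] + list(timestamps[i:])
    return list(timestamps)  # no valid timestamp seen, so no correction is possible
```

This handles a single boot. When several real boots share one boot number, pre-GPS records from one boot can end up next to valid records from another, and a single offset like this no longer applies.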

The Catch

  • the tag finder isn't very smart about dealing with non-monotonic
    timestamps in pulse data. If it sees consecutive records where the
    clock appears to jump backward more than a few seconds (to allow for
    USB timing lag when reading from multiple radios on a single hub), it
    ignores the later records (with earlier timestamps). So when running
    files in the same nominal boot session which were written at different
    real times, a later post-GPS timestamp can cause huge amounts of data
    to be skipped in subsequent processing.
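The effect can be reproduced with a toy filter. This is a sketch of the behaviour described above rather than the tag finder's code; the 5-second tolerance and the name `drop_time_reversals` are assumptions.

```python
TOLERANCE_S = 5.0  # assumed allowance for small backward jumps (USB timing lag)


def drop_time_reversals(timestamps, tolerance=TOLERANCE_S):
    """Return (kept, skipped) when records are read in the given order."""
    kept, skipped = [], []
    clock = float("-inf")  # latest timestamp accepted so far
    for ts in timestamps:
        if ts < clock - tolerance:
            skipped.append(ts)  # clock jumped backward too far: record ignored
        else:
            kept.append(ts)
            clock = max(clock, ts)
    return kept, skipped


# Records from the later real session are read first and push the clock forward...
later_session = [2_000_000.0, 2_000_060.0]
# ...so every record from the truly earlier session looks like a time reversal.
earlier_session = [1_000_000.0, 1_000_060.0, 1_000_120.0]

kept, skipped = drop_time_reversals(later_session + earlier_session)
print(len(kept), len(skipped))  # 2 3
```

Reading the same records in true temporal order keeps all five, which is why the processing order within a nominal boot session matters so much.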

Possible ways forward

  • calculate monotonic boot numbers for each receiver; there is some code in the motusServer
    R package that does this, but it hasn't been integrated into normal file processing

  • re-organize file processing around some other marker, e.g. fixed two-week periods

    • this would be a good optimization for the frequently-required re-runs of data; when
      new or changed tag registrations need to be taken into account, we would only go
      back to those two-week periods affected by the change, and re-run them. (each period
      would save state, so we'd be doing a resume).
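A minimal sketch of bucketing by two-week period, assuming corrected epoch-second timestamps and periods aligned to the Unix epoch (both are assumptions; `period_index` and `periods_to_rerun` are hypothetical names):

```python
PERIOD_SECONDS = 14 * 24 * 3600  # two weeks


def period_index(ts: float) -> int:
    """Index of the two-week period containing epoch timestamp `ts`."""
    return int(ts // PERIOD_SECONDS)


def periods_to_rerun(new_file_timestamps):
    """Sorted indices of the periods that contain at least one new file."""
    return sorted({period_index(ts) for ts in new_file_timestamps})
```

Unlike boot numbers, these indices are monotonic by construction, but they only make sense once pre-GPS timestamps have been corrected, and a file that straddles a period boundary would need its records split or assigned to one side.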

These aren't necessarily mutually exclusive.
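For the first option, one way to picture monotonic boot numbers is below. This is not the motusServer code. It assumes the hard part is already solved, i.e. that each file has a corrected real start time (`real_start`) and a flag saying whether it was the first file written after a boot (`starts_at_boot`); both fields and the function name are hypothetical.

```python
def assign_mono_boot_numbers(files):
    """Map file name -> boot number that increases with real time.

    `files` is a list of dicts with keys 'name', 'real_start' (corrected
    epoch seconds) and 'starts_at_boot' (True for the first file of a boot).
    The nominal boot number from the filename is deliberately ignored.
    """
    mono = {}
    current = 0
    for f in sorted(files, key=lambda f: f["real_start"]):
        # Start a new session at each boot, and for the very first file seen.
        if f["starts_at_boot"] or current == 0:
            current += 1
        mono[f["name"]] = current
    return mono
```

With numbering like this, two distinct real sessions that both carry nominal boot number 3 would end up in different sessions, so their files would no longer be interleaved in a single tag finder run.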

@leberrigan

Thanks for laying this out clearly. Do you have any further thoughts on moving forward? Should I assign this issue to somebody?

@jbrzusto
Owner Author

Sorry, way behind on stuff. If someone else wants to take it on, great. It is a substantial chunk of
work, so best to coordinate efforts on it to avoid duplication.

@joeybernard

I should be diving into this soon. Just dealing with a few other items first.
