time-interleaved files with same monoBN causes earlier raw data records to be ignored #435

jbrzusto · 2019-01-29T16:50:52Z

reified from MotusDev/Motus-TO-DO#434
Somewhat like #320 and #407.
In this case, there are detections in files from the original boot session 3, but because this is
a beaglebone white SG that was redeployed with a fresh SD card, and which had a bug whereby
boot numbers did not increase, there are several distinct boot sessions 3.
And unfortunately, there are files from later boot session 3 which have earlier pre-GPS timestamps
than some such files from an earlier boot session 3.
These later files are read early and bump the tag finder's clock forward before any of the
post-GPS timestamped files from the truly earlier boot session 3 can be processed. When
the latter are seen, their records are ignored because they contain time reversals.

This whole situation needs a rethink, as further elaborated in the issues linked above.

jbrzusto · 2019-03-15T01:29:17Z

This problem requires a dive into the deep end of sensorgnome / motus
design and implementation.

Here are notes that sketch out enough (hopefully) background to
guide a solution.

Data Flow

- a sensorgnome (SG) writes pulse detection data to a sequence of files
- an SG begins a new file every hour, or every megabyte of uncompressed
  data, whichever comes first; compressed and uncompressed files are written
  in tandem, with the uncompressed file deleted upon switching to a new file
- filenames include the SG serial number, timestamp, and boot session count
  (the latter is supposed to increase by one each time the SG reboots, but
  this isn't always the case)
- when users download files from an SG, they might get a partial copy of
  the last file (i.e. the file transfer process is not sync'd with file writing)
- generally, batches of files from an SG reach the motus server in increasing
  temporal order, but not always (sometimes, files are located later, as some
  SGs have more than one onboard storage location, which users are not always
  aware of; or apparently corrupt SD cards are later scanned for data)
- pulses from data files must be run against a full database of active
  tags and their pulse patterns in order to assemble them into tag
  detections; a pulse is deemed to belong to at most one tag
- the tag database exists only on the motus server
- the interpretation of an individual pulse depends on context:
  - what pulses are nearby in time
  - what tags are known to be active at the time
- the tag finder (find_tags_motus) uses a "greedy" approach to
  extract tag detections from pulse data in a single pass. ("greedy" means
  that the first confirmed tag detection sequence that is compatible
  with a pulse gets to claim it).
- it's not feasible to re-run the tag finder on the entire pulse dataset for an
  SG every time we receive new data from it; this is especially true for networked
  receivers, from which we sync data hourly: the cumulative time spent processing data
  from each receiver would grow quadratically over time if we reprocessed from the
  beginning with each new batch of files.
- instead, we split the sequence of files from an SG into time periods, and when new
  data arrive from an SG, we only re-run those time periods for which there are new files.
- the time periods we chose are "boot sessions" (i.e. the maximal
  period of time during which a receiver ran without a reboot).

Here are the different ways the tag finder can be called to process some files:

- old files: all files from a boot session are re-run in temporal sequence.
- new files in a new boot session: when new files arrive, they are grouped by boot
  session, and files in each are processed in a single run of the tag finder (i.e. one run
  per boot session)
- new files in an existing boot session: as an optimization, the tag finder always saves
  its internal state at the end of a run, so that new files for an existing boot session can
  be processed incrementally. This is how we avoid quadratic growth in processing time.

So a single run of the tag finder handles files from a single boot session (and not necessarily
all of those files). This single run produces output called a batch, which consists of
individual tag detections (hits) grouped into runs (which are on the same antenna).
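
To make the grouping concrete, here is a minimal sketch of how new files might be split into one tag finder run per boot session, with a resume when saved state exists. The tuple layout, the `plan_runs` name and the `saved_state` dict are all hypothetical and are not taken from the motusServer code.

```python
from collections import defaultdict

def plan_runs(new_files, saved_state):
    """Group new files by boot session and decide how each run starts.

    new_files:   iterable of (boot_session, timestamp, name) tuples
                 (hypothetical layout, for illustration only).
    saved_state: dict mapping boot_session -> state saved by an earlier run.
    Returns a list of (boot_session, mode, names_in_time_order).
    """
    by_session = defaultdict(list)
    for boot_session, timestamp, name in new_files:
        by_session[boot_session].append((timestamp, name))

    runs = []
    for boot_session in sorted(by_session):
        # Files within one run are fed to the tag finder in temporal order.
        names = [name for _, name in sorted(by_session[boot_session])]
        # Resume if a previous run for this boot session left saved state.
        mode = "resume" if boot_session in saved_state else "fresh"
        runs.append((boot_session, mode, names))
    return runs

files = [(3, 200.0, "c"), (4, 300.0, "d"), (3, 100.0, "b")]
plan = plan_runs(files, saved_state={3: object()})
```

Note that this grouping trusts the boot session label on each file, which is exactly the assumption that fails in this issue.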

The problem: boot sessions aren't monotonic

The decision to use boot sessions to organize data was made when almost all SG data were
coming from beaglebone-black (BBBK) sensorgnomes, which have internal flash memory where we can
store the boot count. This works, but:

- beaglebone-white (BBW) sensorgnomes (the original model, of which there are still maybe a dozen
  gathering data) and raspberry-pi sensorgnomes (most new SGs in the past couple of years) do
  not have this internal persistent storage, and as users run through different SD cards
  in the same unit, boot counts get reset or mixed up between receivers
- there was a bug in incrementing the boot count (I know; pathetic; how do you fail to
  implement ++x?) in at least one version of SG software, even on BBBK SGs.
- some users appear to have customized their SG's software in ways that mess with the boot count

So overall, the fact that N > M does not necessarily mean that a file (labelled as being) from
boot session 'N' was really written later than a file from boot session 'M'.

The consequences of non-monotonic boot sessions

- the first few files recorded by an SG after it boots often have incorrect timestamps: the
  SG boots thinking it is the year 2000, but real SG timestamps only begin in 2010 or later.
  Eventually, the GPS sets the system clock, and a correct timestamp is written, so the tag finder
  uses this to back-correct those pre-2010 timestamps.
- so if the system boots at different times but with the same boot number, there will be multiple
  files labelled with pre-GPS timestamps and the same boot number. One of these files eventually
  has a valid timestamp, and the tag finder will use that to back-correct the preceding timestamps.
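
As an illustration of back-correction, here is a rough sketch. It is not the tag finder's actual method: it assumes that any timestamp before 2010-01-01 UTC is pre-GPS, and that negligible real time passed between the last pre-GPS record and the first valid one, so their difference can be used as the clock offset. The names `VALID_AFTER` and `back_correct` are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: anything before 2010-01-01 UTC is a pre-GPS timestamp.
VALID_AFTER = datetime(2010, 1, 1, tzinfo=timezone.utc).timestamp()

def back_correct(timestamps):
    """Shift leading pre-GPS timestamps using the first valid timestamp.

    Assumes the clock was stepped by GPS between the last pre-GPS record
    and the first valid record, with negligible real time between them.
    """
    first_valid = next(
        (i for i, t in enumerate(timestamps) if t >= VALID_AFTER), None
    )
    if first_valid is None or first_valid == 0:
        # Nothing to correct, or no valid timestamp to correct against.
        return list(timestamps)
    offset = timestamps[first_valid] - timestamps[first_valid - 1]
    corrected = [t + offset for t in timestamps[:first_valid]]
    return corrected + list(timestamps[first_valid:])

# An SG that boots believing it is 2000-01-01, then gets a GPS fix.
boot = datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()
raw = [boot + 10, boot + 20, 1.5e9, 1.5e9 + 5]
fixed = back_correct(raw)
```

If pre-GPS records from two distinct real boots end up in the same run, only one offset is applied to all of them, which is one way the mix-up described above can go wrong.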

The Catch

the tag finder isn't very smart about dealing with non-monotonic
timestamps in pulse data. If it sees consecutive records where the
clock appears to jump backward more than a few seconds (to allow for
USB timing lag when reading from multiple radios on a single hub), it
ignores the later records (with earlier timestamps). So when running
files in the same nominal boot session which were written at different
real times, a later post-GPS timestamp can cause huge amounts of data
to be skipped in subsequent processing.
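
The effect can be reproduced with a small sketch of a time-reversal filter. This is an assumption-laden model, not the find_tags_motus code: `drop_time_reversals` and the 5 second `tolerance` are hypothetical stand-ins for "more than a few seconds".

```python
def drop_time_reversals(records, tolerance=5.0):
    """Split records into kept and dropped lists.

    A record is dropped when its timestamp is more than `tolerance`
    seconds behind the latest timestamp already accepted.
    """
    kept, dropped = [], []
    latest = float("-inf")
    for ts, payload in records:
        if ts < latest - tolerance:
            dropped.append((ts, payload))
        else:
            kept.append((ts, payload))
            latest = max(latest, ts)
    return kept, dropped

# A file from the later real boot is read first and bumps the clock forward;
# the records from the truly earlier boot then look like time reversals.
records = [
    (2000.0, "late-1"),
    (2001.0, "late-2"),
    (1000.0, "early-1"),
    (1001.0, "early-2"),
    (2002.0, "late-3"),
]
kept, dropped = drop_time_reversals(records)
```

Here every "early" record is discarded even though it is perfectly valid data, simply because it was seen after a later timestamp.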

Possible ways forward

- calculate monotonic boot numbers for each receiver; there is some code in the motusServer
  R package that does this, but it hasn't been integrated into normal file processing
- re-organize file processing around some other marker, e.g. every two-week period
  - this would be a good optimization for the frequently-required re-runs of data; when
    new or changed tag registrations need to be taken into account, we would only go
    back to those two-week periods affected by the change, and re-run them. (each period
    would save state, so we'd be doing a resume).

These aren't necessarily mutually exclusive.
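
For the two-week option, a minimal sketch of the bucketing might look like this. The fixed epoch-aligned periods and the names `PERIOD_SECONDS`, `period_index` and `periods_to_rerun` are assumptions for illustration; the sketch also assumes timestamps have already been corrected, since pre-GPS timestamps would all land in the wrong period.

```python
PERIOD_SECONDS = 14 * 24 * 3600  # two weeks, aligned to the Unix epoch

def period_index(timestamp):
    """Return the index of the two-week period containing `timestamp`."""
    return int(timestamp // PERIOD_SECONDS)

def periods_to_rerun(new_file_timestamps):
    """Return the sorted, de-duplicated periods touched by new files."""
    return sorted({period_index(t) for t in new_file_timestamps})

stamps = [0, 100, PERIOD_SECONDS, 3 * PERIOD_SECONDS + 1]
todo = periods_to_rerun(stamps)
```

Only the periods in `todo` would be re-run (or resumed from saved state) when new files or changed tag registrations arrive.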

leberrigan · 2019-04-14T17:19:34Z

Thanks for laying this out clearly. Do you have any further thoughts on moving forward? Should I assign this issue to somebody?

jbrzusto · 2019-04-17T16:49:47Z

Sorry, way behind on stuff. If someone else wants to take it on, great. It is a substantial chunk of
work, so best to coordinate efforts on it to avoid duplication.

joeybernard · 2019-04-24T01:28:42Z

I should be diving into this soon. Just dealing with a few other items first.
