runs are chained to very old ones, past the tsEnd date. #430
tldr: This is a big bug which has left a significant amount of data in a weird state. Working backward from the master DB on sgdata:

```
MariaDB [motus]> select distinct(runID) from hits where batchID=217213;
+-------+
| runID |
+-------+
|   264 |
|   261 |
|   262 |
|   263 |
|   259 |
|   265 |
+-------+
MariaDB [motus]> select distinct runID from batchRuns where batchID=217213;
+----------+
| runID    |
+----------+
| 34202595 |
+----------+
```
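The comparison made by the two queries above can be scripted. Below is a minimal sketch using Python's built-in `sqlite3` module: it returns the runIDs used by a batch's hits that have no matching `batchRuns` row. The two-table schema is a hypothetical reduction to the columns the queries touch, and the rows only mimic the IDs shown for batch 217213; they are not copied from the real database.

```python
import sqlite3


def orphan_run_ids(con, batch_id):
    """Return runIDs used by hits in a batch that are absent from batchRuns for that batch."""
    hit_runs = {r for (r,) in con.execute(
        "select distinct runID from hits where batchID = ?", (batch_id,))}
    batch_runs = {r for (r,) in con.execute(
        "select distinct runID from batchRuns where batchID = ?", (batch_id,))}
    return sorted(hit_runs - batch_runs)


# Hypothetical toy schema: only the columns needed for the check.
con = sqlite3.connect(":memory:")
con.execute("create table hits (hitID integer primary key, runID integer, batchID integer)")
con.execute("create table batchRuns (batchID integer, runID integer)")
con.executemany("insert into hits (runID, batchID) values (?, ?)",
                [(r, 217213) for r in (259, 261, 262, 263, 264, 265)])
con.execute("insert into batchRuns values (217213, 34202595)")

print(orphan_run_ids(con, 217213))
```

A non-empty result means the batch's hits point at runs the master never recorded for that batch, which is the loss of referential integrity described next.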
We have lost referential integrity here: the only runID recorded … Did this error come from the tag finder? Or did it arise in pushing …

We find the receiver to which this batch belongs:

```
MariaDB [motus]> select motusDeviceID from batches where batchID=217213;
+---------------+
| motusDeviceID |
+---------------+
|            31 |
+---------------+
```

and look up its serno in the metadata DB:

```
$ meta "select serno from recvDeps where deviceID=31 limit 1"
SG-1614BBBK1869
```

We examine this receiver's DB using sqlite:

```
sqlite> select * from motusTX where batchID + offsetBatchID = 217213;
batchID     tsMotus          offsetBatchID  offsetRunID  offsetHitID
----------  ---------------  -------------  -----------  -----------
362         1547035579.2565  216851         0            854251524
```

The motusTX table records the offsets for batchIDs, runIDs, and hitIDs. So, e.g. batch 362 on the receiver is batch 362 + 216851 = 217213 on the master. With offsetRunID = 0, however, the hits kept their local runIDs (259 to 265), so somehow offsetRunID was assigned a clearly bogus value. Looking at the code in motusServer/R/pushToMotus.R, we find this clause starting at line 186:

```r
## get count of new runs and 1st run ID for this batch
runInfo = sql("select count(*), min(runID) from runs where batchIDbegin = %d", b$batchID)
if (runInfo[1,1] > 0) {
    ## reserve the required number of runIDs
    firstMotusRunID = motusReserveKeys("runs", runInfo[1,1])
    offsetRunID = firstMotusRunID - runInfo[1,2]
```

However, for this specific batch, there are no runs that begin in the batch; only some runs …

```
sqlite> select count(*), min(runID) from runs where batchIDbegin=362;
count(*)    min(runID)
----------  ----------
0
```

And so the …

```r
res = dbSendQuery(con, sprintf("select * from hits where batchID = %d order by hitID", b$batchID))
repeat {
    hits = dbFetch(res, CHUNK_ROWS)
    if (nrow(hits) == 0)
        break
    hits$hitID = hits$hitID + offsetHitID
    hits$runID = hits$runID + offsetRunID
    hits$batchID = hits$batchID + offsetBatchID
    dbWriteTable(mtcon, "hits", hits, append=TRUE, row.names=FALSE)
    ## copy the helper field tagDepProjectID from the value for the associated run
```

Moreover, this reveals that the runID for all hits from continued … We need to clean up this code to use the correct …
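To see how a single per-batch offsetRunID goes wrong, here is a small Python model of the R clause and hit loop above. It is a sketch, not the server code: `KeyReserver` is a hypothetical stand-in for `motusReserveKeys()` that hands out consecutive master IDs, and it assumes (as the motusTX row for batch 362 suggests) that offsetRunID stays 0 when no run begins in the batch. All IDs are invented.

```python
class KeyReserver:
    """Hypothetical stand-in for motusReserveKeys(): hands out blocks of consecutive master IDs."""

    def __init__(self, first_free):
        self.next_free = first_free

    def reserve(self, n):
        first = self.next_free
        self.next_free += n
        return first


def offset_run_id_as_in_push(new_local_run_ids, reserver):
    """Model of the pushToMotus.R clause: the offset comes only from runs that begin in this batch."""
    offset = 0  # assumption: offsetRunID stays 0 when no run begins in the batch
    if new_local_run_ids:
        first_master = reserver.reserve(len(new_local_run_ids))
        offset = first_master - min(new_local_run_ids)
    return offset


reserver = KeyReserver(first_free=1000)  # hypothetical starting key

# Batch A: local runs 259..265 all begin here, so they receive correct master IDs.
offset_a = offset_run_id_as_in_push(list(range(259, 266)), reserver)
master_ids = {r: r + offset_a for r in range(259, 266)}

# Batch B resumes A: no run begins here, so offset_b is 0 and local runIDs leak through.
offset_b = offset_run_id_as_in_push([], reserver)
pushed_b = [r + offset_b for r in (261, 262, 263)]

# Other receivers reserve keys before batch C, which starts run 266 and continues run 261.
reserver.reserve(50)
offset_c = offset_run_id_as_in_push([266], reserver)
pushed_c = {r: r + offset_c for r in (261, 266)}

print(pushed_b, [master_ids[r] for r in (261, 262, 263)])
print(pushed_c[261], master_ids[261])  # continued run 261 gets a different master ID than in batch A
```

In this model, continued runs are mis-numbered in both cases: with no new runs they keep their local IDs, and with new runs they inherit an offset that was computed for someone else's keys.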
One approach is to create a map from local runIDs to master runIDs, populating the latter either from newly reserved keys (for runs beginning in this batch) or from keys corrected by offsets (for runs begun in previous batches).
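A minimal sketch of that map in Python, under stated assumptions: `run_begins` (local runID to the local batchIDbegin) and `batch_offsets` (local batchID to the offsetRunID saved for it, as in motusTX) are hypothetical inputs, and `KeyReserver` is a hypothetical stand-in for `motusReserveKeys()`. New runs keep the existing offset rule so that later batches can still resolve them by offset.

```python
class KeyReserver:
    """Hypothetical stand-in for motusReserveKeys(): hands out blocks of consecutive master IDs."""

    def __init__(self, first_free):
        self.next_free = first_free

    def reserve(self, n):
        first = self.next_free
        self.next_free += n
        return first


def build_run_id_map(batch_id, run_begins, reserver, batch_offsets):
    """Map each local runID with hits in this batch to its master runID.

    run_begins:    {local runID: local batchIDbegin} for every run with hits in this batch.
    batch_offsets: {local batchID: offsetRunID} saved for earlier batches;
                   this batch's offset is added when it starts new runs.
    """
    new_runs = sorted(r for r, begin in run_begins.items() if begin == batch_id)
    if new_runs:
        first_master = reserver.reserve(len(new_runs))
        # Same offset rule as pushToMotus.R; assumes new local runIDs are consecutive.
        batch_offsets[batch_id] = first_master - new_runs[0]
    # New runs use this batch's offset; continued runs use the offset of the batch they began in.
    return {local: local + batch_offsets[begin] for local, begin in run_begins.items()}


reserver = KeyReserver(first_free=1000)  # hypothetical starting key
offsets = {}
m10 = build_run_id_map(10, {259: 10, 260: 10, 261: 10}, reserver, offsets)
reserver.reserve(50)  # keys taken by other receivers in the meantime
m11 = build_run_id_map(11, {261: 10, 262: 11}, reserver, offsets)
m12 = build_run_id_map(12, {262: 11}, reserver, offsets)
print(m10, m11, m12)
```

Each hit's runID would then be translated through the map for its batch instead of adding one offsetRunID to every hit; the batch numbers 10 to 12 above are purely illustrative.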
The bigger job will be correcting the master database. Options:
Looks like option 1 will be the better way to go. It can be coupled with a complete re-run of every receiver, …
We’ve been floating the idea of a full re-run once the major tag finder issues had been resolved. It’s something I would prefer to only do once if we can. I am not sure where we are on that.
Once the bug is fixed and prevents new runID problems from appearing, there seem to be only a limited number of cases that need to be resolved. Can we simply delete and re-run only those batches until we are ready for a full re-run?
Am I correct in assessing that this bug was only introduced in the last few weeks?
Nope, it's been in the code for years:

```
$ cd /home/sg/src/motusServer/R
$ git blame pushToMotus.R
...
8691d441 package/R/pushToMotus.R (john brzustowski 2016-04-06 15:48:49 +0000 360)     hits$runID = hits$runID + offsetRunID
...
```
…atch This is the bug fix for #430, but closing it requires other actions.
TODO

```sql
create temporary table resumedBatches as
select
   t2.batchID
from
   batches as t1
   join batches as t2 on
      t1.motusDeviceID = t2.motusDeviceID
      and t1.batchID < t2.batchID
      and t1.monoBN = t2.monoBN
      and t1.tsStart < t2.tsStart
      and t2.tsStart - t1.tsEnd < 1000;
```

We can then run the search query as:

```sql
create table hits_with_bad_runIDs as
select t2.hitID from
   resumedBatches as t1
   join hits as t2 on t1.batchID = t2.batchID
   left join batchRuns as t3 on t2.batchID = t3.batchID and t2.runID = t3.runID
where t3.batchID is null;
```

There are 3284 resumed batches with a total of 32024675 hits.
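The two queries above can be exercised on a toy in-memory SQLite database, via Python's `sqlite3` module, to confirm what they flag before running them on the master. This is a sketch: the schema keeps only the referenced columns and every row is hypothetical; the join conditions are the ones in the queries (same device, same monoBN, later start, gap under 1000 s).

```python
import sqlite3

RESUMED_BATCHES_SQL = """
create temporary table resumedBatches as
select t2.batchID
from batches as t1
join batches as t2 on
    t1.motusDeviceID = t2.motusDeviceID
    and t1.batchID < t2.batchID
    and t1.monoBN = t2.monoBN
    and t1.tsStart < t2.tsStart
    and t2.tsStart - t1.tsEnd < 1000
"""

BAD_HITS_SQL = """
create table hits_with_bad_runIDs as
select t2.hitID from
    resumedBatches as t1
    join hits as t2 on t1.batchID = t2.batchID
    left join batchRuns as t3 on t2.batchID = t3.batchID and t2.runID = t3.runID
where t3.batchID is null
"""

con = sqlite3.connect(":memory:")
con.executescript("""
create table batches (batchID integer, motusDeviceID integer, monoBN integer, tsStart real, tsEnd real);
create table hits (hitID integer, runID integer, batchID integer);
create table batchRuns (batchID integer, runID integer);
""")
# Hypothetical rows: batch 2 resumes batch 1 (same device and monoBN, 100 s gap);
# batch 3 has a different monoBN, so it is not treated as resumed.
con.executemany("insert into batches values (?, ?, ?, ?, ?)", [
    (1, 31, 5, 0, 3600),
    (2, 31, 5, 3700, 7200),
    (3, 31, 6, 8000, 9000),
])
con.executemany("insert into batchRuns values (?, ?)", [(1, 7), (2, 500), (3, 600)])
con.executemany("insert into hits values (?, ?, ?)", [
    (1, 7, 1),     # ordinary hit in the first batch
    (10, 7, 2),    # resumed batch, runID not recorded in batchRuns for batch 2: flagged
    (11, 500, 2),  # resumed batch, runID recorded: not flagged
    (20, 8, 3),    # orphan runID, but batch 3 is not a resumed batch: not flagged
])
con.execute(RESUMED_BATCHES_SQL)
con.execute(BAD_HITS_SQL)
resumed = [b for (b,) in con.execute("select batchID from resumedBatches")]
bad_hits = [h for (h,) in con.execute("select hitID from hits_with_bad_runIDs order by hitID")]
print(resumed, bad_hits)
```

Hit 20 illustrates a limit of the approach: only hits in resumed batches are examined.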
See #431 for a combined TODO.
At least a few recent batches contain runs that are marked as continuations of very old runs for tags that should be long dead.
This results in old tags showing up in detection timelines.

```sql
select c.batchID, b.runID, b.motusTagID, b.tsBegin, b.tsEnd, MIN(a.ts) as min_ts, MAX(a.ts) as max_ts
from hits a
inner join runs b on a.runID = b.runID
inner join batches c on a.batchID = c.batchID
where a.ts > b.tsEnd and c.batchID > 217000
group by c.batchID, b.runID, b.motusTagID, b.tsBegin, b.tsEnd
order by c.batchID
```
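A quick way to sanity-check the query above is to run it against a tiny in-memory SQLite database with Python's `sqlite3` module; as far as this query goes, SQLite accepts the same syntax. The tables keep only the columns the query uses, and the rows (including tag 1234) are invented for illustration.

```python
import sqlite3

QUERY = """
select c.batchID, b.runID, b.motusTagID, b.tsBegin, b.tsEnd, MIN(a.ts) as min_ts, MAX(a.ts) as max_ts
from hits a
inner join runs b on a.runID = b.runID
inner join batches c on a.batchID = c.batchID
where a.ts > b.tsEnd and c.batchID > 217000
group by c.batchID, b.runID, b.motusTagID, b.tsBegin, b.tsEnd
order by c.batchID
"""

con = sqlite3.connect(":memory:")
con.executescript("""
create table runs (runID integer, motusTagID integer, tsBegin real, tsEnd real);
create table batches (batchID integer);
create table hits (hitID integer, runID integer, batchID integer, ts real);
""")
# Hypothetical rows: run 1 ended at ts=200, but a later batch appends hits to it.
con.execute("insert into runs values (1, 1234, 100, 200)")
con.executemany("insert into batches values (?)", [(216900,), (217100,), (217300,)])
con.executemany("insert into hits values (?, ?, ?, ?)", [
    (1, 1, 217100, 150),   # inside the run: not reported
    (2, 1, 216900, 300),   # after tsEnd, but batchID <= 217000: filtered out
    (3, 1, 217300, 5000),  # after tsEnd in a recent batch: reported
    (4, 1, 217300, 5100),
])
rows = con.execute(QUERY).fetchall()
print(rows)
```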
This was pointed out by Phil for this deployment.
https://motus.org/data/tagDeploymentDetections?id=8377
The tag involved (20442) appears in the table below.
This seems to have started only a few days ago. I'm running a query now to identify the earliest batchID, but that is a slow query. I'll post an update if the issue started earlier than I think.