Improve augment request handling to batch requests rather than running them serially #162

Description

@seasidesparrow

Currently, the master pipeline receives and processes augment pipeline requests serially, so only one Celery worker handles requests on both the augment and master pipelines. We should also use the load_only argument to avoid loading and sending the fulltext field.

Discussion from Slack (SMD+MT):

SMD: I think we could easily speed up this process. It looks like bibcodes are sent one at a time to augment, which incurs the queueing overhead a huge number of times. If app.request_aff_augment could handle a list of bibcodes, it could package the requests into a list protobuf object:
https://github.com/adsabs/ADSMasterPipeline/blob/41f874a33915b1f972b938316954849e3f2f1070/adsmp/app.py#L486
https://github.com/adsabs/ADSPipelineMsg/blob/master/specs/augmentrecord.proto#L15
The app.request_aff_augment call to get_record should also pass the optional load_only argument, since it only needs bib data and fulltext is big. If that doesn't help enough, we can request multiple database records at once. We could also have run.py simply queue batches of bibcodes and use workers to read the data from postgres and send off the augment requests.
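The batching idea above can be sketched in plain Python. This is only an illustration of the technique, not ADSMasterPipeline code: the names `chunked`, `BATCH_SIZE`, and `request_aff_augment_batch` are hypothetical, and in the real pipeline each batch would be serialized as a list-style protobuf (augmentrecord.proto) and queued as a single Celery task, with the per-record database lookup passing load_only to skip fulltext.

```python
# Hypothetical sketch: queue one message per batch of bibcodes instead of
# one message per bibcode, cutting the per-message queueing overhead.

BATCH_SIZE = 100  # illustrative batch size, not a pipeline constant

def chunked(items, size=BATCH_SIZE):
    """Yield successive fixed-size batches from a list of bibcodes."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def request_aff_augment_batch(bibcodes):
    """Package batches of bibcodes into a small number of messages.

    In the real pipeline, each dict here would instead be a list
    protobuf object handed to the augment queue, and the record
    lookup for each bibcode would use load_only to avoid fulltext.
    """
    messages = []
    for batch in chunked(bibcodes):
        # one queued message per batch -> far fewer queue round-trips
        messages.append({'bibcodes': list(batch)})
    return messages

# Example: 250 bibcodes become 3 queued messages instead of 250.
msgs = request_aff_augment_batch(['bib%04d' % i for i in range(250)])
```

With this shape, run.py could enqueue the batches directly and let workers expand each batch, read the needed columns from postgres, and send the augment requests, as suggested above.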

MT: That matches what I saw on the container. Without making use of the delay function in ADSAffil.tasks, the load was about 0.7, which sounds right for single-threaded operation. With the delay function, the load went up to about 2.2, which again makes sense if the receive, augment, and update queues are all running simultaneously. It also makes sense that adjusting the number of workers within augment_pipeline makes no difference.
