Description
Memory optimization
Given the memory problems we have detected, one possible optimization is to remove each document from memory as soon as it has been processed.
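As a rough sketch of the idea (format_all and format_doc are placeholder names here, not the actual export_service code), the formatter can drop its reference to each document right after formatting it, so the garbage collector can reclaim it:

    def format_all(docs, format_doc):
        # format documents one by one, dropping each reference as soon as it
        # has been processed so the document can be garbage-collected
        output = []
        for i, doc in enumerate(docs):
            output.append(format_doc(doc))
            docs[i] = None  # release the processed document
        return ''.join(output)

The same effect can be had by popping each processed document out of whatever container holds the raw solr response.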
Other optimizations
This part is quadratic, and it also makes the Python list work extra hard by calling list.pop(), because Python has to reshuffle the list after each pop:
https://github.com/adsabs/export_service/blob/master/exportsrv/utils.py#L92
For better results:
1. turn the docs into a dict d keyed by bibcode
2. then do:

    for bibcode in bibcodes:
        if bibcode in d:
            new_docs.append(d.pop(bibcode))
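Putting both steps together, roughly (placeholder function name; this is not the exact code at utils.py#L92):

    def reorder_docs(docs, bibcodes):
        # step 1: index the docs by bibcode once, for O(1) lookup
        d = {doc['bibcode']: doc for doc in docs}
        # step 2: pull each requested doc out of the dict; pop also drops the
        # reference, which helps with the memory issue above
        new_docs = []
        for bibcode in bibcodes:
            if bibcode in d:
                new_docs.append(d.pop(bibcode))
        return new_docs

This makes the reordering linear in the number of docs instead of quadratic.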
This is another quadratic issue (and the same goes for all the similar places): in Python a string is copied every time += is used, which is a problem here because the export builds a large textual output, so every appended string gets more expensive:
https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L262
https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L522
It is better to keep appending the pieces to a list and then return ''.join(list) at the end.
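As a sketch (the field names and the record template are made up, not the actual bibTexFormat code):

    def build_output(entries):
        # collect the formatted records in a list and join once at the end;
        # doing `result += record` instead would copy the whole accumulated
        # string on every iteration
        parts = []
        for entry in entries:
            parts.append('@article{%s,\n  title = {%s}\n}\n'
                         % (entry['bibcode'], entry['title']))
        return ''.join(parts)

str.join builds the final string in a single pass, so the cost stays proportional to the total output size.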