some performance improvement suggestions #220

Open
@romanchyla

memory optimization

given the memory problems we have detected, one possible optimization is to drop each document as soon as it has been processed, so its memory can be freed before the loop finishes

for index in range(num_docs):
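A minimal sketch of that idea, assuming `docs` is a plain Python list and that nothing else holds a reference to the individual documents. The names `format_all` and `format_doc` are hypothetical stand-ins and are not taken from export_service:

```python
def format_all(docs, format_doc):
    """Format every document, releasing each one from `docs` once it is done.

    `format_doc` is a hypothetical callable that turns one document
    into its exported string.
    """
    chunks = []
    for index in range(len(docs)):
        chunks.append(format_doc(docs[index]))
        # Overwrite the slot so the document can be freed if nothing else
        # refers to it. Assigning None is O(1); popping from the front of
        # the list would shift every remaining element.
        docs[index] = None
    return "".join(chunks)
```

Overwriting the slot rather than removing it keeps the indexes stable, so the `range`-based loop does not need to change.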

other optimizations

this part is quadratic; it also makes the Python list work extra hard by calling list.pop(), because removing an element from anywhere but the end forces Python to shift all the elements after it

https://github.com/adsabs/export_service/blob/master/exportsrv/utils.py#L92

for better results:

1. turn the docs into a dict d, keyed by bibcode
2. then do:
  for bibcode in bibcodes:
    if bibcode in d:
      new_docs.append(d.pop(bibcode))
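The two steps above could look like this as a self-contained function. It is a sketch that assumes each doc is a dict carrying its identifier under a `"bibcode"` key; the function name `reorder_docs` is hypothetical:

```python
def reorder_docs(docs, bibcodes):
    """Return the docs in the order given by `bibcodes`, in linear time.

    Assumes each doc is a dict with a "bibcode" key (an assumption made
    for this sketch). Bibcodes with no matching doc are skipped.
    """
    # Step 1: index the docs by bibcode; each later lookup is O(1) on average.
    # If two docs share a bibcode, the later one wins.
    d = {doc["bibcode"]: doc for doc in docs}

    # Step 2: walk the requested order once, moving each match out of the dict.
    new_docs = []
    for bibcode in bibcodes:
        if bibcode in d:
            new_docs.append(d.pop(bibcode))
    return new_docs
```

Because `d.pop` removes the entry, a bibcode that is requested twice only yields its doc once, and the dict shrinks as the output list grows.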

this is another quadratic issue (as are all the similar cases)

in Python, strings are immutable, so += can copy the whole string every time it is used; that is a problem here because export is building large textual output, so each added string gets more expensive than the last

https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L262
https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L522

it is better to keep appending the pieces to a list and then return ''.join(parts) at the end
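A small sketch of the list-and-join pattern, assuming the per-record strings are already available. The name `build_export` and the blank-line separator are illustrative choices and do not reflect what bibTexFormat.py actually does:

```python
def build_export(records):
    """Concatenate already-formatted record strings into one export.

    Hypothetical helper for illustration: collects the pieces in a list
    and joins them once, instead of growing a string with +=.
    """
    parts = []
    for record in records:
        parts.append(record)
        parts.append("\n\n")  # separator between entries (an assumption)
    # A single join allocates the final string once.
    return "".join(parts)
```

Appending to a list is amortized O(1), so the total cost stays linear in the size of the output.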
