To Consider: Comprehensive Final Enhancements for Project Efficiency and Maintainability #32

@tmushayahama

Description

Some tasks to consider for the remaining time

  • Implement Elasticsearch Scrolling for Pagination
      • Add pagination for large datasets using Elasticsearch scrolling for @quemeb.
      • Consider creating endpoints like scrollAnnotations, ScrollSNPsByChromosome, and ScrollSnpsById.
      • Research and implement API security, possibly using an API Guard annotation.
      • Make the scrollId an optional parameter and extend the Snp class to return a scrollId.
      • I will explain this in more detail below.
  • Automate Purge of Downloads Folder
      • Develop a cron job or equivalent to regularly clear the downloads folder.
  • Enhance Test Coverage
      • Ensure test coverage includes fields like VEP_refseq_PANTHER_GO_SLIM_cellular_component_list_id.
      • Add these values to your sample data to ensure comprehensive testing.
  • Dynamic Column Handling
      • Implement functionality to test variable column loading, allowing columns to be added or removed dynamically.
      • This will start from your schema generation code.
  • API Documentation
      • Research and implement a tool equivalent to Swagger for documenting APIs, including descriptions, required parameters, and optional parameters.
  • Code Documentation
      • If time allows, enhance code documentation using docstrings.
      • Reference: https://testdriven.io/blog/documenting-python/
  • Standardize Coding Conventions (something to consider)
      • Ensure consistent naming conventions across the codebase.
      • Choose and enforce a standard naming convention (preferably snake_case for Python). The code
        currently mixes styles: sometimes it is GetSNPsByChromosome and sometimes it is search_by_chromosomes.
  • Good Error Messages
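
For the downloads purge item, here is a minimal sketch of a function that a cron job (or any scheduler) could call. The function name `purge_downloads`, the 24-hour retention window, and the idea of purging by file age are all assumptions for illustration; the project may prefer to clear the folder unconditionally.

```python
import time
from pathlib import Path


def purge_downloads(folder, max_age_seconds=24 * 60 * 60, now=None):
    """Delete regular files in `folder` older than `max_age_seconds`.

    Returns the list of deleted paths. Subdirectories are left untouched.
    The retention window is a hypothetical default, not a project setting.
    """
    now = time.time() if now is None else now
    deleted = []
    for path in Path(folder).iterdir():
        # Compare the file's last-modified time against the cutoff.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            deleted.append(path)
    return deleted
```

A script that calls `purge_downloads` on the real downloads path could then be scheduled with a crontab entry, or triggered from whatever scheduler the deployment already uses.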

Implementation flow idea: Scrolling in Elasticsearch

Scrolling in Elasticsearch allows you to retrieve large numbers of results from a query in multiple batches without the cost of deep pagination. It is suitable for processing datasets that exceed the limit on regular from/size pagination (10,000 results by default, controlled by index.max_result_window).

When a scroll query is initiated, Elasticsearch returns a scroll_id (the _scroll_id field of the response) that you use to fetch the next batch of results. This scroll_id acts like a cursor pointing to a specific place in the dataset.
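
The cursor behaviour described here can be sketched as a small generator. It assumes `es` is an elasticsearch-py client (or anything exposing the same `search`, `scroll`, and `clear_scroll` methods) and that the client version accepts a `query` keyword on `search`; older versions take a `body` argument instead. The function name and default values are hypothetical.

```python
def iter_scroll_hits(es, index, query, keep_alive="1m", batch_size=1000):
    """Yield every hit for `query`, fetching one scroll batch at a time."""
    # The first request opens the scroll context and returns the first batch.
    response = es.search(index=index, query=query, scroll=keep_alive, size=batch_size)
    scroll_id = response["_scroll_id"]
    try:
        # An empty batch means the result set is exhausted.
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # Use the id from the latest response for the next call.
            scroll_id = response["_scroll_id"]
    finally:
        # Release the server-side search context as soon as we are done.
        es.clear_scroll(scroll_id=scroll_id)
```

Clearing the scroll explicitly is optional, since contexts expire on their own, but it frees resources earlier.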

Making scrollId an Optional Parameter:

  • Modify the endpoint that triggers the scrolling query to accept a scrollId as an optional query parameter.
  • If a scrollId is provided, the API should continue fetching results from where the last batch ended.
  • If no scrollId is provided, the API should start a new scroll session and return the initial batch of results along with a new scrollId.
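
The three points above could look roughly like the following resolver. This is only a sketch: the function name, the index name `snps`, the field name `chr`, and the shape of the returned dictionary are assumptions, and `es` is assumed to be an elasticsearch-py style client.

```python
def scroll_snps_by_chromosome(es, chromosome, scroll_id=None,
                              keep_alive="1m", batch_size=1000):
    """Return one batch of SNPs plus the scrollId needed for the next batch."""
    if scroll_id is None:
        # No scrollId supplied: start a new scroll session.
        response = es.search(
            index="snps",                          # hypothetical index name
            query={"term": {"chr": chromosome}},   # hypothetical field name
            scroll=keep_alive,
            size=batch_size,
        )
    else:
        # scrollId supplied: continue from where the last batch ended.
        response = es.scroll(scroll_id=scroll_id, scroll=keep_alive)

    hits = response["hits"]["hits"]
    if not hits:
        # Nothing left: release the context and signal completion with None.
        es.clear_scroll(scroll_id=response["_scroll_id"])
        return {"scrollId": None, "results": []}
    return {
        "scrollId": response["_scroll_id"],
        "results": [hit["_source"] for hit in hits],
    }
```

A caller would invoke it once without a scrollId, then keep passing back the returned scrollId until it comes back as None.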

Extending the Snp Class:
Subclass the Snp class to include a property that can return a scrollId associated with a query session.
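
As a rough illustration of the subclass idea, the sketch below uses a stand-in `Snp` dataclass, since the real class definition is not shown here. The fields `id`, `chr`, and `pos` and the name `ScrollSnp` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snp:
    """Stand-in for the project's Snp class; the real fields will differ."""
    id: str
    chr: str
    pos: int


@dataclass
class ScrollSnp(Snp):
    """Snp that also carries the scrollId of the query session it came from."""
    scroll_id: Optional[str] = None
```

One trade-off to weigh: attaching the scrollId to every Snp repeats the same value on each row, so a wrapper object holding one scrollId and a list of Snp results may be an alternative worth considering.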

API and Code Adjustments:
Adjust the API's logic to manage the lifecycle of a scroll session, including the expiration of scrollIds. The keep-alive window is set by the scroll parameter on each request (for example 1m) and is renewed every time the next batch is fetched.
Implement error handling for cases when an expired or invalid scrollId is received.
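
One possible shape for that error handling is sketched below. It assumes the client's exceptions expose a `status_code` attribute (which should be verified against the installed client version) and that 404 corresponds to an expired scroll context and 400 to a malformed scrollId. `InvalidScrollIdError` and `continue_scroll` are hypothetical names.

```python
class InvalidScrollIdError(Exception):
    """Hypothetical API-level error for an expired or malformed scrollId."""


def continue_scroll(es, scroll_id, keep_alive="1m"):
    """Fetch the next batch, translating scrollId failures into a clear error."""
    try:
        return es.scroll(scroll_id=scroll_id, scroll=keep_alive)
    except Exception as exc:
        # Assumption: the client's exception carries the HTTP status code.
        status = getattr(exc, "status_code", None)
        if status in (400, 404):
            raise InvalidScrollIdError(
                "The scrollId is invalid or has expired. "
                "Start a new scroll by calling the endpoint without a scrollId."
            ) from exc
        # Anything else is not a scrollId problem, so let it propagate.
        raise
```

The message tells the caller how to recover, which also fits the "Good Error Messages" item in the task list.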

tagging @akshala @huaiyumi
