Description
Some tasks to consider for the remaining time
- Implement Elasticsearch Scrolling for Pagination
- Add pagination for large datasets using Elasticsearch scrolling for @quemeb.
- Consider creating endpoints like scrollAnnotations, ScrollSNPsByChromosome, and ScrollSnpsById.
- Research and implement API security, possibly using an API Guard annotation.
- Make the scrollId an optional parameter and extend the Snp class to return a scrollId.
- I explain this in more detail below.
- Automate Purge of Downloads Folder
- Develop a cron job or equivalent to regularly clear the downloads folder.
- Enhance Test Coverage
- Ensure test coverage includes fields like VEP_refseq_PANTHER_GO_SLIM_cellular_component_list_id.
- Add these values to your sample data to ensure comprehensive testing.
- Dynamic Column Handling
- Implement functionality to test variable column loading, allowing for the addition or removal of columns dynamically.
- This starts from your schema generation code.
- API Documentation
- Research and implement a tool equivalent to Swagger for documenting APIs, including descriptions, required parameters, and optional parameters.
- Code Documentation
- If time allows, enhance code documentation using docstrings.
- Reference: https://testdriven.io/blog/documenting-python/
- Something to consider: Standardize Coding Conventions
- Ensure consistent naming conventions across the codebase.
- Choose and enforce a standard naming convention (preferably snake_case for Python). At the moment it is sometimes GetSNPsByChromosome and sometimes search_by_chromosomes.
- Good Error Messages
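For the downloads purge task in the list above, here is a minimal sketch of the function a cron job (or any scheduler) could call. The function name `purge_downloads`, the folder argument, and the 24-hour cutoff are all assumptions for illustration, not something the project already defines.

```python
import time
from pathlib import Path
from typing import List


def purge_downloads(folder: str, max_age_seconds: float = 24 * 3600) -> List[str]:
    """Delete regular files in `folder` older than `max_age_seconds`.

    Returns the sorted names of the files that were removed.
    """
    cutoff = time.time() - max_age_seconds
    removed = []
    for path in Path(folder).iterdir():
        # Only plain files are removed; subdirectories are left alone in this sketch.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Deleting by file age rather than emptying the folder outright avoids removing a download that a user has only just requested.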
Implementation flow idea
Scrolling in Elasticsearch:
Scrolling in Elasticsearch allows you to retrieve large numbers of results from a query in multiple batches without the cost of deep pagination. It's suitable for processing large datasets that exceed typical pagination limits.
When a scroll query is initiated, Elasticsearch provides a scroll_id that you use to fetch the next batch of results. This scroll_id acts like a cursor pointing to a specific place in the dataset.
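The cursor behaviour described above can be sketched as a generator that keeps asking for the next batch until an empty one comes back. This assumes `es` behaves like the official Python client (`search`, `scroll`, and `clear_scroll` with keyword arguments); the function name, batch size, and `1m` keep-alive are illustrative choices.

```python
from typing import Any, Dict, Iterator, List


def iter_scroll_batches(es: Any, index: str, query: Dict[str, Any],
                        batch_size: int = 1000,
                        keep_alive: str = "1m") -> Iterator[List[dict]]:
    """Yield successive batches of hits until the scroll is exhausted."""
    # The first request opens the scroll context and returns the first batch.
    response = es.search(index=index, query=query, size=batch_size, scroll=keep_alive)
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            yield response["hits"]["hits"]
            # Each follow-up call passes the latest scroll_id and renews the keep-alive.
            response = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = response["_scroll_id"]
    finally:
        # Free the server-side context when done, or if the consumer stops early.
        es.clear_scroll(scroll_id=scroll_id)
```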
Making scrollId an Optional Parameter:
- Modify the endpoint that triggers the scrolling query to accept a scrollId as an optional query parameter.
- If a scrollId is provided, the API should continue fetching results from where the last batch ended.
- If no scrollId is provided, the API should start a new scroll session and return the initial batch of results along with a new scrollId.
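A rough sketch of that branching, written as a plain function so it can sit behind whatever API framework the project uses. The index name `snps`, the field name `chr`, and the function name are hypothetical; `es` is assumed to behave like the official Python client. The parameter is spelled `scroll_id` here to follow snake_case.

```python
from typing import Any, Dict, Optional


def scroll_snps_by_chromosome(es: Any, chromosome: str,
                              scroll_id: Optional[str] = None,
                              page_size: int = 500,
                              keep_alive: str = "1m") -> Dict[str, Any]:
    """Return one batch of results plus the scroll_id for the next call."""
    if scroll_id is None:
        # No scroll_id: open a new scroll session and return the first batch.
        response = es.search(
            index="snps",  # hypothetical index name
            query={"term": {"chr": chromosome}},  # hypothetical field name
            size=page_size,
            scroll=keep_alive,
        )
    else:
        # scroll_id given: continue from where the previous batch ended.
        response = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
    hits = response["hits"]["hits"]
    return {
        "snps": [hit["_source"] for hit in hits],
        # An empty batch means the scroll is exhausted, so no id is handed back.
        "scroll_id": response["_scroll_id"] if hits else None,
    }
```

Returning `None` as the `scroll_id` on the last (empty) batch is one possible way to tell the caller to stop; an explicit flag would work just as well.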
Extending the Snp Class:
Subclass the Snp class to include a property that can return a scrollId associated with a query session.
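One way this could look, using dataclasses. The `Snp` fields below are a hypothetical stand-in, since the real class may be defined differently (for example as an API schema type), and `ScrollableSnp` is an invented name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snp:
    # Hypothetical stand-in for the project's existing Snp class.
    rsid: str
    chromosome: str
    position: int


@dataclass
class ScrollableSnp(Snp):
    # Set when the record came from a scroll query; None otherwise.
    scroll_id: Optional[str] = None
```

Because `scroll_id` defaults to `None`, existing code that builds an `Snp` without a scroll session keeps working unchanged.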
API and Code Adjustments:
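Adjust the API's logic to manage the lifecycle of a scroll session, including the expiration of scrollIds. The keep-alive is set on each request through the scroll parameter (for example 1m) and is renewed by every subsequent scroll call, so it only needs to be long enough to process one batch.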
Implement error handling for cases when an expired or invalid scrollId is received.
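A possible shape for that error handling is sketched below. It assumes the client raises an exception that exposes the HTTP status as `status_code`, and that Elasticsearch answers 404 for an expired or unknown scroll context and 400 for an id it cannot parse; both assumptions should be checked against the client version in use. `InvalidScrollIdError` and `continue_scroll` are hypothetical names.

```python
from typing import Any, Dict


class InvalidScrollIdError(Exception):
    """Raised when a scroll_id is expired, unknown, or malformed."""


def continue_scroll(es: Any, scroll_id: str, keep_alive: str = "1m") -> Dict[str, Any]:
    """Fetch the next batch, translating scroll failures into one clear error."""
    try:
        return es.scroll(scroll_id=scroll_id, scroll=keep_alive)
    except Exception as exc:
        # Assumption: the client error carries the HTTP status as `status_code`.
        status = getattr(exc, "status_code", None)
        if status in (400, 404):
            # 404: the context expired or was cleared; 400: the id could not be parsed.
            raise InvalidScrollIdError(
                "scroll_id is invalid or has expired; "
                "start a new scroll by omitting scroll_id"
            ) from exc
        # Anything else (connection problems, server errors) is passed through.
        raise
```

The API layer can then map `InvalidScrollIdError` to a single, readable error response instead of leaking the raw Elasticsearch exception.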