found incorrect/missing assembly_link_type #436

Open
@Smeds

Hi!

I have been playing around with Python and the API library generated from https://www.ncbi.nlm.nih.gov/datasets/docs/v2/openapi3/openapi3.docs.yaml
and found a missing enum value (or an incorrectly set data value) when trying to get genome links. For accession number GCF_003957565.2, I get the following error:

ValidationError: 1 validation error for V2AssemblyLinksReplyAssemblyLink
assembly_link_type
  Input should be 'GDV_LINK', 'FTP_LINK', 'ASSEMBLY_PUBMED', 'BLAST_LINK', 'ASSEMBLY_NUCCORE_REFSEQ' or 'ASSEMBLY_NUCCORE_GENBANK' [type=enum, input_value='ASSEMBLY_NUCCORE', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/enum
 

The assembly_link_type that generates the error is "ASSEMBLY_NUCCORE", which can be seen when using curl to fetch the data:

curl -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/GCF_003957565.2/links" \
 -H 'accept: application/json' 

Result

{
  "assembly_links": [
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "BLAST_LINK",
      "resource_link": "https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_SPEC=GDH_GCF_003957565.2"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "GDV_LINK",
      "resource_link": "https://www.ncbi.nlm.nih.gov/genome/gdv/browser/genome/?id=GCF_003957565.2"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "FTP_LINK",
      "resource_link": "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/957/565/GCF_003957565.2_bTaeGut1.4.pri"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "ASSEMBLY_PUBMED",
      "resource_link": "https://www.ncbi.nlm.nih.gov/nuccore/?from_uid=10005361&linkname=assembly_pubmed"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "ASSEMBLY_NUCCORE",
      "resource_link": "https://www.ncbi.nlm.nih.gov/nuccore/?from_uid=10005361&linkname=assembly_nuccore_refseq"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "ASSEMBLY_NUCCORE_REFSEQ",
      "resource_link": "https://www.ncbi.nlm.nih.gov/nuccore/?from_uid=10005361&linkname=assembly_nuccore_refseq"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "ASSEMBLY_NUCCORE_GENBANK",
      "resource_link": "https://www.ncbi.nlm.nih.gov/nuccore/?from_uid=10005361&linkname=assembly_nuccore_insdc"
    }
  ]
}
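The mismatch can also be checked without the generated client, by comparing the raw JSON against the enum from the spec. The following is only a minimal sketch: `fetch_links` and `undeclared_link_types` are hypothetical helper names, not part of the NCBI library, and the set of allowed values is copied by hand from the YAML definition quoted in this issue.

```python
import json
import urllib.request

API_ROOT = "https://api.ncbi.nlm.nih.gov/datasets/v2"

# Enum values copied by hand from the OpenAPI definition quoted in this issue.
SPEC_LINK_TYPES = frozenset({
    "GDV_LINK",
    "FTP_LINK",
    "ASSEMBLY_PUBMED",
    "BLAST_LINK",
    "ASSEMBLY_NUCCORE_REFSEQ",
    "ASSEMBLY_NUCCORE_GENBANK",
})


def fetch_links(accession: str) -> dict:
    """GET the genome links for one accession, mirroring the curl call above."""
    request = urllib.request.Request(
        f"{API_ROOT}/genome/accession/{accession}/links",
        headers={"accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def undeclared_link_types(payload: dict) -> list[str]:
    """Return link types present in the payload but absent from the spec enum."""
    seen = {link["assembly_link_type"] for link in payload.get("assembly_links", [])}
    return sorted(seen - SPEC_LINK_TYPES)


# Trimmed copy of the response above (resource_link fields omitted).
sample = {
    "assembly_links": [
        {"accession": "GCF_003957565.2", "assembly_link_type": link_type}
        for link_type in (
            "BLAST_LINK",
            "GDV_LINK",
            "FTP_LINK",
            "ASSEMBLY_PUBMED",
            "ASSEMBLY_NUCCORE",
            "ASSEMBLY_NUCCORE_REFSEQ",
            "ASSEMBLY_NUCCORE_GENBANK",
        )
    ]
}

print(undeclared_link_types(sample))  # ['ASSEMBLY_NUCCORE']
```

Against the live API, `undeclared_link_types(fetch_links("GCF_003957565.2"))` would run the same check; that call needs network access and is not executed here.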

If we look at the YAML definition, we can see that "ASSEMBLY_NUCCORE" isn't defined in the enum:

datasets/datasets.openapi.yaml

Lines 11477 to 11485 in d511fd7

v2AssemblyLinksReplyAssemblyLinkType:
  type: string
  enum:
    - GDV_LINK
    - FTP_LINK
    - ASSEMBLY_PUBMED
    - BLAST_LINK
    - ASSEMBLY_NUCCORE_REFSEQ
    - ASSEMBLY_NUCCORE_GENBANK

and the generated Python code does not have it as a value:

class V2AssemblyLinksReplyAssemblyLinkType(str, Enum):
    """
    V2AssemblyLinksReplyAssemblyLinkType
    """

    """
    allowed enum values
    """
    GDV_LINK = 'GDV_LINK'
    FTP_LINK = 'FTP_LINK'
    ASSEMBLY_PUBMED = 'ASSEMBLY_PUBMED'
    BLAST_LINK = 'BLAST_LINK'
    ASSEMBLY_NUCCORE_REFSEQ = 'ASSEMBLY_NUCCORE_REFSEQ'
    ASSEMBLY_NUCCORE_GENBANK = 'ASSEMBLY_NUCCORE_GENBANK'

    @classmethod
    def from_json(cls, json_str: str) -> Self:
        """Create an instance of V2AssemblyLinksReplyAssemblyLinkType from a JSON string"""
        return cls(json.loads(json_str))
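Until the spec and the server agree, one possible client-side workaround is an enum that maps undeclared values to a placeholder instead of raising. This is only a sketch of that idea using the standard library's `Enum._missing_` hook: `TolerantLinkType` and its `UNKNOWN` member are hypothetical, not part of the generated code, and whether pydantic's enum validation honours `_missing_` in a given version has not been verified here.

```python
from enum import Enum


class TolerantLinkType(str, Enum):
    """Hypothetical stand-in for the generated enum; not the library's class."""

    GDV_LINK = "GDV_LINK"
    FTP_LINK = "FTP_LINK"
    ASSEMBLY_PUBMED = "ASSEMBLY_PUBMED"
    BLAST_LINK = "BLAST_LINK"
    ASSEMBLY_NUCCORE_REFSEQ = "ASSEMBLY_NUCCORE_REFSEQ"
    ASSEMBLY_NUCCORE_GENBANK = "ASSEMBLY_NUCCORE_GENBANK"
    UNKNOWN = "UNKNOWN"  # assumed placeholder for values the spec does not list

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when value lookup fails; fall back instead of raising.
        return cls.UNKNOWN


print(TolerantLinkType("ASSEMBLY_NUCCORE").name)  # UNKNOWN
print(TolerantLinkType("GDV_LINK").name)          # GDV_LINK
```

The trade-off is that the original string is lost once it collapses to `UNKNOWN`, so this only suits callers that do not need the undeclared value itself.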

Labels: bug (Something isn't working)
