Hi!
I have been playing around with Python and the API library generated from the OpenAPI spec (https://www.ncbi.nlm.nih.gov/datasets/docs/v2/openapi3/openapi3.docs.yaml)
and found a missing or incorrectly set enum value when trying to get genome links. For accession number GCF_003957565.2, I get the following error:
ValidationError: 1 validation error for V2AssemblyLinksReplyAssemblyLink
assembly_link_type
  Input should be 'GDV_LINK', 'FTP_LINK', 'ASSEMBLY_PUBMED', 'BLAST_LINK', 'ASSEMBLY_NUCCORE_REFSEQ' or 'ASSEMBLY_NUCCORE_GENBANK' [type=enum, input_value='ASSEMBLY_NUCCORE', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/enum
The assembly_link_type that generates the error is "ASSEMBLY_NUCCORE", which can be seen when fetching the data with curl:
curl -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/GCF_003957565.2/links" \
-H 'accept: application/json'
Result
{
  "assembly_links": [
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "BLAST_LINK",
      "resource_link": "https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_SPEC=GDH_GCF_003957565.2"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "GDV_LINK",
      "resource_link": "https://www.ncbi.nlm.nih.gov/genome/gdv/browser/genome/?id=GCF_003957565.2"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "FTP_LINK",
      "resource_link": "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/957/565/GCF_003957565.2_bTaeGut1.4.pri"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "ASSEMBLY_PUBMED",
      "resource_link": "https://www.ncbi.nlm.nih.gov/nuccore/?from_uid=10005361&linkname=assembly_pubmed"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "ASSEMBLY_NUCCORE",
      "resource_link": "https://www.ncbi.nlm.nih.gov/nuccore/?from_uid=10005361&linkname=assembly_nuccore_refseq"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "ASSEMBLY_NUCCORE_REFSEQ",
      "resource_link": "https://www.ncbi.nlm.nih.gov/nuccore/?from_uid=10005361&linkname=assembly_nuccore_refseq"
    },
    {
      "accession": "GCF_003957565.2",
      "assembly_link_type": "ASSEMBLY_NUCCORE_GENBANK",
      "resource_link": "https://www.ncbi.nlm.nih.gov/nuccore/?from_uid=10005361&linkname=assembly_nuccore_insdc"
    }
  ]
}
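To check a response like this for values the enum does not cover, something along these lines might help. It is only a sketch: KNOWN_LINK_TYPES is copied from the validation error message above, unknown_link_types is a hypothetical helper (not part of the generated client), and the sample is a trimmed copy of the curl result.

```python
import json

# Allowed values, copied from the pydantic error message above.
KNOWN_LINK_TYPES = {
    "GDV_LINK",
    "FTP_LINK",
    "ASSEMBLY_PUBMED",
    "BLAST_LINK",
    "ASSEMBLY_NUCCORE_REFSEQ",
    "ASSEMBLY_NUCCORE_GENBANK",
}


def unknown_link_types(payload: dict) -> list[str]:
    """Return the link types in the payload that are not in KNOWN_LINK_TYPES.

    Hypothetical helper for illustration; not part of the generated client.
    """
    seen = [
        link.get("assembly_link_type")
        for link in payload.get("assembly_links", [])
    ]
    return sorted({t for t in seen if t is not None and t not in KNOWN_LINK_TYPES})


# Trimmed copy of the response shown above (resource_link fields omitted).
sample = json.loads("""
{
  "assembly_links": [
    {"accession": "GCF_003957565.2", "assembly_link_type": "BLAST_LINK"},
    {"accession": "GCF_003957565.2", "assembly_link_type": "ASSEMBLY_NUCCORE"},
    {"accession": "GCF_003957565.2", "assembly_link_type": "ASSEMBLY_NUCCORE_REFSEQ"}
  ]
}
""")

print(unknown_link_types(sample))  # ['ASSEMBLY_NUCCORE']
```

Running the same helper on the full JSON returned by curl should flag only "ASSEMBLY_NUCCORE", since the other six values all appear in the allowed list.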
If we look at the YAML definition, we can see that "ASSEMBLY_NUCCORE" isn't defined:
datasets/datasets.openapi.yaml
Lines 11477 to 11485 in d511fd7
and so the generated Python code does not have it as a value:
class V2AssemblyLinksReplyAssemblyLinkType(str, Enum):
    """
    V2AssemblyLinksReplyAssemblyLinkType
    """

    """
    allowed enum values
    """
    GDV_LINK = 'GDV_LINK'
    FTP_LINK = 'FTP_LINK'
    ASSEMBLY_PUBMED = 'ASSEMBLY_PUBMED'
    BLAST_LINK = 'BLAST_LINK'
    ASSEMBLY_NUCCORE_REFSEQ = 'ASSEMBLY_NUCCORE_REFSEQ'
    ASSEMBLY_NUCCORE_GENBANK = 'ASSEMBLY_NUCCORE_GENBANK'

    @classmethod
    def from_json(cls, json_str: str) -> Self:
        """Create an instance of V2AssemblyLinksReplyAssemblyLinkType from a JSON string"""
        return cls(json.loads(json_str))
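
For illustration, the failure can be reproduced with the standard library alone. In the sketch below, StrictLinkType mirrors the generated enum above, while TolerantLinkType and its UNKNOWN member are hypothetical names I made up to show one possible client-side fallback. This only demonstrates plain Enum behavior; I have not verified how the generated pydantic models would treat such a fallback, and the proper fix is probably still to add the value to the spec (or stop returning it).

```python
from enum import Enum


class StrictLinkType(str, Enum):
    """Mirrors the generated enum: only the values listed in the spec."""
    GDV_LINK = "GDV_LINK"
    FTP_LINK = "FTP_LINK"
    ASSEMBLY_PUBMED = "ASSEMBLY_PUBMED"
    BLAST_LINK = "BLAST_LINK"
    ASSEMBLY_NUCCORE_REFSEQ = "ASSEMBLY_NUCCORE_REFSEQ"
    ASSEMBLY_NUCCORE_GENBANK = "ASSEMBLY_NUCCORE_GENBANK"


class TolerantLinkType(str, Enum):
    """Hypothetical variant: unrecognized values map to UNKNOWN."""
    GDV_LINK = "GDV_LINK"
    FTP_LINK = "FTP_LINK"
    ASSEMBLY_PUBMED = "ASSEMBLY_PUBMED"
    BLAST_LINK = "BLAST_LINK"
    ASSEMBLY_NUCCORE_REFSEQ = "ASSEMBLY_NUCCORE_REFSEQ"
    ASSEMBLY_NUCCORE_GENBANK = "ASSEMBLY_NUCCORE_GENBANK"
    UNKNOWN = "UNKNOWN"  # assumption: placeholder member, not in the spec

    @classmethod
    def _missing_(cls, value):
        # Enum calls this hook when a lookup by value finds no member.
        return cls.UNKNOWN


# The strict enum rejects the value the API actually returns.
try:
    StrictLinkType("ASSEMBLY_NUCCORE")
except ValueError as exc:
    print(f"strict lookup failed: {exc}")

# The tolerant variant falls back instead of raising.
print(TolerantLinkType("ASSEMBLY_NUCCORE").name)  # UNKNOWN
print(TolerantLinkType("GDV_LINK").name)          # GDV_LINK
```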