
Filling a Publication fails for nonexistent publication dates (e.g. "2023/4/31" leads to ValueError: "day is out of range for month") #562

Open
@tZimmermann98


Describe the bug
When trying to fill a Publication that has a nonexistent publication date (for example, '2023/4/31'), it fails with a ValueError.
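The error message matches the one Python's own `datetime` constructors raise for impossible calendar dates, so the underlying condition can be illustrated without scholarly at all. This is a minimal sketch using only the standard library; the helper name `is_valid_date` is hypothetical and not part of scholarly or arrow.

```python
from datetime import date


def is_valid_date(year, month, day):
    """Return True if (year, month, day) is a real calendar date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        # Raised for impossible dates such as April 31st.
        return False


print(is_valid_date(2023, 4, 30))  # a real date
print(is_valid_date(2023, 4, 31))  # April has only 30 days
```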

To Reproduce

from scholarly import scholarly

search_query = scholarly.search_pubs('Where Technology and Content Fuse: Applying Technology Acceptance to the Usage of and Payment for Digital Journalism')
scholarly.pprint(next(search_query))

Expected behavior
Since only the year of the publication is stored in ['bib']['pub_year'], I would expect errors in the month or day to be ignored, either falling back to the year alone or leaving ['bib']['pub_year'] as NA.

Desktop (please complete the following information):

  • Proxy service: None
  • python version: 3.11
  • OS: macOS 15.2
  • Scholarly Version: 1.7.11 / Latest

Do you plan on contributing?

  • Yes, I will create a Pull Request with the bugfix.
    My suggestion would be to adapt PublicationParser.fill with:
elif key == 'publication date':

    patterns = ['YYYY/M',
                'YYYY/MM/DD',
                'YYYY',
                'YYYY/M/DD',
                'YYYY/M/D',
                'YYYY/MM/D']
    try:
        publication['bib']['pub_year'] = arrow.get(val.text, patterns).year
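    except ValueError:
        # Fall back to regex year extraction (requires `import re`).
        # Convert to int so the type matches arrow's `.year`, and skip
        # the assignment if no four-digit year is present.
        match = re.search(r'\d{4}', val.text)
        if match:
            publication['bib']['pub_year'] = int(match.group())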
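To make the proposed fallback easier to test in isolation, here is a dependency-free sketch of the same idea. It is an illustration under assumptions, not the actual scholarly code: it swaps arrow for the standard library's `datetime.strptime`, and the helper name `extract_pub_year` and the `FORMATS` tuple are hypothetical. It tries strict parsing first, then falls back to pulling the first four-digit number out of the text, and returns `None` when no year can be found.

```python
import re
from datetime import datetime

# strptime accepts one- or two-digit months and days for %m and %d,
# so these three formats cover the patterns listed in the suggestion.
FORMATS = ("%Y/%m/%d", "%Y/%m", "%Y")


def extract_pub_year(text):
    """Return the publication year as an int, or None if none is found."""
    text = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).year
        except ValueError:
            # Wrong format or an impossible date; try the next format.
            continue
    # Fallback: take the first four-digit run as the year.
    match = re.search(r"\d{4}", text)
    return int(match.group()) if match else None


for raw in ("2023/4/30", "2023/4/31", "2023/4", "2021", "n/a"):
    print(raw, "->", extract_pub_year(raw))
```

Returning `None` rather than raising keeps a single malformed date from aborting the whole fill, which is in line with the expected behavior described above.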
