Parsing dates prior to 1677-09-21 #19

@ch-sander

Description

df[col] = pd.to_datetime(df[col], errors='coerce') # parse dates, invalid become NaT

Problem with pd.to_datetime() and historical dates before 1677

When converting date columns using:

df[col] = pd.to_datetime(df[col], errors='coerce')
df[col] = df[col].apply(lambda x: x.timestamp() if pd.notnull(x) else None)

we run into a limitation of pandas/numpy: the internal datetime64[ns] type only supports timestamps between 1677-09-21 and 2262-04-11. With errors='coerce', dates outside this range (e.g. from the early modern or medieval period) are silently converted to NaT; without it, pandas raises OutOfBoundsDatetime. The exact behavior may vary between pandas versions.

This makes pd.to_datetime() unsuitable for historical datasets with pre-1677 dates if used with the default datetime64[ns] conversion.
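For context on where those two boundary dates come from: datetime64[ns] stores each value as a signed 64-bit count of nanoseconds since 1970-01-01, so the representable span is roughly 292 years on either side of the epoch. The following is a minimal standard-library sketch of that arithmetic, not pandas code; the names EPOCH, INT64_MAX_NS and span are chosen here for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX_NS = 2**63 - 1  # largest signed 64-bit integer, read as nanoseconds

# datetime.timedelta only has microsecond resolution,
# so the sub-microsecond remainder is dropped here.
span = timedelta(microseconds=INT64_MAX_NS // 1000)

lower = EPOCH - span
upper = EPOCH + span
print(lower.date(), upper.date())  # 1677-09-21 2262-04-11
```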


Suggestion

Use a Python-native parser such as dateutil.parser.parse() and retain standard datetime.datetime objects to avoid precision and range issues:

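from datetime import timezone
from dateutil.parser import parse

def safe_timestamp(val):
    # Return a POSIX timestamp, or None if val cannot be parsed.
    try:
        dt = parse(val)
        if dt.tzinfo is None:
            # Assume UTC so .timestamp() does not depend on local-time conversion.
            dt = dt.replace(tzinfo=timezone.utc)
        return dt.timestamp()
    except (TypeError, ValueError, OverflowError):
        return None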

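datetime.datetime covers the years 1 to 9999, so this approach can represent pre-1677 dates and yields negative timestamps for dates before 1970, provided downstream code can work with them. One caveat: .timestamp() treats a naive datetime as local time and relies on the platform's time functions, so it may raise OverflowError or OSError for dates far in the past unless a timezone is attached.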
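If the column already holds ISO 8601 strings, a similar conversion could be done with the standard library alone. The sketch below is only an illustration under two assumptions: inputs are ISO 8601 (years before 1000 zero-padded to four digits), and values without a UTC offset are treated as UTC. The helper names iso_to_timestamp and timestamp_to_datetime are hypothetical and not part of this project.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def iso_to_timestamp(text):
    """Hypothetical helper: ISO 8601 string -> seconds since the Unix epoch."""
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # Assumption: values without an offset are UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Plain datetime arithmetic, so no platform time functions are involved.
    return (dt - EPOCH).total_seconds()

def timestamp_to_datetime(ts):
    """Inverse conversion, also via arithmetic rather than fromtimestamp()."""
    return EPOCH + timedelta(seconds=ts)

ts = iso_to_timestamp("1500-01-01")
print(ts)                         # negative, because the date precedes 1970
print(timestamp_to_datetime(ts))  # round-trips to 1500-01-01 UTC
```

Going back through EPOCH + timedelta rather than datetime.fromtimestamp() is deliberate: fromtimestamp() can reject large negative values on some platforms.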
