glyphspace/src/assets/processor.py
Line 82 in e263d9c
df[col] = pd.to_datetime(df[col], errors='coerce') # parse dates, invalid become NaT
Problem with pd.to_datetime() and historical dates before 1677
When converting date columns using:
df[col] = pd.to_datetime(df[col], errors='coerce')
df[col] = df[col].apply(lambda x: x.timestamp() if pd.notnull(x) else None)
we run into a limitation of pandas/numpy: the internal datetime64[ns] type only supports timestamps between 1677-09-21 and 2262-04-11. Dates outside this range (e.g. from the early modern or medieval period) are silently coerced to NaT when errors='coerce' is set, and otherwise raise an OutOfBoundsDatetime exception; the exact behavior may vary by pandas version.
This makes pd.to_datetime() unsuitable for historical datasets with pre-1677 dates if used with the default datetime64[ns] conversion.
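The two bounds follow from the storage format: a datetime64[ns] value is a signed 64-bit count of nanoseconds since 1970-01-01, with the minimum integer reserved for NaT. As a rough illustration, the sketch below reproduces the limits with the standard library alone. The names EPOCH, INT64_MAX and ns_bounds are made up for this example and are not part of pandas or numpy.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer


def ns_bounds():
    """Return (earliest, latest) datetimes reachable with an int64 nanosecond offset.

    Hypothetical helper for illustration only. The standard library datetime
    has microsecond resolution, so the results are rounded to microseconds.
    """
    latest = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
    # The minimum int64 value is reserved for NaT, so the lower bound is -INT64_MAX.
    earliest = EPOCH + timedelta(microseconds=-INT64_MAX // 1000)
    return earliest, latest


earliest, latest = ns_bounds()
print(earliest.date(), latest.date())
```

Any date earlier than the first value, such as one from the year 1500, cannot be represented at nanosecond resolution at all, which is why it ends up as NaT.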
Suggestion
Use a Python-native parser such as dateutil.parser.parse() and retain standard datetime.datetime objects to avoid precision and range issues:
from dateutil.parser import parse

def safe_timestamp(val):
    try:
        return parse(val).timestamp()
    except Exception:
        return None
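This handles negative timestamps (pre-1970) and very early dates, provided downstream code can work with them. One caveat: .timestamp() on a naive datetime interprets it as local time and relies on the platform's time conversion, so it may raise OverflowError or OSError for dates far in the past on some platforms. With the except clause above, such failures would silently become None.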
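One possible way to sidestep the local-time dependency is to attach UTC to naive values before calling .timestamp(), since for timezone-aware datetimes the result is computed by plain arithmetic against the epoch. The sketch below is only an illustration under stated assumptions: utc_timestamp is a hypothetical name, inputs are assumed to be ISO 8601 strings, and datetime.fromisoformat() stands in for dateutil.parser.parse() to keep the example free of third-party dependencies. Treating naive dates as UTC is also an assumption that may not suit every dataset.

```python
from datetime import datetime, timezone
from typing import Optional


def utc_timestamp(val: str) -> Optional[float]:
    """Convert an ISO 8601 date string to a POSIX timestamp, or None if unparseable.

    Hypothetical helper: naive values are assumed to be in UTC.
    """
    try:
        dt = datetime.fromisoformat(val)
    except (TypeError, ValueError):
        # Not a string, or not a valid ISO 8601 date
        return None
    if dt.tzinfo is None:
        # Aware datetimes are converted arithmetically, without the platform's local-time lookup
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


print(utc_timestamp("1500-01-01"))  # a large negative number of seconds
print(utc_timestamp("not a date"))  # None
```

Catching only TypeError and ValueError, rather than Exception, keeps unrelated failures visible instead of turning them into missing values.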