undate is a Python library for working with uncertain or partially known dates.
This is pre-alpha software and is NOT feature complete! Use with caution. Currently it only supports parsing and formatting dates in ISO8601 format and some portions of EDTF (Extended Date Time Format).
Undate was initially created as part of a DH-Tech hackathon in November 2022.
Read Contributors for detailed contribution information.
Often humanities and cultural data include imprecise or uncertain temporal information. We want to store that information but also work with it in a structured way, not just treat it as text for display. Different projects may need to work with or convert between different date formats or even different calendars.
An undate.Undate is analogous to Python's builtin datetime.date object, but with support for varying degrees of precision and unknown information. You can initialize an Undate with either strings or numbers for whichever parts of the date are known or partially known. An Undate can take an optional label.
from undate.undate import Undate
november7 = Undate(2000, 11, 7)
november = Undate(2000, 11)
year2k = Undate(2000)
november7_some_year = Undate(month=11, day=7)
partially_known_year = Undate("19XX")
partially_known_month = Undate(2022, "1X")
easter1916 = Undate(1916, 4, 23, label="Easter 1916")
You can convert an Undate to a string using a date formatter (current default is ISO8601):
>>> [str(d) for d in [november7, november, year2k, november7_some_year]]
['2000-11-07', '2000-11', '2000', '--11-07']
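To illustrate the shape of the output above, here is a minimal standalone sketch of ISO8601-style formatting for dates with missing parts. The helper `format_iso8601` is hypothetical and written only for this example; it is not part of undate and makes no claim about how the library's formatter is implemented.

```python
from typing import Optional


def format_iso8601(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> str:
    """Hypothetical helper: join whichever date parts are known."""
    # A lone "-" stands in for an unknown year, so a month/day-only
    # date comes out with the leading "--" prefix.
    parts = [f"{year:04d}" if year is not None else "-"]
    if month is not None:
        parts.append(f"{month:02d}")
    if day is not None:
        parts.append(f"{day:02d}")
    return "-".join(parts)


print(format_iso8601(2000, 11, 7))      # 2000-11-07
print(format_iso8601(2000, 11))         # 2000-11
print(format_iso8601(month=11, day=7))  # --11-07
```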
If enough information is known, an Undate object can report on its duration:
>>> december = Undate(2000, 12)
>>> feb_leapyear = Undate(2024, 2)
>>> feb_regularyear = Undate(2023, 2)
>>> for d in [november7, november, december, year2k, november7_some_year, feb_regularyear, feb_leapyear]:
...     print(f"{d} - duration in days: {d.duration().days}")
2000-11-07 - duration in days: 1
2000-11 - duration in days: 30
2000-12 - duration in days: 31
2000 - duration in days: 366
--11-07 - duration in days: 1
2023-02 - duration in days: 28
2024-02 - duration in days: 29
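The durations above follow from ordinary calendar arithmetic. As a rough sketch using only the standard library, the same numbers can be reproduced as follows. The function `duration_days` is hypothetical, and raising `ValueError` for a month without a year is an assumption made for this example, not a statement about undate's behavior.

```python
import calendar
from typing import Optional


def duration_days(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> int:
    """Hypothetical helper: inclusive length in days of a partial date."""
    if day is not None:
        # Day precision always spans a single day, even with no year.
        return 1
    if year is None:
        # Assumption for this sketch: without a year we cannot tell
        # whether February has 28 or 29 days, so refuse to guess.
        raise ValueError("year is required for month or year precision")
    if month is not None:
        # monthrange returns (weekday of first day, number of days).
        return calendar.monthrange(year, month)[1]
    return 366 if calendar.isleap(year) else 365


print(duration_days(2000, 11))  # 30
print(duration_days(2000))      # 366
print(duration_days(2023, 2))   # 28
print(duration_days(2024, 2))   # 29
```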
If enough of the date is known and the precision supports it, you can check if one date falls within another date:
>>> november7 = Undate(2000, 11, 7)
>>> november2000 = Undate(2000, 11)
>>> year2k = Undate(2000)
>>> ad100 = Undate(100)
>>> november7 in november2000
True
>>> november2000 in year2k
True
>>> november7 in year2k
True
>>> november2000 in ad100
False
>>> november7 in ad100
False
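One way to think about containment is as a comparison of earliest and latest possible days. The sketch below uses plain `(year, month, day)` tuples and two hypothetical helpers, `bounds` and `falls_within`; it illustrates the idea and is not undate's implementation. In particular, treating a date as falling within an identical date is a simplification of this sketch.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def bounds(
    year: int, month: Optional[int] = None, day: Optional[int] = None
) -> Tuple[date, date]:
    """Hypothetical helper: earliest and latest day a partial date covers."""
    last_month = month or 12
    last_day = day or calendar.monthrange(year, last_month)[1]
    return date(year, month or 1, day or 1), date(year, last_month, last_day)


def falls_within(inner: tuple, outer: tuple) -> bool:
    """True when every possible day of `inner` lies inside `outer`."""
    inner_start, inner_end = bounds(*inner)
    outer_start, outer_end = bounds(*outer)
    return outer_start <= inner_start and inner_end <= outer_end


print(falls_within((2000, 11, 7), (2000, 11)))  # True
print(falls_within((2000, 11), (2000,)))        # True
print(falls_within((2000, 11), (100,)))         # False
```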
For dates that are imprecise or partially known, undate calculates earliest and latest possible dates for comparison purposes so you can sort dates and compare with equals, greater than, and less than. You can also compare with Python datetime.date objects:
>>> november7_2020 = Undate(2020, 11, 7)
>>> november_2001 = Undate(2001, 11)
>>> year2k = Undate(2000)
>>> ad100 = Undate(100)
>>> sorted([november7_2020, november_2001, year2k, ad100])
[<Undate 0100>, <Undate 2000>, <Undate 2001-11>, <Undate 2020-11-07>]
>>> november7_2020 > november_2001
True
>>> year2k < ad100
False
>>> from datetime import date
>>> year2k > date(2001, 1, 1)
False
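As an illustration of what "earliest and latest possible" can mean for a partially known value such as "19XX" or "1X", the sketch below enumerates every number that matches a pattern and takes the minimum and maximum. The helper `possible_values` is hypothetical, and the ranges passed to it (0 to 9999 for years, 1 to 12 for months) are assumptions of this example rather than documented undate behavior.

```python
from typing import List


def possible_values(pattern: str, low: int, high: int) -> List[int]:
    """Hypothetical helper: values in [low, high] matching a pattern.

    "X" matches any digit; every other character must match exactly.
    """
    width = len(pattern)
    matches = []
    for value in range(low, high + 1):
        text = f"{value:0{width}d}"
        if len(text) == width and all(
            p in ("X", c) for p, c in zip(pattern, text)
        ):
            matches.append(value)
    return matches


years = possible_values("19XX", 0, 9999)
months = possible_values("1X", 1, 12)
print(min(years), max(years))  # 1900 1999
print(months)                  # [10, 11, 12]
```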
When dates cannot be compared due to ambiguity or precision, comparison methods raise a NotImplementedError:
>>> november_2020 = Undate(2020, 11)
>>> november7_2020 > november_2020
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rkoeser/workarea/github/undate-python/src/undate/undate.py", line 262, in __gt__
    return not (self < other or self == other)
  File "/Users/rkoeser/workarea/github/undate-python/src/undate/undate.py", line 245, in __lt__
    raise NotImplementedError(
NotImplementedError: Can't compare when one date falls within the other
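The reason such a comparison is refused can be sketched with a small range type: when two ranges of possible days overlap, neither is clearly earlier. The `DateRange` class below is hypothetical and exists only for this example; it mirrors the error message shown above but is not the library's code.

```python
from datetime import date


class DateRange:
    """Hypothetical stand-in for a date with earliest/latest bounds."""

    def __init__(self, earliest: date, latest: date) -> None:
        self.earliest = earliest
        self.latest = latest

    def __lt__(self, other: "DateRange") -> bool:
        if self.latest < other.earliest:
            return True   # entirely before the other range
        if other.latest < self.earliest:
            return False  # entirely after the other range
        # The ranges overlap, so the ordering is ambiguous.
        raise NotImplementedError(
            "Can't compare when one date falls within the other"
        )


day = DateRange(date(2020, 11, 7), date(2020, 11, 7))
month = DateRange(date(2020, 11, 1), date(2020, 11, 30))
try:
    day < month
except NotImplementedError as err:
    print(f"comparison refused: {err}")
```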
An UndateInterval is a date range between two Undate objects. Intervals can be open-ended, allow for optional labels, and can calculate duration if enough information is known:
>>> from undate.undate import UndateInterval
>>> UndateInterval(Undate(1900), Undate(2000))
<UndateInterval 1900/2000>
>>> UndateInterval(Undate(1900), Undate(2000), label="20th century")
<UndateInterval '20th century' (1900/2000)>
>>> UndateInterval(latest=Undate(2000))  # before 2000
<UndateInterval ../2000>
>>> UndateInterval(Undate(1900))  # after 1900
<UndateInterval 1900/>
>>> UndateInterval(Undate(1900), Undate(2000), label="20th century").duration().days
36890
>>> UndateInterval(Undate(2000, 1, 1), Undate(2000, 1, 31)).duration().days
31
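An interval's duration can be understood as an inclusive day count from the earliest possible day of the start to the latest possible day of the end. The sketch below shows that arithmetic with `datetime.date` only. The helper `interval_days` is hypothetical, and returning `None` for an open-ended interval is an assumption of this example, not documented undate behavior.

```python
from datetime import date
from typing import Optional


def interval_days(
    earliest: Optional[date], latest: Optional[date]
) -> Optional[int]:
    """Hypothetical helper: inclusive day count between two bounds."""
    if earliest is None or latest is None:
        # Assumption for this sketch: open-ended intervals have no
        # computable duration.
        return None
    # Add one so both the first and the last day are counted.
    return (latest - earliest).days + 1


# 1900 through 2000: first day of 1900 to last day of 2000.
print(interval_days(date(1900, 1, 1), date(2000, 12, 31)))  # 36890
print(interval_days(date(2000, 1, 1), date(2000, 1, 31)))   # 31
print(interval_days(date(1900, 1, 1), None))                # None
```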
You can initialize Undate or UndateInterval objects by parsing a date string with a specific converter, and you can also output an Undate object in those formats.
Currently available converters are "ISO8601", "EDTF", and converters for the supported calendars.
>>> from undate import Undate
>>> Undate.parse("2002", "ISO8601")
<Undate 2002>
>>> Undate.parse("2002-05", "EDTF")
<Undate 2002-05>
>>> Undate.parse("--05-03", "ISO8601")
<Undate --05-03>
>>> Undate.parse("--05-03", "ISO8601").format("EDTF")
'XXXX-05-03'
>>> Undate.parse("1800/1900", "EDTF")
<UndateInterval 1800/1900>
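To make the parsing step more concrete, here is a minimal standalone sketch that reads the three ISO8601 shapes used in the examples above ("2002", "2002-05", "--05-03") with a regular expression. The pattern `ISO_DATE` and the function `parse_iso8601` are hypothetical names for this example; undate's own converters may work quite differently.

```python
import re
from typing import Optional, Tuple

# Year is four digits, or a lone "-" when unknown; month and day are
# optional two-digit groups, each introduced by a hyphen.
ISO_DATE = re.compile(
    r"^(?P<year>\d{4}|-)(?:-(?P<month>\d{2}))?(?:-(?P<day>\d{2}))?$"
)


def parse_iso8601(
    value: str,
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Hypothetical helper: return (year, month, day), None if unknown."""
    match = ISO_DATE.match(value)
    if match is None:
        raise ValueError(f"unsupported date string: {value!r}")
    year, month, day = match.group("year", "month", "day")
    return (
        None if year == "-" else int(year),
        int(month) if month else None,
        int(day) if day else None,
    )


print(parse_iso8601("2002"))      # (2002, None, None)
print(parse_iso8601("2002-05"))   # (2002, 5, None)
print(parse_iso8601("--05-03"))   # (None, 5, 3)
```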
All Undate objects are calendar aware, and date converters include support for parsing and working with dates from other calendars. The Gregorian calendar is used by default; currently undate supports the Hijri Islamic calendar and the Anno Mundi Hebrew calendar, based on calendar conversion logic implemented in the convertdate package.
Dates are stored with the year, month, day and appropriate precision for the original calendar; internally, earliest and latest dates are calculated in Gregorian / Proleptic Gregorian calendar for standardized comparison across dates from different calendars.
>>> from undate import Undate
>>> tammuz4816 = Undate.parse("26 Tammuz 4816", "Hebrew")
>>> tammuz4816
<Undate '26 Tammuz 4816 Anno Mundi' 4816-04-26 (Hebrew)>
>>> rajab495 = Undate.parse("Rajab 495", "Hijri")
>>> rajab495
<Undate 'Rajab 495 Hijrī' 0495-07 (Hijri)>
>>> y2k = Undate.parse("2001", "EDTF")
>>> y2k
<Undate 2001 (Gregorian)>
>>> [str(d.earliest) for d in [rajab495, tammuz4816, y2k]]
['1102-04-28', '1056-07-17', '2001-01-01']
>>> [str(d.precision) for d in [rajab495, tammuz4816, y2k]]
['MONTH', 'DAY', 'YEAR']
>>> sorted([rajab495, tammuz4816, y2k])
[<Undate '26 Tammuz 4816 Anno Mundi' 4816-04-26 (Hebrew)>, <Undate 'Rajab 495 Hijrī' 0495-07 (Hijri)>, <Undate 2001 (Gregorian)>]
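The sort order above follows from comparing each date's earliest Gregorian equivalent. The sketch below reproduces that ordering with the standard library, using the earliest dates printed in the example as fixed inputs. The `CalendarDate` type is hypothetical and performs no calendar conversion itself.

```python
from datetime import date
from typing import NamedTuple


class CalendarDate(NamedTuple):
    """Hypothetical record: original label plus Gregorian earliest day."""

    label: str
    calendar: str
    earliest: date  # already converted to (proleptic) Gregorian


dates = [
    CalendarDate("Rajab 495", "Hijri", date(1102, 4, 28)),
    CalendarDate("26 Tammuz 4816", "Hebrew", date(1056, 7, 17)),
    CalendarDate("2001", "Gregorian", date(2001, 1, 1)),
]

# Sorting on the shared Gregorian value orders dates across calendars.
for item in sorted(dates, key=lambda d: d.earliest):
    print(item.earliest, item.label, f"({item.calendar})")
```

Keeping the original year, month, and day alongside a normalized value means display can stay faithful to the source calendar while comparison uses a single scale.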
For more examples, refer to the example notebooks included in this repository.
Project documentation is available on ReadTheDocs.
For instructions on setting up for local development, see Developer Notes.
This software is licensed under the Apache 2.0 License.