In many Microsoft apps, the equivalent of a null timestamp is 1601/01/01 00:00:00. These can pop up sometimes if a file has been recovered or there was some other issue during saving.
This causes an OSError when converting to a Unix timestamp, e.g. datetime(1601, 1, 1, 0, 0, 0).timestamp().
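For anyone who wants to check this on their own machine, here is a minimal reproduction sketch. The helper name `try_timestamp` is made up for illustration and is not part of unstructured-ingest. The result appears to be platform-dependent: as far as I can tell the naive conversion fails on Windows, while other systems may simply return a large negative number.

```python
from datetime import datetime
from typing import Optional, Tuple


def try_timestamp(dt: datetime) -> Tuple[Optional[float], Optional[Exception]]:
    """Return (timestamp, None) on success or (None, exception) on failure."""
    try:
        return dt.timestamp(), None
    except (OSError, ValueError, OverflowError) as exc:
        return None, exc


# The Microsoft "null" timestamp described above, as a naive datetime.
null_dt = datetime(1601, 1, 1, 0, 0, 0)

value, error = try_timestamp(null_dt)
print("timestamp:", value, "error:", repr(error))
```

Exactly one of the two printed values should be None, depending on whether the platform can convert the date.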
Suggested fix:
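from datetime import datetime
from typing import Optional


def safe_timestamp(dt: Optional[datetime]) -> Optional[str]:
    """
    Converts a datetime object to a string representation of its timestamp.
    Returns None if dt is None or if the conversion raises an exception.
    """
    if dt is None:
        # Keeps the existing "... if dt else None" behaviour at the call site.
        return None
    try:
        return str(dt.timestamp())
    except (OSError, ValueError, OverflowError):
        return None

In the OneDrive ingester: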
unstructured-ingest/unstructured_ingest/processes/connectors/onedrive.py
Lines 208 to 225 in 1234157
return FileData(
    identifier=drive_item.id,
    connector_type=self.connector_type,
    source_identifiers=SourceIdentifiers(
        fullpath=server_path, filename=drive_item.name, rel_path=rel_path
    ),
    metadata=FileDataSourceMetadata(
        url=drive_item.parent_reference.path + "/" + drive_item.name,
        version=drive_item.etag,
        date_modified=str(date_modified_dt.timestamp()) if date_modified_dt else None,
        date_created=str(date_created_at.timestamp()) if date_created_at else None,
        date_processed=str(time()),
        record_locator={
            "user_pname": self.connection_config.user_pname,
            "server_relative_path": server_path,
        },
    ),
    additional_metadata=self.get_properties_sync(drive_item=drive_item),
use safe_timestamp() instead:
return FileData(
    identifier=drive_item.id,
    connector_type=self.connector_type,
    source_identifiers=SourceIdentifiers(
        fullpath=server_path, filename=drive_item.name, rel_path=rel_path
    ),
    metadata=FileDataSourceMetadata(
        url=drive_item.parent_reference.path + "/" + drive_item.name,
        version=drive_item.etag,
        date_modified=safe_timestamp(date_modified_dt),
        date_created=safe_timestamp(date_created_at),
        date_processed=str(time()),
        record_locator={
            "user_pname": self.connection_config.user_pname,
            "server_relative_path": server_path,
        },
    ),
    additional_metadata=self.get_properties_sync(drive_item=drive_item),
)

Happy to open a PR if needed.
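One possible variation, sketched here only as an idea and not part of the proposal above: since the try/except approach returns None only on platforms where the conversion actually raises, the helper could also treat the 1601 sentinel as null explicitly, so the metadata comes out the same everywhere. The constant name `MS_NULL_TIMESTAMP` and the explicit None guard are my own assumptions.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed name for the Microsoft "null" timestamp; not an existing constant.
MS_NULL_TIMESTAMP = datetime(1601, 1, 1, 0, 0, 0)


def safe_timestamp(dt: Optional[datetime]) -> Optional[str]:
    """Return the Unix timestamp as a string, or None for missing/null dates."""
    if dt is None:
        return None
    # Compare wall-clock values so both naive and aware sentinels are caught.
    if dt.replace(tzinfo=None) == MS_NULL_TIMESTAMP:
        return None
    try:
        return str(dt.timestamp())
    except (OSError, ValueError, OverflowError):
        return None


print(safe_timestamp(None))
print(safe_timestamp(MS_NULL_TIMESTAMP))
print(safe_timestamp(datetime(2020, 1, 1, tzinfo=timezone.utc)))
```

The first two calls print None and the last prints 1577836800.0. Whether a null date should map to None or to some other marker is a design decision for the maintainers.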