Processing Server: expand_page_ids is overly strict #1333

@bertsky

I have a workspace with heterogeneous page identifiers:

  • f_1875-Himnos-EMU-0040
  • f_1878-Helpin-EMU-0027
  • f_1890-Evange-UMI-0007
  • f_1892-Womans-EMU-0015
  • f_192x-Gospel-MTS-0007

When I try to process this workspace with the Processing Server, it fails because expand_page_ids cannot convert the page range into a single numerical range:

   File "/build/core/src/ocrd_network/processing_server.py", line 744, in run_workflow
     responses = await self.task_sequence_to_processing_jobs(
   File "/build/core/src/ocrd_network/processing_server.py", line 693, in task_sequence_to_processing_jobs
     response = await self.validate_and_forward_job_to_network_agent(
   File "/build/core/src/ocrd_network/processing_server.py", line 431, in validate_and_forward_job_to_network_agent
     page_ids = expand_page_ids(data.page_id)
   File "/build/core/src/ocrd_network/utils.py", line 59, in expand_page_ids
     page_ids += generate_range(*page_id_token.split(sep='..', maxsplit=1))
   File "/build/core/src/ocrd_utils/str.py", line 231, in generate_range
     raise ValueError(f"Range '{start}..{end}' differ in their non-numeric part: '{start[:-len(start_num)]}' != '{end[:-len(end_num)]}'")
 ValueError: Range 'f_1875-Himnos-EMU-0040..f_192x-Gospel-MTS-0007' differ in their non-numeric part: 'f_1875-Himnos-EMU-' != 'f_192x-Gospel-MTS-'
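
For illustration, here is a minimal sketch of the kind of check that rejects this range. It is a simplified, hypothetical re-implementation written from the error message above, not the actual ocrd_utils.generate_range code: it splits each endpoint into a non-numeric prefix and a trailing number, and refuses to expand when the two prefixes differ, which is exactly what heterogeneous page IDs trigger.

```python
from __future__ import annotations

import re


def strict_generate_range(start: str, end: str) -> list[str]:
    """Hypothetical simplification of a numeric page range expander.

    Expands 'PREFIX0038'..'PREFIX0040' into consecutive IDs, but only
    if both endpoints share the same non-numeric prefix.
    """
    start_match = re.search(r"\d+$", start)
    end_match = re.search(r"\d+$", end)
    if not start_match or not end_match:
        raise ValueError(f"Range '{start}..{end}': no trailing number")
    start_num, end_num = start_match.group(), end_match.group()
    start_prefix = start[: -len(start_num)]
    end_prefix = end[: -len(end_num)]
    if start_prefix != end_prefix:
        # The condition that fires for heterogeneous page IDs.
        raise ValueError(
            f"Range '{start}..{end}' differ in their non-numeric part: "
            f"'{start_prefix}' != '{end_prefix}'"
        )
    width = len(start_num)  # preserve zero padding such as "0040"
    return [
        f"{start_prefix}{n:0{width}d}"
        for n in range(int(start_num), int(end_num) + 1)
    ]
```

With identical prefixes, strict_generate_range("f_1875-Himnos-EMU-0038", "f_1875-Himnos-EMU-0040") yields three IDs; with the first and last page of the workspace above it raises the same kind of ValueError as in the traceback.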

This is with page_wise=False.

IMO, if splitting is necessary, it should be done over integer ranges (pure ordinal page numbers), ignoring the page ID strings themselves.
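
A rough sketch of that suggestion, under stated assumptions: the workspace's page IDs are available as a list in physical page order (how to read that list from the METS is not shown), and page_id expressions are comma-separated tokens that are either single IDs or FIRST..LAST ranges, as in OCR-D's page_id syntax. The helper names expand_by_ordinal and split_into_chunks are hypothetical and not part of ocrd_network. A range is resolved by the ordinal positions of its endpoints, so the IDs never need a common prefix, and any splitting happens over those positions.

```python
from __future__ import annotations


def expand_by_ordinal(page_id_expr: str, ordered_page_ids: list[str]) -> list[str]:
    """Hypothetical helper: expand 'ID' and 'FIRST..LAST' tokens by the
    position of each ID in ordered_page_ids, not by its numeric suffix."""
    position = {page_id: index for index, page_id in enumerate(ordered_page_ids)}
    result: list[str] = []
    for token in page_id_expr.split(","):
        token = token.strip()
        if ".." in token:
            first, last = token.split("..", maxsplit=1)
            if first not in position or last not in position:
                raise ValueError(f"Unknown page ID in range '{token}'")
            start, end = position[first], position[last]
            if start > end:
                raise ValueError(f"Range '{token}' is reversed")
            # Inclusive slice over ordinal positions.
            result.extend(ordered_page_ids[start:end + 1])
        else:
            if token not in position:
                raise ValueError(f"Unknown page ID '{token}'")
            result.append(token)
    return result


def split_into_chunks(page_ids: list[str], chunk_size: int) -> list[list[str]]:
    """Split an expanded page list into consecutive ordinal chunks."""
    return [page_ids[i:i + chunk_size] for i in range(0, len(page_ids), chunk_size)]
```

With the five page IDs listed above, expanding "f_1875-Himnos-EMU-0040..f_192x-Gospel-MTS-0007" returns all five pages, and split_into_chunks(..., 2) groups them as pages 1-2, 3-4 and 5, regardless of how the IDs are spelled.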
