LCSSMatcher.match_trace_batch() Fails with Multiprocessing #187

@joshuacroff

Problem Description

I've encountered an issue when using the LCSSMatcher.match_trace_batch() method for batch matching with multiprocessing. Below is the method definition for reference:

def match_trace_batch(
        self,
        trace_batch: List[Trace],
        processes: int = 1,
    ) -> List[MatchResult]:
        if processes > 1:
            results = [self.match_trace(t) for t in trace_batch]
        else:
            with Pool(processes=processes) as p:
                results = p.map(self.match_trace, trace_batch)

        return results

It's located in the code here: lcss.py#L150

When I run the following code with the default single process:

matcher = LCSSMatcher(nx_map)
match_results = matcher.match_trace_batch(trace_list)

I get this error message:

ValueError: No roads found for Coordinate(coordinate_id=0, x=-13604638.834547484, y=4558638.110141623, crs=('EPSG', '3857'))

Observations

  • The error occurs with the default processes=1, which is the branch that uses multiprocessing (Pool).
  • When I set processes to a higher number (e.g., 8), it processes synchronously using list comprehension without errors, but it's slow:
matcher = LCSSMatcher(nx_map)
match_results = matcher.match_trace_batch(trace_list, processes=8)
  • Attempting multiprocessing with concurrent.futures also results in the same error:
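from concurrent.futures import ProcessPoolExecutor, as_completed

# Excerpt from inside a function; process_trace is defined elsewhere (not shown).
matcher = LCSSMatcher(nx_map)
matched_traces = []
with ProcessPoolExecutor(max_workers=7) as executor:
    # Each submit pickles trace_dict and matcher and sends them to a worker process.
    futures = [executor.submit(process_trace, trace_dict, matcher) for trace_dict in traces]
    for future in as_completed(futures):
        matched_traces.append(future.result())
return matched_traces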

Hypothesis

It seems that multiprocessing might be causing the issue due to object serialization ('pickling').
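
One way to test this hypothesis without multiprocessing is to round-trip an object through pickle in a single process and check whether the copy still behaves like the original. The sketch below is only an illustration under assumptions: `ToyRoadMap`, its lazily built `_index`, and `pickle_roundtrip` are hypothetical names, not part of the library, and I have not confirmed that the real map drops any state when pickled.

```python
import pickle


class ToyRoadMap:
    """Hypothetical stand-in for a road map; NOT the library's map class."""

    def __init__(self, roads):
        self.roads = dict(roads)  # road_id -> (x, y)
        self._index = None        # lookup built on demand (assumed for illustration)

    def build_index(self):
        self._index = {xy: road_id for road_id, xy in self.roads.items()}

    def nearest_road(self, xy):
        if not self._index or xy not in self._index:
            raise ValueError(f"No roads found for {xy}")
        return self._index[xy]

    def __getstate__(self):
        # Assumption: the index is dropped when pickling, as some classes do
        # for state that cannot be serialized.
        state = self.__dict__.copy()
        state["_index"] = None
        return state


def pickle_roundtrip(obj):
    """Return the copy of obj that a worker process would receive."""
    return pickle.loads(pickle.dumps(obj))


road_map = ToyRoadMap({"a": (0, 0)})
road_map.build_index()
print(road_map.nearest_road((0, 0)))  # lookup succeeds on the original

restored = pickle_roundtrip(road_map)
try:
    restored.nearest_road((0, 0))
except ValueError as err:
    print("after round-trip:", err)  # the copy has lost its index
```

If `pickle_roundtrip(matcher).match_trace(trace)` raised the same ValueError in a single process, that would point at serialization rather than at multiprocessing itself.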

Questions

  1. Could the need for pickling or serialization in multiprocessing be causing this issue?
  2. Should the method logic be updated to process synchronously if processes=1 and use multiprocessing otherwise?
  3. Is this a known issue, or could it be specific to my environment?
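
For question 2, a minimal sketch of the swapped branching could look like the following. It is written as a standalone function so it runs without the library; `map_batch` and `match_fn` are hypothetical names, and in the real method `match_fn` would be `self.match_trace`. This only fixes which branch runs for which `processes` value; it does not address the ValueError raised in the pool path.

```python
from multiprocessing import Pool
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_batch(
    match_fn: Callable[[T], R],
    batch: Sequence[T],
    processes: int = 1,
) -> List[R]:
    """Apply match_fn to each item; use a process pool only when processes > 1."""
    if processes <= 1:
        # Single process: plain loop, nothing is pickled.
        return [match_fn(item) for item in batch]
    # Multiple processes: match_fn and every item must be picklable.
    with Pool(processes=processes) as pool:
        return pool.map(match_fn, batch)


if __name__ == "__main__":
    print(map_batch(len, ["ab", "cde"]))               # sequential path
    print(map_batch(len, ["ab", "cde"], processes=2))  # pool path
```

With this ordering, the default `processes=1` never creates a pool, so single-process callers would avoid pickling entirely.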

Any insights or suggestions would be greatly appreciated.
