Skip to content

PrefixMapStd is very slow for lookups that 'miss' #1474

Open
@Aklakan

Description

@Aklakan

PR: #1475

Version

4.6.0-SNAPSHOT

What happened?

Another follow up on https://issues.apache.org/jira/browse/JENA-2309
I investigated the issue of slow PrefixMapStd further and it turns out that when there are 'hits' using its fast track approach it's very fast - however on 'misses' it gets extremely slow because then it falls back to iteration of all prefixes in the map.

Using a small benchmark runner (part of the PR - see here) shows the the PrefixMapStd can be easily brought to require dozens of seconds for lookups that can actually be accomplished in a split second.
For that I created a new PrefixMap intended as a replacement that combines the best of both worlds: The fast track of PrefixMapStd and the trie backing (together with a guava cache) of the removed FastAbbreviatingPrefixMap.

Approach 0 = Jena's original PrefixMapStd:

Run 1 with base <http://example.org/> and separator / using approach 0: Static IRI lookups took 0.082 seconds
Run 1 with base <http://example.org/> and separator / using approach 0: Dynamic IRI lookups took 33.443 seconds
Run 1 with base <urn:foo:bar:> and separator : using approach 0: Static IRI lookups took 25.482 seconds
Run 1 with base <urn:foo:bar:> and separator : using approach 0: Dynamic IRI lookups took 31.088 seconds

Approach 1 = my optimized PrefixMap:

Run 1 with base <http://example.org/> and separator / using approach 1: Static IRI lookups took 0.070 seconds
Run 1 with base <http://example.org/> and separator / using approach 1: Dynamic IRI lookups took 0.401 seconds
Run 1 with base <urn:foo:bar:> and separator : using approach 1: Static IRI lookups took 0.099 seconds
Run 1 with base <urn:foo:bar:> and separator : using approach 1: Dynamic IRI lookups took 0.433 seconds

W.r.t. to the interface contract of PrefixMap I am not sure whether synchronization / thread-safe maps need to be used.
PrefixMapStd used a ConcurrentHashMap but if synchronization is actually not a concern then a LinkedHashMap that gives control over the order may be a better choice.

Are you interested in making a pull request?

Yes

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions