Collections with many items saving time issue #1207

santilland · 2023-08-17T13:07:04Z

Using pystac[validation] 1.8.3

I am creating collections with a larger amount of items and was surprised by the time it took to save them. I have been doing some very preliminary tests and it somehow seems that the save time increases exponentially with the amount of items in a collection.
For example saving a catalog with 1 collection takes depending on item count:

Items	Time
200	0.225s
2000	5.439s
10000	105.975s

If i create 5 collections with 2000 items the saving time is 25s. So the same amount of items are being saved in total but it takes 4 times less when separated into multiple collections.

Any ideas why this could be happening?

Here is a very rough testing script:


import time
from datetime import (
    datetime,
    timedelta,
)
from pystac import (
    Item,
    Catalog,
    CatalogType,
    Collection,
    Extent,
    SpatialExtent,
    TemporalExtent,
)
from pystac.layout import TemplateLayoutStrategy

numdays = 10000
number_of_collections = 1
base = datetime.today()
times = [base - timedelta(days=x) for x in range(numdays)]

catalog = Catalog(
    id = "test",
    description = "catalog to test performance",
    title = "performance test catalog",
    catalog_type=CatalogType.RELATIVE_PUBLISHED,
)

spatial_extent = SpatialExtent([
    [-180.0, -90.0, 180.0, 90.0],
])
temporal_extent = TemporalExtent([[datetime.now()]])
extent = Extent(spatial=spatial_extent, temporal=temporal_extent)


for idx in range(number_of_collections):
    collection = Collection(
        id="big_collection%s"%idx,
        title="collection for items",
        description="some desc",
        extent=extent
    )
    for t in times:
        item = Item(
            id = t.isoformat(),
            bbox=[-180.0, -90.0, 180.0, 90.0],
            properties={},
            geometry = None,
            datetime = t,
        )
        collection.add_item(item)
    collection.update_extent_from_items()

    catalog.add_child(collection)

strategy = TemplateLayoutStrategy(item_template="${collection}/${year}")
catalog.normalize_hrefs("https://exampleurl.com/", strategy=strategy)

start_time = time.perf_counter()
catalog.save(dest_href="../test_build/")
end_time = time.perf_counter()
print(f"Saving Time : {end_time - start_time:0.6f}" )

The text was updated successfully, but these errors were encountered:

m-mohr · 2023-08-17T14:01:01Z

My timings on pystac (master branch) are as follows:

Collections	Items	Time
1	200	1.247s
1	2000	12.880s
1	10000	70.173s
5	2000	61.693s

It's weird that the times look reasonable (nearly linear) for my tests, but not for you. The difference is a different machine, installing validation requirements and installing from pypi, while I'm on master.

santilland · 2023-08-17T14:04:21Z

Strange, i was getting the same slow saving on the github action workflow (which uses ubuntu-latest). My machine is also ubuntu, what OS are you on?
Maybe some file handler issue?

m-mohr · 2023-08-17T14:06:43Z

I'm running Ubuntu 22.04.2 LTS through WSL.

I also just ran pystac 1.8.3 installed from pypi with validation requirements and the timings are similar to the ones I reported above. Weird...

gadomski · 2023-08-17T14:09:29Z

Thanks for the report, and thanks @m-mohr for also taking a look. My timings (macos) working from main w/ Python 3.11:

Collections	Items	Time
1	200	0.67s
1	2000	9.36s
1	10000	102.65s
5	2000	45.48s

So I'm seeing performance similar to @santilland.

I will note that there's some known issues around serializing a STACObject to a dictionary, specifically around link resolution and the network requests it can fire off: #960. This could be part of the issue, though I don't quite know how yet.

m-mohr · 2023-08-17T14:12:54Z

Weird. We were just thinking maybe it's an issue with memory consumption / swapping?
My machine is not at max (one of the CPUs is close, but memory is not an issue at all due to 128GB), but @santilland is maxing out both with regards to memory and one of the CPUs.

My timings are from Python 3.10.6

gadomski · 2023-08-17T14:27:13Z

Yeah, I'm seeing my run get CPU bound, but not memory bound. 🤔

santilland · 2023-08-17T14:32:31Z

I think @m-mohr commented on possible swap issue because my machine in general is close to the edge already but i did not see a special increase in memory use. But one core is always at 100% while saving.

m-mohr · 2023-08-17T14:33:53Z

Yeah, maybe it's something else. I'm on WSL and not on native Ubuntu, so maybe some kind of IO thing? Or something completely different :-) I guess profiling on an affected machine may give some insights...

moradology · 2024-10-08T15:31:54Z

I've started profiling this and will attempt to take a crack at it assuming no one else is too deep in it yet. Echoing what @santilland said above, I'm seeing very little memory pressure and a pretty engaged CPU.

What I've found so far via cProfile output is that there's a whole lot of repeated calls mostly to sort out paths for stac_objects of various sorts. e.g. I'm seeing on the order of 100k get_self_href calls for 10k items and 1 collection. We're talking about at least half of the total runtime during catalog.save calls. The easy fix (simply caching things) isn't so nice because these stac_objects are mutated in ways that change the expected output of get_self_href and it seems not-entirely-pleasant to spread cache invalidation logic all over the place.
It occurs to me that it would be extremely nice to have a more declarative, lazily evaluated story to tell here (just figure out hrefs etc. when saving or to_dicting) but also like that might be a very, very heavy lift.

Some thoughts about performance that I'm curious about now:

Python dynamically resolves method calls which means that it has to look up the inheritance structure every time one of these calls happen. We could probably see some benefit from in-lining some of the relevant logic. Of course, this comes at some cost in terms of organization and readability.
__dict__ access for attributes is another source of overhead and tends to be relevant when you're building up python programs that have tons and tons of little objects (beyond 10k is a heuristic some use and we can easily get beyond that) so maybe __slots__ are worth considering here (though this path comes at the cost of dynamically assignable attributes)

gadomski · 2024-10-08T21:43:49Z

Sounds great @moradology thanks for digging in. I agree generally w/ your assessment that caching Feels Bad™. FYI __slots__ feature request tracking issue: #1229

moradology · 2024-10-09T18:13:42Z

If we have a machine that can reproduce the dramatic increase in runtimes here, I'd be interested to see how __slots__ maybe pushes those performance penalties back (assuming this is perhaps memory-related - though I'm not at all certain it is). I'm pretty confident, however, that it's worth looking into more aggressive techniques by which we can avoid all the href resolution that's going on. See below for

For now, this branch should work for testing slots behavior: https://github.com/moradology/pystac/tree/feature/use-slots

Here's some code I've been using to check things out:

import argparse
import cProfile
import pstats
import time
from datetime import datetime, timedelta

from pystac import (
    Catalog,
    CatalogType,
    Collection,
    Extent,
    Item,
    SpatialExtent,
    TemporalExtent,
)
from pystac.layout import TemplateLayoutStrategy


def parse_args():
    parser = argparse.ArgumentParser(description="STAC Catalog performance test.")
    parser.add_argument(
        "--numdays",
        type=int,
        required=True,
        help="Number of days to generate items for.",
    )
    return parser.parse_args()


def build_items(numdays):
    """Builds items for the catalog."""
    base = datetime.today()
    times = [base - timedelta(days=x) for x in range(numdays)]

    items = [
        Item(
            id=t.isoformat(),
            bbox=[-180.0, -90.0, 180.0, 90.0],
            properties={},
            geometry=None,
            datetime=t,
        )
        for t in times
    ]

    return items


def run_test(numdays):
    items = build_items(numdays)

    catalog = Catalog(
        id="test",
        description="catalog to test performance",
        title="performance test catalog",
        catalog_type=CatalogType.RELATIVE_PUBLISHED,
    )

    spatial_extent = SpatialExtent([[-180.0, -90.0, 180.0, 90.0]])
    temporal_extent = TemporalExtent([[datetime.now()]])
    extent = Extent(spatial=spatial_extent, temporal=temporal_extent)

    collection = Collection(
        id="big_collection",
        title="collection for items",
        description="some desc",
        extent=extent,
    )

    for item in items:
        collection.add_item(item)

    collection.update_extent_from_items()
    catalog.add_child(collection)

    strategy = TemplateLayoutStrategy(item_template="${collection}/${year}")
    catalog.normalize_hrefs("https://exampleurl.com/", strategy=strategy)

    # Profile the catalog save operation
    pr = cProfile.Profile()
    pr.enable()

    start_time = time.perf_counter()
    catalog.save(dest_href="../test_build/")
    end_time = time.perf_counter()

    pr.disable()

    save_time = end_time - start_time
    print(f"Saving Time with {numdays} items: {save_time:.6f} seconds")

    return pr


def main():
    args = parse_args()
    numdays = args.numdays

    profiler = run_test(numdays)

    stats = pstats.Stats(profiler)

    stats.strip_dirs()
    print("\n=================================")
    stats.print_callers("get_self_href")
    stats.print_callees("get_self_href")

    print("\n=================================")
    stats.print_callers("get_single_link")
    stats.print_callees("get_single_link")

    print("\n=================================")
    print("default stats output")
    print("=================================")
    stats.sort_stats('cumulative')
    stats.print_stats()


if __name__ == "__main__":
    main()

And here's what the outputs ought to look like:

(stac) ➜  pystac git:(feature/use-slots) ✗ python perftest.py --numdays 1000
Saving Time with 1000 items: 0.965173 seconds

=================================
   Random listing order was used
   List reduced from 127 to 1 due to restriction <'get_self_href'>

Function                           was called by...
                                       ncalls  tottime  cumtime
stac_object.py:279(get_self_href)  <-       2    0.000    0.000  catalog.py:935(save)
                                         8012    0.006    0.130  link.py:164(get_href)
                                            4    0.000    0.000  link.py:219(get_absolute_href)
                                         2006    0.001    0.043  stac_object.py:265(self_href)


   Random listing order was used
   List reduced from 127 to 1 due to restriction <'get_self_href'>

Function                           called...
                                       ncalls  tottime  cumtime
stac_object.py:279(get_self_href)  ->   10024    0.001    0.001  link.py:259(get_target_str)
                                        10024    0.001    0.001  link.py:272(has_target_href)
                                        10024    0.005    0.164  stac_object.py:185(get_single_link)



=================================
   Random listing order was used
   List reduced from 127 to 1 due to restriction <'get_single_link'>

Function                             was called by...
                                         ncalls  tottime  cumtime
stac_object.py:185(get_single_link)  <-    9020    0.004    0.007  stac_object.py:255(get_root_link)
                                          10024    0.005    0.164  stac_object.py:279(get_self_href)


   Random listing order was used
   List reduced from 127 to 1 due to restriction <'get_single_link'>

Function                             called...
                                         ncalls  tottime  cumtime
stac_object.py:185(get_single_link)  ->   19044    0.001    0.001  stac_object.py:207(<genexpr>)
                                          19044    0.003    0.161  {built-in method builtins.next}


         1081187 function calls (1080185 primitive calls) in 0.964 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.002    0.001    0.965    0.965 catalog.py:935(save)
     1002    0.001    0.000    0.699    0.001 stac_object.py:439(save_object)
     4005    0.007    0.000    0.554    0.000 link.py:382(to_dict)
     4007    0.008    0.000    0.535    0.000 link.py:164(get_href)
     5009    0.005    0.000    0.452    0.000 utils.py:219(make_relative_href)
    13027    0.014    0.000    0.446    0.000 <frozen posixpath>:397(abspath)
     1000    0.003    0.000    0.414    0.000 item.py:368(to_dict)
     1000    0.001    0.000    0.404    0.000 item.py:390(<listcomp>)
     5009    0.007    0.000    0.399    0.000 utils.py:162(_make_relative_href_url)
    13027    0.390    0.000    0.390    0.000 {built-in method posix.getcwd}
     5009    0.014    0.000    0.383    0.000 <frozen posixpath>:486(relpath)
    10024    0.007    0.000    0.174    0.000 stac_object.py:279(get_self_href)
    19044    0.009    0.000    0.172    0.000 stac_object.py:185(get_single_link)
    19044    0.003    0.000    0.161    0.000 {built-in method builtins.next}
    38088    0.160    0.000    0.160    0.000 stac_object.py:207(<genexpr>)
        2    0.000    0.000    0.151    0.076 catalog.py:641(to_dict)
        2    0.000    0.000    0.151    0.075 catalog.py:657(<listcomp>)
        1    0.000    0.000    0.148    0.148 collection.py:584(to_dict)
     1002    0.001    0.000    0.129    0.000 stac_io.py:240(save_json)
     1002    0.001    0.000    0.126    0.000 stac_io.py:310(write_text)
     1002    0.002    0.000    0.125    0.000 stac_io.py:318(write_text_to_href)
     1005    0.001    0.000    0.121    0.000 utils.py:317(make_absolute_href)
     1003    0.002    0.000    0.109    0.000 utils.py:287(_make_absolute_href_path)
    17037    0.016    0.000    0.079    0.000 utils.py:39(safe_urlparse)
     1002    0.067    0.000    0.067    0.000 {built-in method io.open}
    17037    0.018    0.000    0.054    0.000 parse.py:374(urlparse)
     2006    0.001    0.000    0.044    0.000 stac_object.py:265(self_href)
     1002    0.044    0.000    0.044    0.000 {method '__exit__' of '_io._IOBase' objects}
    19039    0.021    0.000    0.032    0.000 <frozen posixpath>:71(join)
     4009    0.014    0.000    0.024    0.000 parse.py:452(urlsplit)
     4007    0.001    0.000    0.023    0.000 utils.py:355(is_absolute_href)
     9018    0.006    0.000    0.019    0.000 stac_object.py:321(get_root)
    15037    0.007    0.000    0.011    0.000 utils.py:25(make_posix_style)
    14030    0.006    0.000    0.011    0.000 <frozen posixpath>:60(isabs)
     9020    0.003    0.000    0.010    0.000 stac_object.py:255(get_root_link)
     6012    0.006    0.000    0.009    0.000 <frozen posixpath>:150(dirname)
    13027    0.005    0.000    0.009    0.000 <frozen posixpath>:389(normpath)
    39081    0.007    0.000    0.009    0.000 <frozen posixpath>:41(_get_sep)
    86161    0.009    0.000    0.009    0.000 {method 'startswith' of 'str' objects}
   105223    0.007    0.000    0.009    0.000 {built-in method builtins.isinstance}
     5009    0.006    0.000    0.009    0.000 <frozen genericpath>:69(commonprefix)
    21050    0.007    0.000    0.008    0.000 parse.py:119(_coerce_args)
     1002    0.000    0.000    0.008    0.000 stac_io.py:392(_is_url)
     8010    0.003    0.000    0.007    0.000 utils.py:85(__str__)
    55131    0.005    0.000    0.005    0.000 {method 'replace' of 'str' objects}
     7009    0.002    0.000    0.005    0.000 link.py:136(title)
    17037    0.003    0.000    0.005    0.000 <string>:1(<lambda>)
    92194    0.004    0.000    0.004    0.000 {built-in method posix.fspath}
    44088    0.004    0.000    0.004    0.000 {method 'lower' of 'str' objects}
     1002    0.001    0.000    0.004    0.000 utils.py:370(datetime_to_str)
    31046    0.003    0.000    0.003    0.000 {method 'endswith' of 'str' objects}
     2001    0.002    0.000    0.003    0.000 parse.py:413(_splitnetloc)
     1002    0.001    0.000    0.003    0.000 <frozen genericpath>:39(isdir)
     8012    0.002    0.000    0.003    0.000 enum.py:193(__get__)
    13027    0.003    0.000    0.003    0.000 {built-in method posix._path_normpath}
     1002    0.000    0.000    0.003    0.000 version.py:35(get_stac_version)
     1002    0.001    0.000    0.002    0.000 version.py:19(get_stac_version)
    21046    0.002    0.000    0.002    0.000 {built-in method __new__ of type object at 0x101b93308}
     1002    0.002    0.000    0.002    0.000 {built-in method posix.stat}
        1    0.000    0.000    0.002    0.002 stac_object.py:150(target_in_hierarchy)
   1002/1    0.001    0.000    0.002    0.002 stac_object.py:162(traverse)
    10018    0.002    0.000    0.002    0.000 {method 'split' of 'str' objects}
     1002    0.001    0.000    0.002    0.000 <frozen _collections_abc>:771(get)
     1002    0.001    0.000    0.002    0.000 stac_io.py:114(json_dumps)
     5002    0.001    0.000    0.002    0.000 <frozen abc>:117(__instancecheck__)
     1002    0.001    0.000    0.002    0.000 stac_object.py:168(<listcomp>)
     7010    0.002    0.000    0.002    0.000 {built-in method builtins.min}
     5009    0.002    0.000    0.002    0.000 <frozen posixpath>:509(<listcomp>)
     5009    0.002    0.000    0.002    0.000 <frozen posixpath>:508(<listcomp>)
     1002    0.001    0.000    0.001    0.000 <frozen os>:674(__getitem__)
    14027    0.001    0.000    0.001    0.000 {method 'lstrip' of 'str' objects}
    15025    0.001    0.000    0.001    0.000 link.py:237(target)
    19033    0.001    0.000    0.001    0.000 {built-in method builtins.len}
     1002    0.001    0.000    0.001    0.000 {method 'isoformat' of 'datetime.datetime' objects}
    10012    0.001    0.000    0.001    0.000 {method 'find' of 'str' objects}
    21050    0.001    0.000    0.001    0.000 parse.py:108(_noop)
    10024    0.001    0.000    0.001    0.000 link.py:259(get_target_str)
    18029    0.001    0.000    0.001    0.000 typing.py:2287(cast)
     4007    0.001    0.000    0.001    0.000 catalog.py:239(is_relative)
    10024    0.001    0.000    0.001    0.000 link.py:272(has_target_href)
     5009    0.001    0.000    0.001    0.000 {built-in method builtins.max}
     1000    0.001    0.000    0.001    0.000 {method 'replace' of 'datetime.datetime' objects}
     8012    0.001    0.000    0.001    0.000 enum.py:1257(value)
    10019    0.001    0.000    0.001    0.000 link.py:362(is_resolved)
     1002    0.001    0.000    0.001    0.000 {orjson.dumps}
     6012    0.001    0.000    0.001    0.000 {method 'rfind' of 'str' objects}
     5002    0.001    0.000    0.001    0.000 {built-in method _abc._abc_instancecheck}
     4009    0.001    0.000    0.001    0.000 parse.py:421(_checknetloc)
     6007    0.001    0.000    0.001    0.000 {method 'rstrip' of 'str' objects}
     1000    0.001    0.000    0.001    0.000 item.py:373(<listcomp>)
     1002    0.000    0.000    0.001    0.000 stac_io.py:268(default)
     4007    0.001    0.000    0.001    0.000 hooks.py:86(get_extended_object_links)
     1002    0.000    0.000    0.001    0.000 <frozen os>:756(encode)
     5005    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
     5006    0.000    0.000    0.000    0.000 link.py:370(is_hierarchical)
     6002    0.000    0.000    0.000    0.000 {method 'isascii' of 'str' objects}
        2    0.000    0.000    0.000    0.000 catalog.py:644(<listcomp>)
     4001    0.000    0.000    0.000    0.000 {method 'isalpha' of 'str' objects}
     4009    0.000    0.000    0.000    0.000 {method 'strip' of 'str' objects}
     1002    0.000    0.000    0.000    0.000 {method 'decode' of 'bytes' objects}
     1002    0.000    0.000    0.000    0.000 stac_io.py:46(__init__)
     1002    0.000    0.000    0.000    0.000 <frozen codecs>:186(__init__)
     1002    0.000    0.000    0.000    0.000 {method 'write' of '_io.TextIOWrapper' objects}
     1002    0.000    0.000    0.000    0.000 {method 'encode' of 'str' objects}
     1000    0.000    0.000    0.000    0.000 item.py:375(<dictcomp>)
     1000    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 catalog.py:650(<listcomp>)
     1002    0.000    0.000    0.000    0.000 {built-in method _stat.S_ISDIR}
        4    0.000    0.000    0.000    0.000 stac_object.py:216(get_links)
        4    0.000    0.000    0.000    0.000 stac_object.py:237(<listcomp>)
        2    0.000    0.000    0.000    0.000 link.py:219(get_absolute_href)
        2    0.000    0.000    0.000    0.000 catalog.py:633(get_item_links)
     1001    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
        2    0.000    0.000    0.000    0.000 catalog.py:473(get_child_links)
        1    0.000    0.000    0.000    0.000 collection.py:331(to_dict)
        1    0.000    0.000    0.000    0.000 collection.py:218(to_dict)
        2    0.000    0.000    0.000    0.000 utils.py:255(_make_absolute_href_url)
        2    0.000    0.000    0.000    0.000 parse.py:509(urlunparse)
        2    0.000    0.000    0.000    0.000 parse.py:520(urlunsplit)
        1    0.000    0.000    0.000    0.000 summaries.py:303(is_empty)
        2    0.000    0.000    0.000    0.000 tz.py:74(utcoffset)
        2    0.000    0.000    0.000    0.000 {method 'title' of 'str' objects}
        1    0.000    0.000    0.000    0.000 collection.py:90(to_dict)
        2    0.000    0.000    0.000    0.000 {built-in method time.perf_counter}
        5    0.000    0.000    0.000    0.000 {built-in method builtins.any}
        1    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

moradology · 2024-10-09T18:30:28Z

One thing to note is that we're iterating through links very regularly (which happens here: https://github.com/stac-utils/pystac/blob/main/pystac/stac_object.py#L177-L206). It isn't entirely clear that this is the best possible move, as it means iterating through the list and doing a bunch of comparison logic for rel and media_type (on a positive note, at least we're doing this lazily!)
In the output above, this appears to be happening in get_root_link and get_self_link, which ought to both be unique for any given object. A couple of thoughts:

It complicates logic a bit, but if they're unique and we can anticipate that, perhaps it makes sense to produce the result of links dynamically and reserve attributes for root and self links.
Another, perhaps complimentary strategy might be to avoid all the duplication of objects (e.g. here https://github.com/stac-utils/pystac/blob/main/pystac/item.py#L278) + mutation and reuse the implicit pointer python forces upon us. On this strategy, we do not duplicate e.g. collection links and instead to simply compute serialization etc. via a function that takes the relevant information of the class holding onto these pointers (e.g. our Item classes)

gadomski · 2024-10-10T10:34:09Z

It complicates logic a bit, but if they're unique and we can anticipate that, perhaps it makes sense to make produce the result of links dynamically and reserve attributes for root and self links.

I don't hate this idea. root and self are so heavily used that iterating-to-find each time Feels Bad™. I'd be interested to see how hairy implementation gets (feels like we're edging towards cache hell).

n this strategy to, we do not duplicate e.g. collection links and instead to simply compute serialization etc. via a function that takes the relevant information of the class holding onto these pointers (e.g. our Item classes)

This feels ok-ish. My initial worry is around unintended mutation — as a rule pystac isn't too careful about what gets changed when, so if we just work on pointers I'm a little worried we might twiddle bits of a STAC tree without intending to. But again, I'd be open to looking at an implementation.

moradology · 2024-10-10T15:56:59Z

Carrying over some results discussed in the __slots__ pr to show how different bits of code dominate the runtime for different sizes of .save calls.

These are the top offenders in terms of cumulative time for smallish (1000 item) catalog saves:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.002    0.001    0.965    0.965 catalog.py:935(save)
     1002    0.001    0.000    0.699    0.001 stac_object.py:439(save_object)
     4005    0.007    0.000    0.554    0.000 link.py:382(to_dict)
     4007    0.008    0.000    0.535    0.000 link.py:164(get_href)
     5009    0.005    0.000    0.452    0.000 utils.py:219(make_relative_href)
    13027    0.014    0.000    0.446    0.000 <frozen posixpath>:397(abspath)
     1000    0.003    0.000    0.414    0.000 item.py:368(to_dict)
     1000    0.001    0.000    0.404    0.000 item.py:390(<listcomp>)
     5009    0.007    0.000    0.399    0.000 utils.py:162(_make_relative_href_url)
    13027    0.390    0.000    0.390    0.000 {built-in method posix.getcwd}
     5009    0.014    0.000    0.383    0.000 <frozen posixpath>:486(relpath)
    10024    0.007    0.000    0.174    0.000 stac_object.py:279(get_self_href)
    19044    0.009    0.000    0.172    0.000 stac_object.py:185(get_single_link)
    19044    0.003    0.000    0.161    0.000 {built-in method builtins.next}
    38088    0.160    0.000    0.160    0.000 stac_object.py:207(<genexpr>)

Here are the top offenders with 100x more (100,000) items:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.498    0.249 2252.578 2252.578 catalog.py:935(save)
  1000024    0.991    0.000 2008.690    0.002 stac_object.py:279(get_self_href)
  1900044    1.341    0.000 2008.273    0.001 stac_object.py:185(get_single_link)
  1900044    0.379    0.000 2006.719    0.001 {built-in method builtins.next}
  3800088 2006.553    0.001 2006.553    0.001 stac_object.py:207(<genexpr>)
   100002    0.349    0.000 1692.138    0.017 stac_object.py:439(save_object)
   400005    1.098    0.000 1666.717    0.004 link.py:382(to_dict)
   400007    1.421    0.000 1663.745    0.004 link.py:164(get_href)
   100000    0.555    0.000 1165.005    0.012 item.py:368(to_dict)
   100000    0.154    0.000 1162.880    0.012 item.py:390(<listcomp>)
        2    0.000    0.000  504.066  252.033 catalog.py:641(to_dict)
        2    0.047    0.023  504.039  252.019 catalog.py:657(<listcomp>)
        1    0.000    0.000  503.808  503.808 collection.py:584(to_dict)
   200006    0.087    0.000  490.862    0.002 stac_object.py:265(self_href)

gadomski added the bug Things which are broken label Aug 17, 2023

gadomski added this to the v1.12 milestone Sep 23, 2024

moradology mentioned this issue Oct 9, 2024

Use slots in common classes #1434

Draft

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Collections with many items saving time issue #1207

Collections with many items saving time issue #1207

santilland commented Aug 17, 2023

m-mohr commented Aug 17, 2023

santilland commented Aug 17, 2023

m-mohr commented Aug 17, 2023

gadomski commented Aug 17, 2023 •

edited

Loading

m-mohr commented Aug 17, 2023 •

edited

Loading

gadomski commented Aug 17, 2023

santilland commented Aug 17, 2023

m-mohr commented Aug 17, 2023 •

edited

Loading

moradology commented Oct 8, 2024 •

edited

Loading

gadomski commented Oct 8, 2024

moradology commented Oct 9, 2024 •

edited

Loading

moradology commented Oct 9, 2024 •

edited

Loading

gadomski commented Oct 10, 2024

moradology commented Oct 10, 2024

Collections with many items saving time issue #1207

Collections with many items saving time issue #1207

Comments

santilland commented Aug 17, 2023

m-mohr commented Aug 17, 2023

santilland commented Aug 17, 2023

m-mohr commented Aug 17, 2023

gadomski commented Aug 17, 2023 • edited Loading

m-mohr commented Aug 17, 2023 • edited Loading

gadomski commented Aug 17, 2023

santilland commented Aug 17, 2023

m-mohr commented Aug 17, 2023 • edited Loading

moradology commented Oct 8, 2024 • edited Loading

gadomski commented Oct 8, 2024

moradology commented Oct 9, 2024 • edited Loading

moradology commented Oct 9, 2024 • edited Loading

gadomski commented Oct 10, 2024

moradology commented Oct 10, 2024

gadomski commented Aug 17, 2023 •

edited

Loading

m-mohr commented Aug 17, 2023 •

edited

Loading

m-mohr commented Aug 17, 2023 •

edited

Loading

moradology commented Oct 8, 2024 •

edited

Loading

moradology commented Oct 9, 2024 •

edited

Loading

moradology commented Oct 9, 2024 •

edited

Loading