Inventory for ifs/enfo only returns F00-F03 on AWS/Azure #430

@mysterefrank

When I try to retrieve the full inventory for ECMWF data (model='ifs', product='enfo') using Herbie().inventory(), the returned inventory contains only steps for forecast hours 0 and 3, even for recent dates. This occurs with both source='aws' and source='azure'.

# Inside a loop for date_iter, cycle_iter
try:
    # Initialize Herbie WITHOUT fxx to get full cycle inventory
    H_check = Herbie(date=date_iter, model='ifs', product='enfo', cycle=cycle_iter, source='azure', verbose=False) # Also tested with source='aws'

    if H_check.idx is None:
        print("    ERROR: Herbie could not find an index file URL for this cycle. Skipping.")
        continue

    full_inventory_df = H_check.inventory(verbose=False) # Get full inventory

    if full_inventory_df is not None and not full_inventory_df.empty:
        # --- Debugging ---
        base_var_search = ':2t:' # Example variable
        if 'search_this' in full_inventory_df.columns:
            matching_rows = full_inventory_df[full_inventory_df['search_this'].str.contains(base_var_search, regex=False, na=False)]
            print(f"    DEBUG: Found {len(matching_rows)} inventory rows containing '{base_var_search}'.")
            if not matching_rows.empty and 'step' in matching_rows.columns:
                unique_steps_in_matches = sorted([str(s) for s in matching_rows['step'].unique()])
                print(f"    DEBUG: Unique 'step' values in matching rows: {unique_steps_in_matches}")

        if 'step' in full_inventory_df.columns:
            all_steps = full_inventory_df['step'].unique()
            all_steps_str = sorted([str(s) for s in all_steps])
            print(f"    DEBUG: All unique raw 'step' values found in inventory: {all_steps_str}")
            available_fxx_for_cycle = sorted(list(set(
                 [int(s.total_seconds() / 3600) for s in all_steps if isinstance(s, pd.Timedelta)]
            )))
            available_fxx_for_cycle = [h for h in available_fxx_for_cycle if h >= 0]
            print(f"    DEBUG: Extracted integer hours from Timedeltas: {available_fxx_for_cycle}")
        # --- End Debugging ---

Output:

  Checking inventory for 2025-04-15 Cycle 00Z...
    DEBUG: Inventory columns: ['grib_message', 'start_byte', 'end_byte', 'range', 'reference_time', 'valid_time', 'step', 'param', 'levelist', 'levtype', 'number', 'domain', 'expver', 'class', 'type', 'stream', 'search_this']
    DEBUG: Inventory shape: (8007, 17)
    DEBUG: Found 51 inventory rows containing ':2t:'.
    DEBUG: Unique 'step' values in matching rows: ['0 days 00:00:00']
    DEBUG: All unique raw 'step' values found in inventory: ['0 days 00:00:00', '0 days 03:00:00']
    DEBUG: Extracted integer hours from Timedeltas: [0, 3]
    Found 2 available hours for cycle (Range: 0h to 3h). Generating tasks...

I expected the Herbie().inventory() call for ifs/enfo to return index entries for all available forecast hours in the specified cycle (e.g., F00, F03, F06, ..., F360 or similar, depending on the cycle's full length), presumably by discovering and combining the multiple underlying index files provided by the source.
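
The "discover and combine" behavior described above could be sketched roughly as follows. This is only an illustration, under the unconfirmed assumption that each forecast hour has its own index file. The names `collect_inventory` and `fetch` are hypothetical and not part of Herbie. In practice `fetch` might wrap a per-hour call such as `Herbie(date, model='ifs', product='enfo', fxx=fxx).inventory()`, but that is not verified here.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# One inventory row, e.g. {"param": "2t", "step_hours": 3}. Hypothetical shape.
Row = dict


def collect_inventory(
    fetch: Callable[[int], Optional[List[Row]]],
    fxx_candidates: Iterable[int],
) -> Tuple[List[Row], List[int]]:
    """Merge per-forecast-hour inventories into one list.

    `fetch(fxx)` is a hypothetical callable that returns the inventory rows
    for a single forecast hour, or None/empty if nothing is available.
    Returns the merged rows and the forecast hours that yielded data.
    """
    rows: List[Row] = []
    found: List[int] = []
    for fxx in fxx_candidates:
        try:
            part = fetch(fxx)
        except Exception:
            # Treat a failed lookup (e.g. missing index file) as "no data".
            part = None
        if not part:
            continue
        rows.extend(part)
        found.append(fxx)
    return rows, found


if __name__ == "__main__":
    # Stand-in data; a real fetch would query the remote index files.
    fake = {
        0: [{"param": "2t", "step_hours": 0}],
        3: [{"param": "2t", "step_hours": 3}],
        6: [{"param": "2t", "step_hours": 6}],
    }
    rows, hours = collect_inventory(fake.get, range(0, 13, 3))
    print(hours)  # [0, 3, 6]
```

Passing the fetcher in as a callable keeps the merging logic independent of how a single hour's inventory is obtained, so the same loop can be tried against either source.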
