
Unexpected memory retention when reading slices of dataframes #2348

@rmlynx

Description

Describe the bug

When reading and slicing a subset of a large DataFrame:

  1. The entire DataFrame appears to be loaded into memory.
  2. A slice is taken and returned, likely as a view retaining a reference to the original.
  3. If this operation is repeated in a loop and each slice is stored (e.g., in a list), the original large DataFrames are never deallocated.

This causes cumulative memory usage to increase continuously, eventually leading to an out-of-memory crash.
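The suspected mechanism matches how views behave in plain numpy/pandas; a minimal sketch, using numpy only (not ArcticDB), purely to illustrate the hypothesis:

import numpy as np

big = np.random.randint(0, 100, size=(5_000, 10_000))  # ~400 MB of int64
small = big[1:4]                                        # 3-row view, not a copy

print(f"{small.nbytes / 1024**2:.2f} MB reported for the view")  # ~0.23 MB
print(small.base is big)  # True: the view keeps the full parent buffer alive
del big                   # the ~400 MB buffer is NOT freed while `small` exists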

Steps/Code to Reproduce

1. Create demo data

import pandas as pd
import numpy as np


def generate_random_dataframe(n_rows=25, n_cols=10, start_date="2000-01-01", freq="D"):
    """
    Generate a random DataFrame with a datetime index.

    Args:
        n_rows (int): Number of rows in the DataFrame.
        n_cols (int): Number of columns in the DataFrame.
        start_date (str): Start date for the datetime index.
        freq (str): Frequency for the datetime index (e.g., 'D' for daily, 'H' for hourly).

    Returns:
        pd.DataFrame: A random DataFrame with datetime as the index.
    """
    # Generate column names
    cols = [f"COL_{i}" for i in range(n_cols)]

    # Generate random data
    data = np.random.randint(0, 100, size=(n_rows, n_cols))

    # Create a datetime index
    index = pd.date_range(start=start_date, periods=n_rows, freq=freq)

    # Create the DataFrame
    df = pd.DataFrame(data, columns=cols, index=index)

    return df

Write a DataFrame of 20 years of daily data with 10,000 columns of random values to ArcticDB:

df = generate_random_dataframe(
    n_rows=255 * 20, n_cols=10_000, start_date="1990-01-01", freq="D"
)
df.head()
import arcticdb as adb

uri = "lmdb://tmp/arcticdb_leak"
ac = adb.Arctic(uri)

library = ac.get_library("demo_lib", create_if_missing=True)

library.write("test_frame", df)
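Optionally, a quick sanity check that the symbol was written (not part of the original report):

print(library.list_symbols())  # expect 'test_frame' to be listed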

2. Read Data

Helper function to get memory usage of a list of DataFrames:

def get_total_dataframe_size_gb(df_list):
    """
    Calculate total memory usage of a list of DataFrames in gigabytes.

    Parameters:
        df_list (list of pd.DataFrame): List of DataFrames.

    Returns:
        float: Total size in GB.
    """
    total_bytes = sum(df.memory_usage(deep=True).sum() for df in df_list)
    return total_bytes / (1024**3)

Read the full dataframe for reference:

from_storage_df = library.read("test_frame").data

print(f"Shape of data: {from_storage_df.shape}")
print(f"Size of data: {get_total_dataframe_size_gb([from_storage_df]):.2f} GB")

Read only a slice:

n_rows_to_read = 3

from_date = from_storage_df.index[1]
to_date = from_storage_df.index[n_rows_to_read]

small_df = library.read(
    "test_frame",
    date_range=(from_date, to_date),
).data

print(f"Shape of fetched subset of data: {small_df.shape}")
print(
    f"Size of fetched subset of data: {get_total_dataframe_size_gb([small_df]):.4f} GB"
)

Now read the small slice in a loop and save results in a list:

retrieved_data = []
n_times_to_fetch = 100

for i in range(n_times_to_fetch):
    small_df = library.read(
        "test_frame",
        date_range=(
            from_date,
            to_date,
        ),
    ).data
    retrieved_data.append(small_df)

    if i % 10 == 0:
        print(f"Fetched small subset {i} times")
        print(
            f"    Total size of retrieved data so far: {get_total_dataframe_size_gb(retrieved_data):.2f} GB"
        )

print()
print(
    f"Total size of retrieved data: {get_total_dataframe_size_gb(retrieved_data):.2f} GB"
)

Expected Results

Memory usage should increase in total by roughly 0.0002 GB * 100 = 0.02 GB. Instead, it increases by several hundred MB per iteration, eventually running out of memory.
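For reference, the arithmetic behind that estimate (each slice is 3 rows of 10,000 int64 columns):

per_slice_bytes = 3 * 10_000 * 8           # rows × columns × 8 bytes per int64
per_slice_gb = per_slice_bytes / 1024**3   # ≈ 0.0002 GB
print(f"{per_slice_gb:.5f} GB per slice, ~{per_slice_gb * 100:.3f} GB for 100 reads")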


OS, Python Version and ArcticDB Version

  • Linux
  • Python 3.10.12
  • ArcticDB 5.2.3

Backend storage used

LMDB

Additional Context

We're able to bypass the problem by passing adb.QueryBuilder().date_range((from_date, to_date)) to library.read instead of the date_range argument, but it's not clear whether that is the intended way to do it, or why the most obvious way to read a slice of a DataFrame causes this memory leak.
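For completeness, a sketch of that workaround as described above, assuming the query_builder keyword of library.read (the report does not show the exact call):

# Express the date filter via QueryBuilder instead of the date_range argument.
q = adb.QueryBuilder().date_range((from_date, to_date))

small_df = library.read("test_frame", query_builder=q).data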
