extracting only a part of the tea file

I would like to read only a subset of the records in a file, selected by a criterion. Is there an efficient way to do this?

## attempt 1

model

```cs
    public struct CandleInDbNew
    {
        public uint OpenTs;

        public decimal OpenPrice;
        public decimal HighPrice;
        public decimal LowPrice;
        public decimal ClosePrice;

        public uint TradeCount;

        public decimal Volume;
        public decimal QuoteAssetVolume;
        public decimal TakerBuyBaseAssetVolume;
        public decimal TakerBuyQuoteAssetVolume;
    }
```

method

```csharp
        public List<CandleInDbNew> GetCandlesInRange(
            string fileFullPath,
            uint from)
        {
            var result = new List<CandleInDbNew>();

            if (!File.Exists(fileFullPath))
            {
                return result;
            }

            using (var tf = TeaFile<CandleInDbNew>.OpenRead(fileFullPath,
                    ItemDescriptionElements.FieldNames |
                    ItemDescriptionElements.FieldTypes |
                    ItemDescriptionElements.FieldOffsets |
                    ItemDescriptionElements.ItemSize))
            {
                foreach (var item in tf.Items)
                {
                    if (item.OpenTs >= from)
                        result.Add(item);
                }
            }

            return result;
        }
```

My data in the file is sorted by `OpenTs`, and I would like to skip the records that fall outside a specific range, as in the example above.

### issue

This approach is inefficient because every item in the file is read and mapped in full before it is filtered. It is slow and does not solve the problem.

## attempt 2

I have also tried the **unmapped** approach, but an exception is thrown on read:

> System.IO.IOException: 'Decimal constructor requires an array or span of four valid decimal bytes.'

![image](https://github.com/discretelogics/TeaFiles.Net-Time-Series-Storage-in-Files/assets/20644772/a1fe7f50-1164-480e-9304-62fe7784c16d)

I have managed to extract the part of the data that causes the issue: https://github.com/pavlexander/testfile/blob/main/ETHBTC_big.7z

There were no issues with 10k, 50k, or 100k records, but at 1 million records I started getting the error. Please download and unpack the file, then use the following code to reproduce:

```cs
            var result = new List<CandleInDbNew>();

            using (var tf = TeaFile.OpenRead("ETHBTC_big.tea")) // exception here
            {
                var openTsColumn = tf.Description.ItemDescription.GetFieldByName("OpenTs");

                foreach (Item item in tf.Items)
                {
                    var openTs = (uint)openTsColumn.GetValue(item);

                    if (openTs >= 1692190740)
                        result.Add(default); // temporary
                }
            }
```

### issue

Even if this solution worked, there is no guarantee that it would be faster than approach 1. In fact, on a smaller dataset where no exception is thrown, approach 1 performs many times faster than approach 2 on my machine. Setting the error aside, I would also like to know how to map an `Item` to a `struct`.

## conclusion

The original question still stands: how can I filter the data by a criterion without reading the whole file?
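
Because the data is sorted by `OpenTs`, one candidate is a binary search for the first matching record, followed by a sequential read from that position. The following is a minimal sketch that is independent of TeaFiles; `SortedRecordSearch`, `LowerBound`, and `keyAt` are hypothetical names, not part of the library.

```cs
using System;

public static class SortedRecordSearch
{
    // Returns the index of the first record whose key is >= target,
    // or count if every key is smaller.
    // Assumes keys are sorted in ascending order.
    public static long LowerBound(long count, Func<long, uint> keyAt, uint target)
    {
        long lo = 0;
        long hi = count;

        while (lo < hi)
        {
            long mid = lo + (hi - lo) / 2;

            if (keyAt(mid) < target)
                lo = mid + 1; // first match lies to the right of mid
            else
                hi = mid;     // mid itself may be the first match
        }

        return lo;
    }
}
```

With a typed file, `keyAt` might be something like `i => tf.Items[i].OpenTs`, assuming the item collection supports random access by index; that is an assumption to verify against the TeaFiles.Net API. If it holds, locating the start of the range in 1 million records would take about 20 item reads instead of a full scan.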


