
About tokenizing test-split captions on own dataset #31

@gitraffica

Description


In preprocess/tokenize_captions.py:

import json
import os

def load_annotations(coco_dir):
    with open(os.path.join(coco_dir, 'annotations', 'captions_train2014.json')) as f:
        annotations = json.load(f)['annotations']

    with open(os.path.join(coco_dir, 'annotations', 'captions_val2014.json')) as f:
        annotations.extend(json.load(f)['annotations'])

    return annotations

It seems that this code does not load test2014. To fix this, could you modify it as follows:

def load_annotations(coco_dir):
    with open(os.path.join(coco_dir, 'annotations', 'captions_train2014.json')) as f:
        annotations = json.load(f)['annotations']

    with open(os.path.join(coco_dir, 'annotations', 'captions_val2014.json')) as f:
        annotations.extend(json.load(f)['annotations'])

    with open(os.path.join(coco_dir, 'annotations', 'captions_test2014.json')) as f:
        annotations.extend(json.load(f)['annotations'])

    return annotations
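A more defensive variant might help here, since not every dataset ships all three annotation files (the official COCO release, for instance, does not publish captions for test2014). The sketch below is an assumption-based alternative, not code from the repository: it loops over a configurable list of split names and silently skips any split whose annotation file is absent, using the same directory layout as the snippet above.

```python
import json
import os


def load_annotations(coco_dir, splits=('train2014', 'val2014', 'test2014')):
    """Collect caption annotations from every available split.

    Splits whose annotation file is missing are skipped instead of
    raising FileNotFoundError, so the same code works on datasets
    with or without a captioned test split.
    """
    annotations = []
    for split in splits:
        path = os.path.join(coco_dir, 'annotations', f'captions_{split}.json')
        if not os.path.exists(path):
            continue  # this split has no annotation file; skip it
        with open(path) as f:
            annotations.extend(json.load(f)['annotations'])
    return annotations
```

This keeps the hard-coded filenames out of the function body, so adding or removing a split is a one-line change at the call site.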

It took me 2 days to figure this out; until then I kept suspecting a problem with my own dataset. :C
