load_dataset fails to load dataset saved by save_to_disk #7018

Open
@sliedes

Description

Describe the bug

This code fails to load the dataset it just saved:

from datasets import load_dataset
from transformers import AutoTokenizer

MODEL = "google-bert/bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

dataset = load_dataset("yelp_review_full")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets.save_to_disk("dataset")

tokenized_datasets = load_dataset("dataset/")  # raises

It raises ValueError: Couldn't infer the same data file format for all splits. Got {NamedSplit('train'): ('arrow', {}), NamedSplit('test'): ('json', {})}.

I believe this bug is caused by the logic that tries to infer the dataset format: it picks the most common file extension among the files in each split directory. A small split, however, can fit in a single .arrow file that sits alongside two JSON metadata files, so the majority vote infers the format as JSON:

$ ls -l dataset/test
-rw-r--r-- 1 sliedes sliedes 191498784 Jul  1 13:55 data-00000-of-00001.arrow
-rw-r--r-- 1 sliedes sliedes      1730 Jul  1 13:55 dataset_info.json
-rw-r--r-- 1 sliedes sliedes       249 Jul  1 13:55 state.json
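The failure mode can be sketched with a majority vote over extensions (an illustration of the inference going wrong on the listing above, not the library's actual code):

```python
from collections import Counter
from pathlib import PurePosixPath

# Files present in a small split saved by save_to_disk:
# one Arrow shard plus two JSON metadata files, as in the listing above.
files = [
    "data-00000-of-00001.arrow",
    "dataset_info.json",
    "state.json",
]

# Counting extensions and taking the most common one picks "json",
# even though the actual data lives in the single .arrow file.
extensions = Counter(PurePosixPath(f).suffix.lstrip(".") for f in files)
inferred_format, count = extensions.most_common(1)[0]
print(inferred_format, count)  # "json" wins 2-to-1 over "arrow"
```

For a directory produced by save_to_disk, the documented counterpart is datasets.load_from_disk("dataset"), which reloads the saved splits directly without any format inference; load_dataset is intended for dataset repositories and raw data files.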

Steps to reproduce the bug

Execute the code above.

Expected behavior

The dataset is loaded successfully.

Environment info

  • datasets version: 2.20.0
  • Platform: Linux-6.9.3-arch1-1-x86_64-with-glibc2.39
  • Python version: 3.12.4
  • huggingface_hub version: 0.23.4
  • PyArrow version: 16.1.0
  • Pandas version: 2.2.2
  • fsspec version: 2024.5.0
