The library should validate the document before processing it #34

Open
@sneko

Hi @samclarke,

I have a script that watches the robots.txt files of multiple websites, but in some cases a site has none and serves fallback content instead. The issue is that your library reports isAllowed() -> true even when HTML is passed as the robots.txt body.

  const robotsParser = require('robots-parser');

  // Example URLs for the test (placeholders).
  const robotsUrl = 'https://example.com/robots.txt';
  const rootUrl = 'https://example.com/';

  it('should not confirm it can be indexed', () => {
    // An HTML fallback page instead of a real robots.txt.
    const body = `<html></html>`;

    const robots = robotsParser(robotsUrl, body);
    const canBeIndexed = robots.isAllowed(rootUrl);

    expect(canBeIndexed).toBeFalsy();
  });

(This test fails when it should pass. Or better, the call should throw: since both isAllowed and isDisallowed exist, neither can give a meaningful answer for an invalid file.)

Did I miss a built-in way to validate the robots.txt format?

Does it make sense to throw an error instead of allowing/disallowing something based on nothing?

Thank you,

EDIT: a workaround could be to check for HTML inside the file... hoping the website does not return some other format (JSON, raw text...). But that's a bit hacky, no?
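For illustration, a minimal sketch of that workaround; the looksLikeRobotsTxt helper and its heuristics are my own invention, not part of robots-parser:

  const robotsParser = require('robots-parser');

  // Hypothetical helper (name and heuristics are mine, not the library's):
  // accept a body only if it resembles a robots.txt file rather than an
  // HTML (or other) fallback page.
  function looksLikeRobotsTxt(body) {
    const trimmed = body.trim();

    // An empty robots.txt is valid: it allows everything.
    if (trimmed === '') return true;

    // Reject obvious HTML fallback pages.
    if (/^(<!doctype html|<html)/i.test(trimmed)) return false;

    // Require at least one line that looks like a comment or a
    // "Field: value" directive (User-agent, Disallow, Sitemap, ...).
    return trimmed.split('\n').some((line) => {
      const l = line.trim();
      return l.startsWith('#') || /^[A-Za-z-]+\s*:/.test(l);
    });
  }

  function parseRobotsStrict(robotsUrl, body) {
    if (!looksLikeRobotsTxt(body)) {
      throw new Error(`${robotsUrl} does not look like a robots.txt file`);
    }
    return robotsParser(robotsUrl, body);
  }

With a guard like this, the <html></html> body from the test above would throw instead of silently allowing everything.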

EDIT 2: a related point of view: https://stackoverflow.com/a/31598530/3608410
