Hi @samclarke,
I have a script that watches the robots.txt of multiple websites, but in some cases a site has no robots.txt and instead serves fallback content. The issue is that your library returns `isAllowed() -> true` even when HTML is passed as the robots.txt body.
```js
it('should not confirm it can be indexed', async () => {
  const body = `<html></html>`;
  const robots = robotsParser(robotsUrl, body);
  const canBeIndexed = robots.isAllowed(rootUrl);
  expect(canBeIndexed).toBeFalsy();
});
```
(This test fails, but it should pass; or better, the parser should throw, since both `isDisallowed` and `isAllowed` exist.)
Did I miss a way to validate the robots.txt format?
Would it make sense to throw an error instead of allowing/disallowing based on invalid input?
Thank you,
EDIT: a workaround could be to check whether the file contains any HTML, hoping the website does not return yet another format (JSON, raw text, ...). But that is a bit hacky, no?
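For what it's worth, a minimal sketch of that workaround: a heuristic that rejects bodies starting with a doctype or HTML tag before handing them to the parser. The function name `looksLikeHtml` is just illustrative, not part of the library's API, and it would miss other invalid formats (JSON, plain error messages, ...):

```js
// Heuristic: a real robots.txt is plain-text directives, not markup.
// This only catches bodies that open with a doctype or <html> tag.
function looksLikeHtml(body) {
  const trimmed = body.trim().toLowerCase();
  return trimmed.startsWith('<!doctype') || trimmed.startsWith('<html');
}

// Usage: skip parsing when the server returned a fallback page.
console.log(looksLikeHtml('<html></html>'));            // true
console.log(looksLikeHtml('User-agent: *\nDisallow:')); // false
```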
EDIT 2: a related point of view: https://stackoverflow.com/a/31598530/3608410