
Byte limit #75

Open

Description

@JanPetterMG

Feature request: Limit the maximum number of bytes to parse.

A maximum file size may be enforced per crawler. Content which is after the maximum file size may be ignored. Google currently enforces a size limit of 500 kilobytes (KB).

Source: Google

When forming the robots.txt file, you should keep in mind that the robot places a reasonable limit on its size. If the file size exceeds 32 KB, the robot assumes it allows everything.

Source: Yandex
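
The two quotes describe different oversize policies: Google truncates (parses the first 500 KB and ignores the rest), while Yandex falls back to allow-all once the file exceeds 32 KB. A minimal Python sketch of the distinction, for illustration only; `OversizePolicy` and `apply_size_limit` are hypothetical names, not part of this library:

```python
from enum import Enum

class OversizePolicy(Enum):
    TRUNCATE = "truncate"    # Google-style: parse the first N bytes, ignore the rest
    ALLOW_ALL = "allow_all"  # Yandex-style: treat an oversized file as allowing everything

def apply_size_limit(content: bytes, limit: int, policy: OversizePolicy) -> bytes:
    """Reduce oversized robots.txt content to something parseable under `limit` bytes."""
    if len(content) <= limit:
        return content          # within the limit: parse as-is
    if policy is OversizePolicy.TRUNCATE:
        return content[:limit]  # keep the first `limit` bytes, drop the remainder
    return b""                  # an empty robots.txt disallows nothing, i.e. allows everything
```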

  • Default limit of X bytes, e.g. 524,288 bytes (512 KB / 0.5 MB)
  • User-defined limit override
  • Make sure the limit is reasonable; throw an exception if it is dangerously low, e.g. below 24,576 bytes (24 KB) (see the sketch after this list)
  • Should be possible to disable entirely (no limit)
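
A minimal sketch of how the four requirements could fit together, assuming a Python-style API; `resolve_byte_limit` and the constant names are hypothetical, and the concrete defaults are taken from the list above:

```python
DEFAULT_LIMIT = 524288   # 512 KB default, matching the first bullet
MIN_SAFE_LIMIT = 24576   # 24 KB floor from the third bullet

def resolve_byte_limit(user_limit: int | None = DEFAULT_LIMIT) -> int | None:
    """Validate a byte limit; None disables the limit entirely."""
    if user_limit is None:            # limit explicitly disabled
        return None
    if user_limit < MIN_SAFE_LIMIT:   # dangerously low: refuse to run
        raise ValueError(
            f"Byte limit {user_limit} is below the "
            f"{MIN_SAFE_LIMIT}-byte minimum"
        )
    return user_limit                 # default or user override
```

Usage under those assumptions:

```python
resolve_byte_limit()         # 524288: the default
resolve_byte_limit(1048576)  # 1048576: user override
resolve_byte_limit(None)     # None: no limit
resolve_byte_limit(1024)     # raises ValueError: dangerously low
```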
