We should define all the rules and add custom validation logic before relying on lxml's `DocumentInvalid` exception.
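As a rough sketch of what I mean, assuming the rules can be written as small functions over the parsed tree: the rule names, the `<item>`/`id` document shape, `RuleViolation`, and `validate` below are all made up for illustration and are not from our codebase. The rules only use the ElementTree API (`iter`, `get`) that lxml elements also expose, so the demo parses with the standard library. In real use the tree would come from lxml and `schema` would be an `lxml.etree.XMLSchema`, whose `assertValid` raises `DocumentInvalid`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# A rule takes the root element and returns a list of human-readable problems.
Rule = Callable[[ET.Element], List[str]]


class RuleViolation(Exception):
    """Raised when one or more custom rules fail (hypothetical name)."""

    def __init__(self, problems: List[str]) -> None:
        super().__init__("; ".join(problems))
        self.problems = problems


def require_item_ids(root) -> List[str]:
    # Hypothetical rule: every <item> must carry a non-empty "id" attribute.
    return [
        f"item #{i} is missing an id"
        for i, item in enumerate(root.iter("item"), start=1)
        if not item.get("id")
    ]


def unique_item_ids(root) -> List[str]:
    # Hypothetical rule: "id" values must not repeat across <item> elements.
    seen, problems = set(), []
    for item in root.iter("item"):
        item_id = item.get("id")
        if item_id and item_id in seen:
            problems.append(f"duplicate id {item_id!r}")
        seen.add(item_id)
    return problems


# All rules are declared in one place so the full set is easy to review.
RULES: List[Rule] = [require_item_ids, unique_item_ids]


def validate(root, rules=RULES, schema=None) -> None:
    # Run every custom rule first and report all problems together.
    problems = [p for rule in rules for p in rule(root)]
    if problems:
        raise RuleViolation(problems)
    # Only then fall back to schema validation. With an lxml.etree.XMLSchema
    # and an lxml-parsed tree, assertValid raises lxml.etree.DocumentInvalid.
    if schema is not None:
        schema.assertValid(root)


if __name__ == "__main__":
    doc = ET.fromstring('<order><item id="a"/><item/><item id="a"/></order>')
    try:
        validate(doc)
    except RuleViolation as exc:
        print(exc.problems)
```

Collecting every rule failure before raising gives the caller one complete report, and `DocumentInvalid` is then left to cover only what the schema itself can express.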