Redundant data from HTML source is included #1930

### Question

I have a simple script that uses docling to parse this webpage:

https://ramzinex.com/help/register-in-ramzinex

The problem is that the parsed document also contains the contents of an `ol` tag and the page footer. How can I exclude them?


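The `ol` tag info (translated from Persian, original in parentheses):
    Ramzinex Exchange (صرافی رمزینکس)
    Help (راهنما)
    Registration and identity verification (ثبت نام و احراز هویت)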

The `footer` info:

<img width="1460" height="428" alt="Image" src="https://github.com/user-attachments/assets/81fc7438-d65d-4c09-b228-6d40484f3967" />

This whole section is included in the final document.
When I chunk the document with the hybrid chunker, these parts are still included.

Is there any configuration option to exclude redundant content like this, whether the source is a link, a PDF, or anything else?
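For reference, here is a rough sketch of the kind of workaround I am considering, independent of whatever docling may offer natively: strip the unwanted elements from the raw HTML first, then hand the cleaned markup to docling. The `TagStripper` class and the `strip_tags` and `convert_without_noise` helpers are hypothetical names of my own, not part of docling. The tag filter uses only the Python standard library, and the docling step assumes that `DocumentStream` and `DocumentConverter.convert` accept an in-memory HTML stream.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit HTML while dropping every element whose tag is in `skip`.

    Limitations of this sketch: comments and the doctype are not re-emitted,
    and `skip` should only name non-void tags (ones with a closing tag).
    """

    def __init__(self, skip):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.skip = set(skip)
        self.depth = 0  # > 0 while inside a skipped element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip:
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if self.depth == 0 and tag not in self.skip:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.skip:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")


def strip_tags(html, skip=("footer", "ol")):
    """Return `html` with all elements named in `skip` removed."""
    parser = TagStripper(skip)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def convert_without_noise(html, name="page.html"):
    """Convert cleaned HTML with docling (assumes docling is installed)."""
    # Imported lazily so strip_tags works without docling installed.
    from io import BytesIO

    from docling.datamodel.base_models import DocumentStream
    from docling.document_converter import DocumentConverter

    cleaned = strip_tags(html).encode("utf-8")
    stream = DocumentStream(name=name, stream=BytesIO(cleaned))
    return DocumentConverter().convert(stream).document
```

Note that removing every `ol` also drops legitimate numbered lists in the page body, so a real version would probably need a narrower filter (for example, matching on a class or `id`). It also does nothing for PDFs, which is why I am asking whether a built-in option exists.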
