Root relative path infinite loop

I am opening a separate issue because, although this is related to [this issue](https://github.com/spider-rs/spider/issues/246), it concerns the currently implemented root-relative-only traversal mechanism.

Python web server for testing:
`malweb.py`:
```
from flask import Flask

app = Flask(__name__)

# Entry page: a single root-relative link into the trap.
@app.route("/", methods=["GET"])
def root():
    return """
        <a href="/catch_all_root_relative/1/">Go deeper</a>
    """

# Every page under the catch-all links to a URL with one more "/1" segment,
# so following links never terminates.
@app.route("/catch_all_root_relative/<path:text>", methods=["GET"])
def catch_all_root_relative(text=None):
    # Count the "1" characters in the current path and add one more.
    count = text.count("1") + 1
    text = "/catch_all_root_relative" + ("/1" * count) + "/"
    return f"""
        <a href="{text}">Go deeper</a>
    """
```
Run it with `python3 -m flask --app malweb run`.

Spider:
```
extern crate spider;

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website: Website = Website::new("http://127.0.0.1:5000/")
        .build()
        .unwrap();
    let mut rx2 = website.subscribe(16).unwrap();

    tokio::spawn(async move {
        while let Ok(res) = rx2.recv().await {
            println!("{:?}", res.get_url());
        }
    });

    website.crawl_smart().await;

    println!("Links found {:?}", website.get_size().await);
}
```

Infinite recursion affects both root-relative and base-relative URLs. Spider should handle it by tracking link depth (the number of links followed from the start page) and stopping once a configured maximum is reached. The current depth check counts only the path segments in the URL; ideally this kind of endlessly deepening link chain would be detected directly.
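To make the suggestion concrete, here is a minimal sketch of a breadth-first crawl that limits link depth (hops from the start page) instead of path segments. It is not Spider's implementation and uses none of its API: `fake_links` is a hypothetical stand-in for fetching a page and extracting its links (it mimics the test server above), and `crawl_with_link_depth` and `max_depth` are names made up for illustration.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical stand-in for "fetch page, extract links".
/// Mimics the Flask test server: every page links one level deeper.
fn fake_links(path: &str) -> Vec<String> {
    if path == "/" {
        return vec!["/catch_all_root_relative/1/".to_string()];
    }
    // Count the "1" segments and emit a link with one more.
    let ones = path.split('/').filter(|s| *s == "1").count();
    vec![format!("/catch_all_root_relative{}/", "/1".repeat(ones + 1))]
}

/// Breadth-first crawl that tracks link depth (hops from `start`),
/// not the number of path segments in the URL.
/// Returns the visited URLs in crawl order.
fn crawl_with_link_depth<F>(start: &str, max_depth: usize, links_of: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    let mut order = Vec::new();

    visited.insert(start.to_string());
    queue.push_back((start.to_string(), 0));

    while let Some((url, depth)) = queue.pop_front() {
        order.push(url.clone());
        if depth == max_depth {
            continue; // limit reached: do not follow links from this page
        }
        for next in links_of(&url) {
            // `insert` returns false for URLs that were already seen.
            if visited.insert(next.clone()) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    order
}
```

Calling `crawl_with_link_depth("/", 3, fake_links)` terminates after the start page plus three hops, even though the simulated site is infinitely deep. Storing the depth alongside each queued URL is what bounds the crawl; the `visited` set alone cannot help here because every generated URL is new.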

Root relative path infinite loop #247
