Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: tests to run under pytest #891

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

Conversation

stevenh
Copy link

@stevenh stevenh commented Mar 26, 2025

Fix tests to run fully under pytest, this includes:

  • Fixing filenames, removing dots and correct typos
  • Removing init methods, which are not supported by pytest
  • Implement parametrisation so tests can be run individually
  • Added timeouts so tests can't run forever
  • Replacing print and logging with assertions to prevent false successes
  • Removing unused and add missing imports
  • Mark tests with @pytest.mark.asyncio where appropriate
  • Use http constants to avoid magic numbers
  • Add type hints to improve linting and identify issues
  • Use local server for API tests to improve debugging and eliminate docker dependency
  • Call pytest in main to allow running tests from command line
  • Skip broken tests
  • Fix out of date logic and invalid method parameters
  • Re-enable disabled and commented out tests after fixing them
  • Added missing test data
  • Updated tests that depend on altered external css or html structure
  • Automatically skip if tests if API key is not set

If you need to debug a test, which will take time, you will need to comment out the default timeout in pyproject.toml under [tool.pytest.ini_options].

@stevenh
Copy link
Author

stevenh commented Mar 27, 2025

Here's what the tests look like now in VS Code:
image
image

@stevenh stevenh closed this Mar 27, 2025
@stevenh stevenh reopened this Mar 27, 2025
@stevenh stevenh force-pushed the fix/tests branch 22 times, most recently from 48c34ff to 7036b1b Compare March 27, 2025 18:13
@stevenh
Copy link
Author

stevenh commented Mar 28, 2025

Thanks @aravindkarnam I've tried to minimise the differences over the past few days and it's now in a pretty good place.

As per this issue #893 I was going to break it out into separate PRs, which will take quite a bit of time, so if you're happy to review this as a single item that would avoid a lot of overhead which also comes with the challenge that the tests wont pass until all the fixes are merged.

I can obviously update this PR with the details of all the fixes, which would be much quicker.

Let me know the approach you would like me to take?

In the meantime I'll continue to refine, the main things left is to ensuring that all tests are using assertions as some were just doing prints.

@stevenh
Copy link
Author

stevenh commented Mar 28, 2025

Oh as a follow on it would be great to have the tests trigger in CI, happy to look at adding that to. The obvious challenge with the resources needed so some tests might not be possible, but we could have them skipped automatically if GitHub actions is detected.

@stevenh stevenh force-pushed the fix/tests branch 8 times, most recently from 85191ed to 7b4dd98 Compare March 31, 2025 19:34
@stevenh
Copy link
Author

stevenh commented Mar 31, 2025

These changes are now in a reviewable state, with all tests passing locally.

Let me know if you want the separate PR breakdown.

Here's an example of the output from the cli:

========================================= test session starts ==========================================
platform darwin -- Python 3.12.9, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/steve/code/github.com/unclecode/crawl4ai
configfile: pyproject.toml
plugins: cov-6.0.0, anyio-4.9.0, asyncio-0.25.3, timeout-2.3.1, pytest_httpserver-1.1.2
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=function
timeout: 20.0s
timeout method: signal
timeout func_only: True
collected 509 items

tests/20241401/test_advanced_deep_crawl.py .                                                      [  0%]
tests/20241401/test_async_crawl_with_http_crawler_strategy.py ...                                 [  0%]
tests/20241401/test_async_crawler_strategy.py ...................                                 [  4%]
tests/20241401/test_async_markdown_generator.py ................                                  [  7%]
tests/20241401/test_async_webcrawler.py ..........                                                [  9%]
tests/20241401/test_cache_context.py .                                                            [  9%]
tests/20241401/test_deep_crawl.py ..                                                              [ 10%]
tests/20241401/test_deep_crawl_filters.py ....................................................... [ 21%]
tests/20241401/test_deep_crawl_scorers.py ........................                                [ 25%]
tests/20241401/test_http_crawler_strategy.py ...........                                          [ 27%]
tests/20241401/test_llm_filter.py .                                                               [ 28%]
tests/20241401/test_robot.py ........                                                             [ 29%]
tests/20241401/test_robot_parser.py .                                                             [ 29%]
tests/20241401/test_schema_builder.py ....                                                        [ 30%]
tests/20241401/test_stream.py .                                                                   [ 30%]
tests/20241401/test_stream_dispatch.py .                                                          [ 31%]
tests/async/test_0_4_2_browser_manager.py ......                                                  [ 32%]
tests/async/test_0_4_2_config_params.py .........                                                 [ 33%]
tests/async/test_async_downloader.py .........                                                    [ 35%]
tests/async/test_basic_crawling.py .....                                                          [ 36%]
tests/async/test_caching.py ....                                                                  [ 37%]
tests/async/test_chunking_and_extraction_strategies.py ....                                       [ 38%]
tests/async/test_content_extraction.py ......                                                     [ 39%]
tests/async/test_content_filter_bm25.py ...............                                           [ 42%]
tests/async/test_content_filter_prune.py ............                                             [ 44%]
tests/async/test_content_scraper_strategy.py ..................                                   [ 48%]
tests/async/test_crawler_strategy.py .....                                                        [ 49%]
tests/async/test_database_operations.py .....                                                     [ 50%]
tests/async/test_dispatchers.py ..........                                                        [ 52%]
tests/async/test_edge_cases.py .......                                                            [ 53%]
tests/async/test_error_handling.py ..s.ss                                                         [ 54%]
tests/async/test_evaluation_scraping_methods_performance_configs.py ......................        [ 59%]
tests/async/test_markdown_genertor.py ......                                                      [ 60%]
tests/async/test_parameters_and_options.py s......                                                [ 61%]
tests/async/test_performance.py ..s                                                               [ 62%]
tests/async/test_screenshot.py .....                                                              [ 63%]
tests/cli/test_cli.py ............                                                                [ 65%]
tests/docker/test_config_object.py .                                                              [ 65%]
tests/docker/test_core.py ........                                                                [ 67%]
tests/docker/test_crawl_task.py ssssssss                                                          [ 68%]
tests/docker/test_docker.py ......                                                                [ 70%]
tests/docker/test_dockerclient.py ..                                                              [ 70%]
tests/docker/test_serialization.py ...                                                            [ 71%]
tests/docker/test_server.py ................................ssssssss....                          [ 79%]
tests/docker/test_server_token.py ......................................ssss...                   [ 88%]
tests/hub/test_simple.py .s                                                                       [ 88%]
tests/legacy/test_cli_docs.py .                                                                   [ 89%]
tests/loggers/test_logger.py .                                                                    [ 89%]
tests/test_crawl_result_container.py ..........................................                   [ 97%]
tests/test_llmtxt.py .                                                                            [ 97%]
tests/test_scraping_strategy.py .                                                                 [ 98%]
tests/test_web_crawler.py ..s.......                                                              [100%]

======================== 482 passed, 27 skipped, 1 warning in 818.78s (0:13:38) ========================

@stevenh stevenh marked this pull request as ready for review March 31, 2025 21:11
@stevenh
Copy link
Author

stevenh commented Apr 1, 2025

I've re-reviewed all the changes and identified 50 potential individual PR's, most are relatively simple bug fixes with few that stand out as bit larger in scope / impact:

fix: config serialisation

Fix config serialisation by creating a new Serialisable type and adding missing module imports for ScoringStats and Logger.

This allows the config to be serialised and deserialised correctly.

Add missing initialisation for ScoringStats.

Add missing stats parameter to URLScorer and all its subclasses to ensure that the stats are serialisable.

fix: download handling

Fix the handling of file downloads in AsyncPlaywrightCrawlerStrategy which wasn't waiting for the download to complete before returning, which resulted in race conditions and incomplete or missing downloads.

fix: markdown caching

Fix the caching of markdown field in DB / files which was only storing the single value, which caused failures when using cached results.

Export the markdown field in StringCompatibleMarkdown, so we don't need to use a private field to ensure that the value is serialised correctly.

fix: crawl result handling

Fix the handling of crawl results, which were using inconsistent types. This now uses CrawlResultContainer for all crawl results, unwrapping as needed when performing deep crawls.

This moves CrawlResultContainer into models ensuring it can be imported where needed, avoiding circular imports.

Refactor CrawlResultContainer to subclass CrawlResult to provide type hinting in the single result case and ensure consistent handling of both synchronous and asynchronous results.

fix: BM25Okapi idf calculation

Fix the idf calculation in BM25Okapi to use the correct formula and ensure that the idf is calculated correctly. This prevents missing results when using BM25Okapi caused by zero idf values.

Removed commented out code to improve readability.

fix: links, media and metadata caching

Fix the storage of links, media and metadata to ensure that the correct values are stored and returned. This prevents incorrect results when using the cached results.

Use Field for default values in Media, Links and ScrapingResult pydantic models to prevent invalid results.

fix: test suite

Fix the test suite to ensure that all tests are run and validation, using asserts, is correctly performed.

Parameterise test so that individual tests can be run from either cli or IDE.

Standardise the main wrapper to allow calling directly using python including passing pytest flags.

Use local server where applicable to ensure test validation and avoid external dependencies ,such as docker, which improves test speed and the ability to debug issues.

Add type hints to improve linting validation and IDE support.

Re-enable tests which were previously disabled due to failures, which have now been fixed.

Use constants from httpx.codes for status codes to avoid magic numbers and improve comprehension.

Limit long running tests to avoid excessive run times.

All tests are now runnable using pytest.


The question is, should I split or not?

Happy to do that if that will help get all the fixes in, but obviously raising 50 individual PR's is a decent undertaking so would be great to confirm first.

@unclecode
Copy link
Owner

@stevenh Wow ** 10! Amazing, really appreciate such collaboration. Tbh this is a kind of support means a lot for any open source library. We really need help on testing and approaching us to a table release.

Regarding splits, I suggest we split them into 3 PRs:

  1. Core & Domain Fixes

    • Group fixes for BM25 idf calculation, config serialization, crawl result handling, and any core model refactoring (like the CrawlResultContainer update).
    • Tests to focus on here would be those in tests/async/test_content_filter_bm25.py and tests/test_crawl_result_container.py.
  2. Caching & Download Improvements

    • Separate out fixes for markdown caching and handling of downloads in AsyncPlaywrightCrawlerStrategy, plus links/media caching improvements.
    • Related tests might include tests/20241401/test_async_markdown_generator.py and tests/20241401/test_http_crawler_strategy.py.
  3. Test Suite Enhancements & CI Setup

    • Isolate changes that modernize and standardize the test suite, such as converting prints to assertions, parameterizing tests, and ensuring local server usage.
    • This can also cover Docker-specific tests (from tests/docker/*) and CLI improvements in tests/cli/test_cli.py.

Splitting this way gives meaningful, cohesive sets of changes, easing review and rollback if needed, while covering distinct areas that have minimal interdependencies.

I think this is good enough of specialisations, more than this is unnecessarily.

Again thx a million and appreciate it. We are building a small group of collaborators, to be part of the main circle I am trying to build for Crawl4ai, you are welcome to join. Let me know if this is something you are interested, then we will talk more about it. Anyway happy to have members like you in our community.

@stevenh
Copy link
Author

stevenh commented Apr 8, 2025

Thanks @unclecode I'll look to get the breakdown done soon, as there's already some conflicts creeping in.

Would be happy to join the team, if you'll have me, so lets sync on that when you have some time.

stevenh added 3 commits April 10, 2025 14:58
Fix the handling of file downloads in AsyncPlaywrightCrawlerStrategy
which wasn't waiting for the download to complete before returning,
which resulted in race conditions and incomplete or missing downloads.

Fix incorrect field use for headers and user_agent in
AsyncPlaywrightCrawlerStrategy.

Fix the type used for the logger in AsyncPlaywrightCrawlerStrategy to
the base class Logger. This allows the logger to be used correctly
and prevents type errors when using the logger.

Add missing abstract methods to AsyncCrawlerStrategy and implement in
its subclasses.

Fix the extraction of the error message when processing console error
log in log_console.

Fix the handling of js_code in AsyncPlaywrightCrawlerStrategy to
ensure that the code is executed correctly and the result is returned
when the script starts with a variable or constant declaration.

Remove unused time variables and commented out code to improve
readability.

Remove duplicate Exception handling.

Fix the handling of raw requests in AsyncHTTPCrawlerStrategy which was
previously using the parsed path corrupting the request.

Fix the browser manager to ensure that it always returns a valid
browser instance. This eliminates the class level playwright instance
which wasn't being cleaned up on close, resulting in subsequent callers
receiving a closed instance and causing errors.

Fix the storage of links, media and metadata to ensure that the
correct values are stored and returned. This prevents incorrect results
when using the cached results.

Use Field for default values in Media, Links and ScrapingResult pydantic
models to prevent invalid results.

Handle undefined string passed to MediaItem width.

Add parameters to the CrawlResult.__init__ method to provide type
hints for the fields, which provides better type checking.

Fix database query retry handling for the case where the database table
is missing when using the SQLite database. This prevents the crawler
from failing when the table is not present and allows it to continue
crawling.

This prevents failing tests after one which drops the database table.

Remove commit call for query's which don't need it.

Sync table definition in legacy code so prevent mixed legacy and new
code tests from conflicting with each other.

Fix the caching of markdown field in DB / files which was only storing
the single value, which caused failures when using cached results.

Export the markdown field in StringCompatibleMarkdown, so we don't need
to use a private field to ensure that the value is serialised correctly.

Correctly initialise BrowserConfig and CrawlerRunConfig from kwargs to
ensure legacy parameters work as expected.
Add relevant tests for the following:
- markdown generator
- http crawler strategy
Add ruff settings to pyproject.toml to prevent deprecation warnings
beyond our control and to disable formatting for now to avoid the noise
it would generate.
@stevenh
Copy link
Author

stevenh commented Apr 10, 2025

The first of the three PR is up: #969.

I ended up expanding the scope to include few other dependent fixes, in particular the one to the browser manager, which was causing test failures due to use of a previously closed browser, see comments on that PR.

@stevenh
Copy link
Author

stevenh commented Apr 11, 2025

All PR's are now ready @unclecode, they are based off each other so need to be reviewed and merged in order.

  1. fix: caching, downloads, js_code and browser manager #969
  2. fix: core api including config and result handling #970
  3. fix: tests to run under pytest #891

I've left the second two in draft in case changes are needed on the first.

The second two are individual commits, so if you want to get a sense for them you can just look at the last commit on each.

@stevenh stevenh changed the title fix: tests and associated bugs fix: tests to run under pytest Apr 11, 2025
Fix a large number of issues with the core including config
serialisation and crawl result handling, which were all identified
during work to ensure that all the existing tests pass consistently.

This includes:
-------
fix: config serialisation

Fix config serialisation by creating a new Serialisable type and adding
missing module imports for ScoringStats and Logger.

This allows the config to be serialised and deserialised correctly.

Add missing initialisation for ScoringStats.

Add missing stats parameter to URLScorer and all its subclasses to
ensure that the stats are serialisable.
-------
fix: parameter definitions, type hints and defaults

Fix parameter definitions by adding missing Optional hit to those which
default to None.

Add type hints to improve linting validation and IDE support.

Set default values for parameters which were missing them, ensuring
type safety and preventing runtime errors.
-------
fix: various comment typos

Fix various typos in comments and doc strings to improve clarity.
-------
fix: BaseDispatcher missing abstract methods

Add missing abstract methods to BaseDispatcher and implement in
subclasses.
-------
fix: crawl result handling

Fix the handling of crawl results, which were using inconsistent types.
This now uses CrawlResultContainer for all crawl results, unwrapping as
needed when performing deep crawls.

This moves CrawlResultContainer into models ensuring it can be imported
where needed, avoiding circular imports.

Refactor CrawlResultContainer to subclass CrawlResult to provide type
hinting in the single result case and ensure consistent handling of
both synchronous and asynchronous results.
-------
feat: implement run_urls_stream for SemaphoreDispatcher

Implement run_urls_stream for SemaphoreDispatcher to allow streaming
of URLs to be crawled.
-------
chore: translate non english comments

Translate non english comments to english to improve readability and
maintainability for non native speakers.
-------
chore: additional example for arun

Add examples for arun to demonstrate usage of streamed and batch
processing modes, clarifying the impact of deep crawl on the results.
-------
fix: handling of CrawlerRunConfig

Fix the handling of CrawlerRunConfig to ensure that the config is
correctly initialised from legacy kwargs.
-------
fix: invalid screenshot argument to aprocess_html

Fix failure caused by generating a screenshot due to unsupported
argument to aprocess_html.
-------
fix: structured content extraction

Fix structured content extraction not being run when NoExtractionStrategy
is used. This ensures that the content is extracted correctly and
returned in the crawl result.
-------
fix: aclear_cache

Fix aclear_cache, previously it was just clean up, which just closed
active connections. It now calls aclear_db to remove all entries from
the cache table.
-------
fix: unused imports

Remove unused imports to improve readability and reduce clutter.
-------
fix: undefined filter_conf

Fix use of undefined filter_conf in crawl_cmd when the config is not
passed in and neither markdown-fix nor md-fit are specified as output
options.
-------
fix: bs4 imports

Fix bs4 imports to use the correct module name and ensure that the
correct classes are imported. This addresses linter errors.
-------
fix: BM25Okapi idf calculation

Fix the idf calculation in BM25Okapi to use the correct formula and
ensure that the idf is calculated correctly. This prevents missing
results when using BM25Okapi caused by zero idf values.

Removed commented out code to improve readability.

Eliminate unnecessary tag_weight calculation when score is 0.
-------
fix: relevant content filter

Fix the extraction of headers from the response, previously this only
handled h1 tags, this now handles all header tags.

Fix boundary check for the relevant content filter to ensure that the
content is not excluded when the end aligns with 150 character limit.
-------
fix: return type for extract_text_chunks

Fix the return type for extract_text_chunks to be include the right types
and values, so consumers know what to expect.
-------
fix: invalid markdown parameter to ScrapingResult

Remove all references to markdown for ScrapingResult as the source never
returns markdown, so its pointless to include it in the result.
-------
fix: closest parent description

Clean unnecessary white space the description returned by
find_closest_parent_with_useful_text so that the two different
strategies return consistent results.
-------
fix: potential sources for element extraction

Fix the potential sources for element extraction to ensure that the
srcset and data-lazy-src are processed instead of srcssetdata-lazy-src
due to missing comma.
-------
fix: data validation in web scraping

Add missing data validation, ensuring that the correct types and only
set values are processed.
-------
fix: missing json import for AmazonProductCrawler

Add missing json import for AmazonProductCrawler.
-------
fix: GoogleSearchCrawler checks and errors

Correct the result field used to report the error message when arun
fails. This ensures that the error message is correctly reported.

Add missing check for result.js_execution_result before use.

Add missing check for blank cleaned_html.

Add validation of cleaned_html to ensure that the value is correctly
converted from bytes to str.
-------
fix: abstract method definitions

Fix the abstract method definitions to ensure that the correct types are
used for async generators and iterators. This lint errors for
incompatible overrides by their implementers. This is cause be the
type being adjusted when a yield is present.
-------
fix: invalid use of infinity for int values

Fix the use of infinity for int values flagged by the linter. Instead
we use -1 to indicate that the value is not set.

Fix use of unset url parameter, reported by linter.
-------
chore: remove unneeded loop on batch results

Remove unnecessary duplicate loop on batch results in _arun_batch to
calculate _pages_crawled.
-------
fix: invalid use of lambda to define async function

Replace the use of lambda to define an async function with a normal
function definition.
-------
fix: validate use of deep_crawl_strategy before use

Validate the type of deep_crawl_strategy before use to ensure that the
correct type is used and preventing lint errors on methods calls using
it.
-------
fix: unprocessed FilterChain tasks

Ensure that all tasks in the FilterChain are either processed and
returned or cancelled if not needed, preventing runtime warnings.
-------
feat: add support for transport to Crawl4aiDockerClient

Add the ability to provide a custom transport to Crawl4aiDockerClient
which allows easy testing.

Set base_url on the httpx.AsyncClient to avoid need for local variable
and calculations, simplifying the code.
-------
fix: Crawl4aiDockerClient.crawl results on error

Correctly handle the async data returned by the crawl method in
Crawl4aiDockerClient when the stream results in a non 200 response.
-------
fix: linter errors for lxml imports

Fix linter errors for lxml imports by using importing the required
methods directly.
-------
fix: use of unset llm_config

Fix the use of unset llm_config in generate_schema to ensure that we
don't get a runtime error.
-------
fix: missing meta field for hub crawlers

Support both ways that meta data is defined by hub crawlers.
-------
fix: correct return type for get_cached_url

Correct the type and number of returned values for get_cached_url.
-------
fix: DocsManager generation

Fix the generation performed by DocsManager now docs have a hierarchical
structure.
-------
fix: linting errors in calculate_semaphore_count

Ensure we set a default for the resource values returned by os methods.
-------
fix: port handling in get_base_domain

Maintain the port for non default ports in get_base_domain to ensure
that the correct domain is returned. This prevents local links being
incorrectly classified as external.
-------
fix: get_title method type

Add static method decorator to get_title method to ensure that the
method is correctly identified as a static method and not an instance
method. This prevents lint errors and ensures that the method is
correctly called.
-------
chore: add project settings to improve developer experience

Add details to pyproject.toml to improve the developer experience
including:
* Configuring pytest test timeouts, ignoring external warnings and
    asyncio scope.
* Disabling ruff formatting
* Creating developer package targets: dev, docker and test
* Leverage chaining to simplify maintenance of the all group
-------
fix: WebScrapingStrategy ascrap

Fix the ascrap command by calling scrap instead of _scrap which misses
a lot of the functionality.
-------
fix: scraping local sites

Fix the scraping of local sites by removing the check for dot in parsed
network location.
-------
fix: monitor support on pseudo terminals

Fix monitor support on pseudo terminals by setting the default for
enable_ui to the value of sys.stdin.isatty(), this ensures it works
under pytest if not specifically set.
-------
fix: imports

Add missing and remove unused imports as well as eliminating the use of
wildcard imports.
-------
fix: dependencies

Add missing optional dependencies.
-------
chore: remove deprecated licence tags

Update setup.py and pyproject.toml to remove deprecated license tags,
bumping the dependency to support the new method. The eliminates toml lint
warning in pyproject.toml.
-------
fix: delay on browser close

Wait for spawned browser process, preventing it from becoming defunct
and delaying the clean up until max retries have been reached.
-------
fix: invalid use of firefox via cdp

Remove firefox from the list of supported internal browsers as it
doesn't support CDP via playwright.
-------
feat: add support for ephemeral debug port

Add support for using an ephemeral debug port for the browser,
eliminating the need to set a specific port and preventing port
conflicts.
-------
core: add TODOs for issues noticed

Add TODOs for issues noticed during implementation of these fixes but
not needed to address test failures of the code.
Fix tests to run fully under pytest, this includes:
- Fixing filenames, removing dots and correct typos
- Removing __init__ methods, which are not supported by pytest
- Implement parametrisation so tests can be run individually
- Added timeouts so tests can't run forever
- Replacing print and logging with assertions to prevent false successes
- Removing unused and add missing imports
- Mark tests with @pytest.mark.asyncio where appropriate
- Use http constants to avoid magic numbers
- Add type hints to improve linting and identify issues
- Use local server for API tests to improve debugging and eliminate
  docker dependency
- Call pytest in __main__ to allow running tests from command line
- Skip broken tests
- Fix out of date logic and invalid method parameters
- Re-enable disabled and commented out tests after fixing them
- Added missing test data
- Updated tests that depend on altered external css or html structure
- Automatically skip if tests if API key is not set

If you need to debug a test, which will take time, you will need to
comment out the default timeout in pyproject.toml under
[tool.pytest.ini_options].
@stevenh
Copy link
Author

stevenh commented Apr 11, 2025

Latest set of results:

platform darwin -- Python 3.12.9, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/steve/code/github.com/unclecode/crawl4ai
configfile: pyproject.toml
plugins: cov-6.0.0, anyio-4.9.0, asyncio-0.25.3, timeout-2.3.1, pytest_httpserver-1.1.2, aiohttp-1.1.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=function
collected 552 items

tests/20241401/test_advanced_deep_crawl.py .                             [  0%]
tests/20241401/test_async_crawl_with_http_crawler_strategy.py ...        [  0%]
tests/20241401/test_async_crawler_strategy.py ...................        [  4%]
tests/20241401/test_async_markdown_generator.py ................         [  7%]
tests/20241401/test_async_webcrawler.py ..........                       [  8%]
tests/20241401/test_cache_context.py .                                   [  9%]
tests/20241401/test_deep_crawl.py ..                                     [  9%]
tests/20241401/test_deep_crawl_filters.py .............................. [ 14%]
.........................                                                [ 19%]
tests/20241401/test_deep_crawl_scorers.py .......................        [ 23%]
tests/20241401/test_http_crawler_strategy.py ...........                 [ 25%]
tests/20241401/test_llm_filter.py .                                      [ 25%]
tests/20241401/test_robot.py ........                                    [ 27%]
tests/20241401/test_robot_parser.py .                                    [ 27%]
tests/20241401/test_schema_builder.py .s.s                               [ 28%]
tests/20241401/test_stream.py .                                          [ 28%]
tests/20241401/test_stream_dispatch.py .                                 [ 28%]
tests/async/test_0_4_2_browser_manager.py ......                         [ 29%]
tests/async/test_0_4_2_config_params.py .........                        [ 31%]
tests/async/test_async_downloader.py .........                           [ 32%]
tests/async/test_basic_crawling.py .....                                 [ 33%]
tests/async/test_caching.py ....                                         [ 34%]
tests/async/test_chunking_and_extraction_strategies.py ....              [ 35%]
tests/async/test_content_extraction.py ......                            [ 36%]
tests/async/test_content_filter_bm25.py ...............                  [ 38%]
tests/async/test_content_filter_prune.py ............                    [ 41%]
tests/async/test_content_scraper_strategy.py ..................          [ 44%]
tests/async/test_crawler_strategy.py .....                               [ 45%]
tests/async/test_database_operations.py .....                            [ 46%]
tests/async/test_dispatchers.py ....s....s                               [ 48%]
tests/async/test_edge_cases.py .......                                   [ 49%]
tests/async/test_error_handling.py ..s.ss                                [ 50%]
tests/async/test_evaluation_scraping_methods_performance_configs.py .... [ 51%]
..................                                                       [ 54%]
tests/async/test_markdown_genertor.py ......                             [ 55%]
tests/async/test_parameters_and_options.py s......                       [ 56%]
tests/async/test_performance.py ..s                                      [ 57%]
tests/async/test_screenshot.py .....                                     [ 58%]
tests/browser/docker/test_docker_browser.py ..s..sss....s                [ 60%]
tests/browser/test_browser_manager.py ....                               [ 61%]
tests/browser/test_builtin_browser.py ..........                         [ 63%]
tests/browser/test_builtin_strategy.py ..                                [ 63%]
tests/browser/test_cdp_strategy.py ...                                   [ 63%]
tests/browser/test_launch_standalone.py s                                [ 64%]
tests/browser/test_parallel_crawling.py ...                              [ 64%]
tests/browser/test_playwright_strategy.py ....                           [ 65%]
tests/browser/test_profiles.py ..                                        [ 65%]
tests/cli/test_cli.py ............                                       [ 67%]
tests/docker/test_config_object.py .                                     [ 68%]
tests/docker/test_core.py ........                                       [ 69%]
tests/docker/test_crawl_task.py ssssssss                                 [ 71%]
tests/docker/test_docker.py ......                                       [ 72%]
tests/docker/test_dockerclient.py ..                                     [ 72%]
tests/docker/test_serialization.py ...                                   [ 73%]
tests/docker/test_server.py ................................ssssssss.... [ 80%]
                                                                         [ 80%]
tests/docker/test_server_token.py ...................................... [ 87%]
ssss...                                                                  [ 89%]
tests/hub/test_simple.py .s                                              [ 89%]
tests/legacy/test_cli_docs.py .                                          [ 89%]
tests/loggers/test_logger.py .                                           [ 89%]
tests/memory/test_crawler_monitor.py .                                   [ 90%]
tests/memory/test_dispatcher_stress.py .                                 [ 90%]
tests/test_crawl_result_container.py ................................... [ 96%]
.......                                                                  [ 97%]
tests/test_llmtxt.py .                                                   [ 98%]
tests/test_scraping_strategy.py .                                        [ 98%]
tests/test_web_crawler.py ..s.......                                     [100%]

================= 515 passed, 37 skipped in 942.35s (0:15:42) ==================
Finished running tests!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants