Add logic for storing compressed html when scraping HTML #325

Merged
merged 1 commit into from
Jul 5, 2025

Add logic for storing compressed html when scraping HTML

7c790c4

GitHub Actions / flake8 completed Jul 5, 2025 in 2s

reviewdog [flake8] report

reported by reviewdog 🐶

Findings (53)

alembic/versions/2025_07_05_0801-4b0f43f61598_add_url_compressed_html_table.py|23 col 1| Missing docstring in public function
alembic/versions/2025_07_05_0801-4b0f43f61598_add_url_compressed_html_table.py|36 col 1| Missing docstring in public function
src/core/tasks/url/operators/record_type/llm_api/helpers.py|1 col 1| Missing docstring in public module
src/core/tasks/url/operators/url_html/core.py|44 col 1| Missing docstring in public method
src/core/tasks/url/operators/url_html/scraper/parser/mapping.py|1 col 1| Missing docstring in public module
src/db/client/async_.py|7 col 1| 'sqlalchemy.insert' imported but unused
src/db/client/async_.py|2423 col 1| Missing docstring in public method
src/db/client/async_.py|2439 col 1| Missing docstring in public method
src/db/dtos/url/raw_html.py|1 col 1| Missing docstring in public module
src/db/dtos/url/raw_html.py|4 col 1| Missing docstring in public class
src/db/dtos/url/raw_html.py|6 col 14| no newline at end of file
src/db/models/instantiations/url/compressed_html.py|1 col 1| Missing docstring in public module
src/db/models/instantiations/url/compressed_html.py|8 col 1| Missing docstring in public class
src/db/models/instantiations/url/compressed_html.py|21 col 6| no newline at end of file
src/db/models/instantiations/url/core.py|85 col 6| no newline at end of file
src/db/utils/compression.py|1 col 1| Missing docstring in public module
src/db/utils/compression.py|4 col 1| Missing docstring in public function
src/db/utils/compression.py|7 col 1| Missing docstring in public function
src/db/utils/compression.py|8 col 62| no newline at end of file
tests/automated/integration/api/test_duplicates.py|4 col 1| 'src.db.dtos.batch.BatchInfo' imported but unused
tests/automated/integration/db/client/test_delete_url_updated_at.py|1 col 1| Missing docstring in public module
tests/automated/integration/tasks/url/duplicate/constants.py|1 col 1| Missing docstring in public module
tests/automated/integration/tasks/url/duplicate/constants.py|16 col 2| no newline at end of file
tests/automated/integration/tasks/url/html/asserts.py|1 col 1| Missing docstring in public module
tests/automated/integration/tasks/url/html/asserts.py|5 col 1| 'src.db.models.instantiations.url.compressed_html.URLCompressedHTML' imported but unused
tests/automated/integration/tasks/url/html/asserts.py|6 col 1| 'src.db.utils.compression.decompress_html' imported but unused
tests/automated/integration/tasks/url/html/asserts.py|10 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/asserts.py|16 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/asserts.py|21 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/asserts.py|30 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/asserts.py|37 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/asserts.py|46 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/asserts.py|54 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/asserts.py|59 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/asserts.py|63 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/mocks/constants.py|1 col 1| Missing docstring in public module
tests/automated/integration/tasks/url/html/mocks/constants.py|3 col 32| no newline at end of file
tests/automated/integration/tasks/url/html/mocks/methods.py|1 col 1| Missing docstring in public module
tests/automated/integration/tasks/url/html/mocks/methods.py|11 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/mocks/methods.py|50 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/mocks/methods.py|60 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/mocks/methods.py|60 col 37| Unused argument 'url'
tests/automated/integration/tasks/url/html/setup.py|1 col 1| Missing docstring in public module
tests/automated/integration/tasks/url/html/setup.py|12 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/setup.py|18 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/setup.py|24 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/setup.py|31 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/test_task.py|1 col 1| Missing docstring in public module
tests/automated/integration/tasks/url/html/test_task.py|13 col 1| Missing docstring in public function
tests/automated/integration/tasks/url/html/test_task.py|29 col 5| too many blank lines (2)
tests/automated/integration/tasks/url/html/test_task.py|42 col 1| blank line at end of file
tests/manual/core/lifecycle/test_ckan_lifecycle.py|1 col 1| Missing docstring in public module
tests/manual/core/lifecycle/test_muckrock_lifecycles.py|1 col 1| Missing docstring in public module

Filtered Findings (0)
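The "Raw output" lines in the annotations below follow flake8's default `path:row:col: CODE message` format, which makes them easy to group by check code (D100, D103, F401, W292, and so on). A minimal sketch, assuming that default format; `tally_codes` is a hypothetical helper, not part of flake8 or reviewdog.

```python
import re
from collections import Counter

# Matches flake8's default output format: path:row:col: CODE message
RAW_LINE = re.compile(
    r"^(?P<path>.+?):(?P<row>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) (?P<text>.*)$"
)


def tally_codes(lines):
    """Count flake8 findings per check code, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = RAW_LINE.match(line.strip())
        if match:
            counts[match.group("code")] += 1
    return counts


if __name__ == "__main__":
    # A few raw output lines copied from the annotations in this report
    sample = [
        "./src/db/utils/compression.py:1:1: D100 Missing docstring in public module",
        "./src/db/utils/compression.py:4:1: D103 Missing docstring in public function",
        "./src/db/utils/compression.py:8:62: W292 no newline at end of file",
        "./src/db/client/async_.py:7:1: F401 'sqlalchemy.insert' imported but unused",
    ]
    for code, n in sorted(tally_codes(sample).items()):
        print(code, n)
```

Feeding it every raw output line from the report gives a per-code breakdown of the 53 findings, which shows at a glance that most are docstring (D1xx) warnings rather than unused imports or formatting issues.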

Annotations

Check warning on line 23 in alembic/versions/2025_07_05_0801-4b0f43f61598_add_url_compressed_html_table.py

@github-actions github-actions / flake8

[flake8] alembic/versions/2025_07_05_0801-4b0f43f61598_add_url_compressed_html_table.py#L23 <103>

Missing docstring in public function
Raw output
./alembic/versions/2025_07_05_0801-4b0f43f61598_add_url_compressed_html_table.py:23:1: D103 Missing docstring in public function

Check warning on line 36 in alembic/versions/2025_07_05_0801-4b0f43f61598_add_url_compressed_html_table.py

[flake8] alembic/versions/2025_07_05_0801-4b0f43f61598_add_url_compressed_html_table.py#L36 <103>

Missing docstring in public function
Raw output
./alembic/versions/2025_07_05_0801-4b0f43f61598_add_url_compressed_html_table.py:36:1: D103 Missing docstring in public function

Check warning on line 1 in src/core/tasks/url/operators/record_type/llm_api/helpers.py

[flake8] src/core/tasks/url/operators/record_type/llm_api/helpers.py#L1 <100>

Missing docstring in public module
Raw output
./src/core/tasks/url/operators/record_type/llm_api/helpers.py:1:1: D100 Missing docstring in public module

Check warning on line 44 in src/core/tasks/url/operators/url_html/core.py

[flake8] src/core/tasks/url/operators/url_html/core.py#L44 <102>

Missing docstring in public method
Raw output
./src/core/tasks/url/operators/url_html/core.py:44:1: D102 Missing docstring in public method

Check warning on line 1 in src/core/tasks/url/operators/url_html/scraper/parser/mapping.py

[flake8] src/core/tasks/url/operators/url_html/scraper/parser/mapping.py#L1 <100>

Missing docstring in public module
Raw output
./src/core/tasks/url/operators/url_html/scraper/parser/mapping.py:1:1: D100 Missing docstring in public module

Check warning on line 7 in src/db/client/async_.py

[flake8] src/db/client/async_.py#L7 <401>

'sqlalchemy.insert' imported but unused
Raw output
./src/db/client/async_.py:7:1: F401 'sqlalchemy.insert' imported but unused

Check warning on line 2423 in src/db/client/async_.py

[flake8] src/db/client/async_.py#L2423 <102>

Missing docstring in public method
Raw output
./src/db/client/async_.py:2423:1: D102 Missing docstring in public method

Check warning on line 2439 in src/db/client/async_.py

[flake8] src/db/client/async_.py#L2439 <102>

Missing docstring in public method
Raw output
./src/db/client/async_.py:2439:1: D102 Missing docstring in public method

Check warning on line 1 in src/db/dtos/url/raw_html.py

[flake8] src/db/dtos/url/raw_html.py#L1 <100>

Missing docstring in public module
Raw output
./src/db/dtos/url/raw_html.py:1:1: D100 Missing docstring in public module

Check warning on line 4 in src/db/dtos/url/raw_html.py

[flake8] src/db/dtos/url/raw_html.py#L4 <101>

Missing docstring in public class
Raw output
./src/db/dtos/url/raw_html.py:4:1: D101 Missing docstring in public class

Check warning on line 6 in src/db/dtos/url/raw_html.py

[flake8] src/db/dtos/url/raw_html.py#L6 <292>

no newline at end of file
Raw output
./src/db/dtos/url/raw_html.py:6:14: W292 no newline at end of file

Check warning on line 1 in src/db/models/instantiations/url/compressed_html.py

[flake8] src/db/models/instantiations/url/compressed_html.py#L1 <100>

Missing docstring in public module
Raw output
./src/db/models/instantiations/url/compressed_html.py:1:1: D100 Missing docstring in public module

Check warning on line 8 in src/db/models/instantiations/url/compressed_html.py

[flake8] src/db/models/instantiations/url/compressed_html.py#L8 <101>

Missing docstring in public class
Raw output
./src/db/models/instantiations/url/compressed_html.py:8:1: D101 Missing docstring in public class

Check warning on line 21 in src/db/models/instantiations/url/compressed_html.py

[flake8] src/db/models/instantiations/url/compressed_html.py#L21 <292>

no newline at end of file
Raw output
./src/db/models/instantiations/url/compressed_html.py:21:6: W292 no newline at end of file

Check warning on line 85 in src/db/models/instantiations/url/core.py

[flake8] src/db/models/instantiations/url/core.py#L85 <292>

no newline at end of file
Raw output
./src/db/models/instantiations/url/core.py:85:6: W292 no newline at end of file

Check warning on line 1 in src/db/utils/compression.py

[flake8] src/db/utils/compression.py#L1 <100>

Missing docstring in public module
Raw output
./src/db/utils/compression.py:1:1: D100 Missing docstring in public module

Check warning on line 4 in src/db/utils/compression.py

[flake8] src/db/utils/compression.py#L4 <103>

Missing docstring in public function
Raw output
./src/db/utils/compression.py:4:1: D103 Missing docstring in public function

Check warning on line 7 in src/db/utils/compression.py

[flake8] src/db/utils/compression.py#L7 <103>

Missing docstring in public function
Raw output
./src/db/utils/compression.py:7:1: D103 Missing docstring in public function

Check warning on line 8 in src/db/utils/compression.py

[flake8] src/db/utils/compression.py#L8 <292>

no newline at end of file
Raw output
./src/db/utils/compression.py:8:62: W292 no newline at end of file
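The four compression.py warnings (D100, two D103, W292) are cleared by a module docstring, one docstring per public function, and a trailing newline. A sketch of what a docstring-complete module might look like, under stated assumptions: only `decompress_html` is named in this report (via the unused import in asserts.py), so the `compress_html` name, its signature, and the use of zlib with UTF-8 encoding are all hypothetical, not the PR's actual code.

```python
"""Helpers for compressing scraped HTML before storage and restoring it afterwards."""
import zlib


def compress_html(html: str) -> bytes:
    """Compress an HTML string to bytes (hypothetical name; zlib is an assumption)."""
    return zlib.compress(html.encode("utf-8"))


def decompress_html(compressed: bytes) -> str:
    """Reverse compress_html, returning the original HTML string."""
    return zlib.decompress(compressed).decode("utf-8")
```

Saving the file with a final newline after the last `return` line also resolves the W292 finding at line 8.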

Check warning on line 4 in tests/automated/integration/api/test_duplicates.py

[flake8] tests/automated/integration/api/test_duplicates.py#L4 <401>

'src.db.dtos.batch.BatchInfo' imported but unused
Raw output
./tests/automated/integration/api/test_duplicates.py:4:1: F401 'src.db.dtos.batch.BatchInfo' imported but unused

Check warning on line 1 in tests/automated/integration/db/client/test_delete_url_updated_at.py

[flake8] tests/automated/integration/db/client/test_delete_url_updated_at.py#L1 <100>

Missing docstring in public module
Raw output
./tests/automated/integration/db/client/test_delete_url_updated_at.py:1:1: D100 Missing docstring in public module

Check warning on line 1 in tests/automated/integration/tasks/url/duplicate/constants.py

[flake8] tests/automated/integration/tasks/url/duplicate/constants.py#L1 <100>

Missing docstring in public module
Raw output
./tests/automated/integration/tasks/url/duplicate/constants.py:1:1: D100 Missing docstring in public module

Check warning on line 16 in tests/automated/integration/tasks/url/duplicate/constants.py

[flake8] tests/automated/integration/tasks/url/duplicate/constants.py#L16 <292>

no newline at end of file
Raw output
./tests/automated/integration/tasks/url/duplicate/constants.py:16:2: W292 no newline at end of file

Check warning on line 1 in tests/automated/integration/tasks/url/html/asserts.py

[flake8] tests/automated/integration/tasks/url/html/asserts.py#L1 <100>

Missing docstring in public module
Raw output
./tests/automated/integration/tasks/url/html/asserts.py:1:1: D100 Missing docstring in public module

Check warning on line 5 in tests/automated/integration/tasks/url/html/asserts.py

[flake8] tests/automated/integration/tasks/url/html/asserts.py#L5 <401>

'src.db.models.instantiations.url.compressed_html.URLCompressedHTML' imported but unused
Raw output
./tests/automated/integration/tasks/url/html/asserts.py:5:1: F401 'src.db.models.instantiations.url.compressed_html.URLCompressedHTML' imported but unused