Refactor: Reorganize run_task and unit tests into dedicated directories #1159

Open
wants to merge 36 commits into
base: main
from

Conversation

@DeborahOlaboye DeborahOlaboye commented Jun 23, 2025

This pull request addresses issue #591 by reorganizing the test structure to improve clarity and maintainability.

Changes:

  • Moved all tests related to run_task into a new tests/task/ directory.
  • Moved unit tests to the tests/unit/ directory.
  • Verified all tests pass using task test to ensure functionality remains unchanged.

Closes #591

@DeborahOlaboye DeborahOlaboye requested review from a team as code owners June 23, 2025 20:23
@DeborahOlaboye DeborahOlaboye marked this pull request as draft June 25, 2025 06:00
@DeborahOlaboye DeborahOlaboye marked this pull request as ready for review June 26, 2025 18:11
@DeborahOlaboye DeborahOlaboye changed the title from "Refactor: Migrate all the run_task tests into a separate folder (#591)" to "Refactor: Migrate all the run_task tests into a separate folder" Jun 27, 2025
@DeborahOlaboye DeborahOlaboye changed the title from "Refactor: Migrate all the run_task tests into a separate folder" to "Refactor: Reorganize run_task and unit tests into dedicated directories" Jun 27, 2025
@DeborahOlaboye
Author

Hi @gregtatum, just wanted to see if you’ve had a chance to look at this. I'd be willing to make any changes needed. Thank you.

Member

@gregtatum gregtatum left a comment

This is looking much nicer! I'm requesting changes for the following:

  1. pathlib is used inconsistently, and the parents[1] pattern is confusing.
  2. The pyproject.toml exclude list needs updating instead of adding type suppressions.
  3. The dependency changes need to be reverted.

Plus there were a few other smaller things I commented on. This is getting close! Thanks for the work on it.

@@ -1,2 +1,4 @@
 sacrebleu[ja,ko]==2.4.2
 unbabel-comet==2.2.2
+numpy==1.26.4
Member

I don't think a refactor of file paths should need to change any dependencies here. I know our dependency situation can be brittle across different machines. We pretty much require that everything run in docker, because things break outside of it.

task docker will start up docker. Please try running that on main and verify that the tests run locally for you without changing any dependencies. If they still fail within docker, please file an issue, and we'll need to fix that first.

Member

So basically these files should be unchanged:

 pipeline/eval/requirements/eval.in
 pipeline/eval/requirements/eval.txt
 pipeline/translate/requirements/translate-ctranslate2.in
 pipeline/translate/requirements/translate-ctranslate2.txt
 poetry.lock
 pyproject.toml <- revert dependency changes, keep other changes

pyproject.toml Outdated
@@ -57,7 +57,7 @@ hanzidentifier = "1.2.0"
 psutil= "6.0.0"

 [tool.poetry.group.utils-docker.dependencies]
-PyICU = "2.8.1"
+PyICU = "^2.11"
Member

Ah, it looks like someone is updating PyICU again. We'll probably want to update it, but that should be a different PR. You are welcome to open another one with this change, but it's best to keep PRs as small as possible.

pyproject.toml Outdated
PyICU = "^2.11"


[tool.poetry.group.dev.dependencies]
Member

This probably can be reverted as well.

@@ -120,6 +126,7 @@ build-backend = "poetry.core.masonry.api"

 [tool.pytest.ini_options]
 testpaths = ["tests"]
+pythonpath = ["."]
Member

This seems like a reasonable change to me.

@@ -120,6 +126,7 @@ build-backend = "poetry.core.masonry.api"

 [tool.pytest.ini_options]
 testpaths = ["tests"]
+pythonpath = ["."]
 markers = [
 # Run tests outside of docker:
 # task test -- -m "not docker_amd64
Member

I can't comment on the files themselves, but below there are:

 tests/data/tests/en-ca-teacher-1.npz
 tests/data/tests/en-ca-vocab.spm

I don't think these need to be added in this PR, and they should be removed. This should be a pure refactor, so it shouldn't need any new files.


 en_fake_translated = "\n".join([line.upper() for line in ru_sample.split("\n")])
 ru_fake_translated = "\n".join([line.upper() for line in en_sample.split("\n")])

-current_folder = os.path.dirname(os.path.abspath(__file__))
-fixtures_path = os.path.join(current_folder, "fixtures")
-root_path = os.path.abspath(os.path.join(current_folder, ".."))
+fixtures_path = (Path(__file__).resolve().parents[1] / "fixtures").as_posix()
Member

I wouldn't mix the pathlib and os.path utilities here. I would prefer one or the other.

The simplest would be to make this work without pathlib. I generally prefer pathlib, but that does mean changing more lines of code here.
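
As a rough illustration of "one or the other", here is a minimal sketch of the same fixtures lookup written once with only os.path and once with only pathlib. The helper names and the /repo/tests/task/test_example.py location are hypothetical and are not taken from this PR; the sketch only assumes the fixtures folder sits one directory above the test file's folder.

```python
import os
from pathlib import Path


def fixtures_with_os_path(test_file: str) -> str:
    """os.path only: step up one directory from the test file's folder."""
    current_folder = os.path.dirname(os.path.abspath(test_file))
    return os.path.abspath(os.path.join(current_folder, "..", "fixtures"))


def fixtures_with_pathlib(test_file: str) -> Path:
    """pathlib only: same location, kept as a Path until a str is needed."""
    return (Path(test_file).resolve().parent / ".." / "fixtures").resolve()


# Hypothetical test file location, used only to compare the two styles.
example = "/repo/tests/task/test_example.py"
assert fixtures_with_os_path(example) == str(fixtures_with_pathlib(example))
```

Either version avoids converting back and forth between Path objects and strings within a single module.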

@@ -106,12 +107,12 @@ def run_eval_test(params) -> None:
     }

     if comet == "skipped":
-        env["COMET_SKIP"] = "1"
+        env["COMET_SKIP"] = "1"  # type: ignore
Member

Again, the exclude paths need updating in pyproject.toml. Rather than fixing type errors in these files as part of one big PR, it's better to split those fixes into smaller PRs.


 pytestmark = [pytest.mark.docker_amd64]

-current_folder = os.path.dirname(os.path.abspath(__file__))
-fixtures_path = os.path.join(current_folder, "fixtures")
+fixtures_path = (Path(__file__).resolve().parents[1] / "fixtures").as_posix()
Member

The same applies here as in my previous comments on pathlib. I'll stop commenting on this, but please look through the rest of the PR for anywhere else this feedback applies.

@@ -19,15 +20,15 @@


 def validate_alignments(corpus_path, vocab_src_path, vocab_trg_path):
-    sp_src = spm.SentencePieceProcessor(model_file=vocab_src_path)
-    sp_trg = spm.SentencePieceProcessor(model_file=vocab_trg_path)
+    sp_src = spm.SentencePieceProcessor(model_file=vocab_src_path)  # type: ignore
Member

Same here. I'll stop commenting on the type suppressions, but the exclude list should be fixed instead.



 @patch(
     "translations_parser.cli.taskcluster.get_args",
     return_value=argparse.Namespace(
-        input_file=Path(__file__).parent / "data" / "taskcluster.log",
+        input_file=Path(__file__).resolve().parents[1] / "data" / "taskcluster.log",
Member

To expand on my earlier comments around pathlib, I find this style to be really confusing where you are indexing into the array of parents.

I would prefer something simpler like:

Path(__file__).parent / "../data/taskcluster.log"
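
To make the comparison concrete, here is a small sketch of how the two spellings relate. It assumes the test lives at tests/unit/test_tracking_cli.py and the log at tests/data/taskcluster.log; the /repo prefix is hypothetical, and PurePosixPath is used only so the example does not depend on the local filesystem.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical absolute location of the test file after the move.
test_file = PurePosixPath("/repo/tests/unit/test_tracking_cli.py")

# Style used in the PR: index into the sequence of ancestor directories.
indexed = test_file.parents[1] / "data" / "taskcluster.log"

# Style suggested in the review: spell out the step up with "..".
relative = test_file.parent / "../data/taskcluster.log"

print(indexed)   # /repo/tests/data/taskcluster.log
print(relative)  # /repo/tests/unit/../data/taskcluster.log

# pathlib keeps the ".." segment as written; normalizing shows that both
# spellings name the same file.
assert posixpath.normpath(str(relative)) == str(indexed)
```

The ".." form keeps the directory hop visible in the path itself, which is the readability point being made; with a real Path, calling resolve() would collapse the ".." segment if a clean absolute path is needed.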

@gregtatum
Member

Oh, and when you need me to re-review this, please hit the little circle with arrows next to my name, and it will show up in my review queue. If you are pushing changes that aren't ready for review yet, just make sure I'm not flagged for review.

Here is where the circle arrow thing is:

image

@DeborahOlaboye DeborahOlaboye requested a review from gregtatum July 2, 2025 12:18
@DeborahOlaboye
Author

Hi @gregtatum,

I've made the requested corrections and pushed the updates. However, two tests are currently failing:

tests/unit/test_tracking_cli.py::test_experiments_marian_1_10

tests/unit/test_tracking_cli.py::test_experiments_marian_1_12

The logs indicate assertion errors, but the test summary does not show the full details of the failures.

From the logs, the failures may be linked to how the experiments for Marian versions 1.10 and 1.12 are parsed, possibly because of how my changes interact with log parsing or configuration file access in the Taskcluster context.

I'm investigating this further, but if you can confirm whether these failures are expected due to a known issue with these versions, I’d appreciate your input.

Thank you once again.

Development

Successfully merging this pull request may close these issues.

Migrate all the run_task tests into a separate folder
2 participants