
Add integration testsuite #1678

Open · ehuss wants to merge 18 commits into master from gh-test

Conversation

@ehuss (Contributor) commented Dec 30, 2022

This adds an integration testsuite for testing triagebot's behavior. There are two parts of the testsuite:

  • github_client tests the GithubClient by sending requests to a local server and checking the response.
  • server_test launches the triagebot executable, sets up some servers, launches PostgreSQL, injects webhooks, and checks the responses.

An overview of the changes here:

  • Some of the networking code was changed to allow using hosts other than api.github.com, raw.githubusercontent.com, and the teams API.
  • The constructor for GithubClient was simplified a little since it was always used in the same way.
  • Added some features to help with testing, such as recording webhook events.

The comments and docs should give some overview of how things are set up and how to write and run tests.

@ehuss (Contributor Author) commented Dec 30, 2022

This is a little incomplete as I wanted to get some feedback to see if this is something you would be willing to consider accepting. This only includes a few tests, so it doesn't cover very much. Some questions to consider:

  • Is this worth it? I'm a bit on the fence, but I think it would make working on triagebot a bit easier, at least to have some more confidence when making changes.
  • Requiring the existence of PostgreSQL is a pain. I'm wondering if there is some alternative?
  • I hacked the Dockerfile to get it to support postgres. I expect it needs some cleanup.
    • Adding postgresql to the Docker image adds a fair amount of size (185MB?). It might be worth considering changing this so that the testing image is separate from the build image.
  • I could split the two kinds of tests into two separate integration tests, but I'm not sure if it is worth the hassle (particularly with sharing the common code).

@Mark-Simulacrum (Member) left a comment

I think the high-level feedback is that the overall direction is good, but we need to improve how the JSON files are kept and generated. One thought I had: it seems like we ought to be able to support something like --bless to generate them automatically. Doing so on CI (with perhaps a tarball artifact produced, or some other easy-ish way of using results locally) would make iterating a little better.

Otherwise I'm worried that the bar for adding tests is pretty high if you need to figure out gh api and such.

README.md Outdated
> --url=http://127.0.0.1:8000/github-hook --secret somelongsekrit
> ```
>
> Where the value in `--secret` is the secret value you place in `GITHUB_WEBHOOK_SECRET` described below, and `--repo` is the repo you want to test against.
Member

Do you know what permissions are necessary for this? It seems like it might be a convenient way to test against actual GitHub from CI (too), but it's not clear whether the permissions model really makes that easy.

For CI at least we could probably set up our own forwarding if needed though.

Contributor Author

IIRC, the default token generated by gh auth login will have full permissions. gh webhook just needs the ability to alter webhook settings on the repo, which that token should have. At least in my testing, it "just worked" without me needing to do anything special (other than signing up for the beta).

I'm not sure it would really be feasible to set it up on CI. I think it would need to start by creating a bot account, manually creating a token with scope limited to a single "test" repo, and adding that token to the workflow secrets. But then I think PR authors would have full write access to that repo, which could be dangerous.

I think it would be challenging to do in a secure way, but I haven't thought about it much. I don't know when it will be out of beta, either.

"PUT" => Method::PUT,
"DELETE" => Method::DELETE,
"PATCH" => Method::PATCH,
_ => panic!("unexpected HTTP method {s}"),
Member

FWIW it seems plausible that https://docs.rs/http/latest/http/method/struct.Method.html might work instead of this (and some other manually defined structs here). Though historically I've not been super happy with the http crate API for this kind of thing... so maybe hand rolling is better.

Contributor Author

The reason I didn't use it is because:

  • I wanted an enum to make it easy to use Method::*. http uses associated consts.
  • I also wanted it to impl Copy to make it easy to use.
  • I had a custom STOP verb, which is not standard, and the http Method extensions weren't easy to use. I could easily switch that to use a different method, though.

I understand that it might be nicer to not use a custom http server, but I suspect adding anything else would have a pretty large build time and complexity hit.

I could trim a few things down, like removing custom header support. It isn't being used here, and I'm not sure it will ever be needed (I copied this from Cargo which does use it). However, it only adds a few lines so I figured it didn't hurt to keep it.
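The trade-offs described above (a plain `Copy` enum versus `http::Method`'s associated consts, plus a non-standard STOP verb) could look roughly like this. This is an illustrative sketch, not triagebot's actual type:

```rust
use std::str::FromStr;

// A hand-rolled method enum: a plain `Copy` enum makes `use Method::*`
// and pattern matching ergonomic, unlike http::Method's associated consts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Method {
    GET,
    POST,
    PUT,
    DELETE,
    PATCH,
    /// Non-standard verb (e.g. to tell a test server to shut down).
    STOP,
}

impl FromStr for Method {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "GET" => Method::GET,
            "POST" => Method::POST,
            "PUT" => Method::PUT,
            "DELETE" => Method::DELETE,
            "PATCH" => Method::PATCH,
            "STOP" => Method::STOP,
            _ => return Err(format!("unexpected HTTP method {s}")),
        })
    }
}
```

Returning a `Result` from parsing (rather than panicking) is optional; the panic in the excerpt above is reasonable for a test-only server.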

@@ -0,0 +1,543 @@
//! `GithubClient` tests.
Member

This seems like an improvement over status quo, but I am worried about the amount of JSON we're checking in here.

I have a few thoughts on alternatives:

  • Store the JSON elsewhere (at minimum: a separate directory would help to avoid losing track of the .rs files)
    • S3 seems like a possible candidate but is annoying to update, etc.
  • Don't try to make it possible to run this offline (basically: require CI or a GITHUB_TOKEN -- and "just" query the github API).
    • The dynamic nature of replies seems like it makes this quite annoying to do well.

Ultimately I'm not opposed to just checking in a bunch of JSON but I've always felt a little icky about that kind of thing. I'm not sure there are good alternatives though.

(I guess the "big" option is getting rid of our own client and using octocrab... which IIRC might be auto-generated at least in part? But that doesn't really solve the problem, just shifts it elsewhere).

Contributor Author

Yea, the JSON files are a mess. I'll look into organizing it. I think I would prefer to avoid keeping them external, since I think that would make it more difficult to manage.

I'm not sure how feasible it is to make it run online (particularly on CI). Some of the CI complications are authentication and handling concurrent jobs. I also wanted to ensure this is as easy as possible for contributors to run.

Contributor Author

I've been thinking about this off and on, and I'm feeling like the server tests are going to be a pain to maintain. They aren't too difficult to create, but if someone modifies some behavior in triagebot that perturbs them in any way, they are going to need to figure out how to recreate the tests which I think is just going to be too painful.

I think I'm going to need to figure out a different strategy.

If I dropped the server tests from this PR, and left just the GithubClient and database tests, would you be more willing to merge it? I think those should be much more maintainable, and at least provide some increase in testing.

Member

In general -- happy to experiment, the main "blocker" for this PR has (from my perspective) been that you feel happy with it -- I would like to understand the tests as well, and feel OK updating them, but the main thing is that you seem good since we can always just not run them / delete them.

So reducing scope seems fine.

//! At the end of the test, you should call `ctx.events.assert_eq()` to
//! validate that the correct HTTP actions were actually performed by
//! triagebot. If you are uncertain about what to put in there, just start
//! with an empty list, and the error will tell you what to add.
Member

In broad strokes this seems great -- I really like the assertions -- but I am (similarly to the github client) worried about maintaining and updating the JSON files.

@Mark-Simulacrum (Member)

> Is this worth it? I'm a bit on the fence, but I think it would make working on triagebot a bit easier, at least to have some more confidence when making changes.

Do you have a sense of how hard (time/complexity) adding new tests will be? Would it be reasonable to expect drive-by contributors to figure out the test framework, or will maintainers (e.g., me, you, etc.) need to write tests? Either seems workable, but it'll help gauge the level of complexity we can afford.

> Requiring the existence of PostgreSQL is a pain. I'm wondering if there is some alternative?

I agree it's a pain -- most triagebot features don't actually rely on the database, so I wonder if we could stub it out. Separately I'd personally expect that having an sqlite backend wouldn't actually be that hard (perf.rlo does something like this), but in production Postgres is much easier to manage (mostly because we can have an "instance" rather than worrying about the files, and can login to fix the database up concurrently with the service).

@apiraino (Contributor) commented Dec 30, 2022

I've used cargo insta for recording and comparing JSON replies from API endpoints; it's quite useful and easy to use. Could it be an alternate way to handle these JSON files?

@ehuss (Contributor Author) commented Dec 30, 2022

> Do you have a sense of how hard (time/complexity) adding new tests will be? Would it be reasonable to expect drive-by contributors to figure out the test framework, or will maintainers (e.g., me, you, etc.) need to write tests? Either seems workable, but it'll help gauge the level of complexity we can afford.

The GithubClient tests were relatively easy. Each one took just a few minutes to write.

The server tests were quite a bit more challenging, and took maybe 10 minutes to write each one (and that's for an easy one). So I do fear it might be difficult for people unfamiliar with it to contribute easily.

> most triagebot features don't actually rely on the database, so I wonder if we could stub it out.

I'll look into that, or maybe something like sqlite. I was reluctant to bring in another dependency. I would like to avoid some of the complexity here.

> One thought I had is that it seems like we ought to be able to support something like --bless to generate them automatically?

> I've used cargo insta for recording and comparing json replies from API endpoints, it's quite useful and easy to use. Could it be an alternate way to handle these JSON files?

I'll look into a more automated way to record the behavior, and possibly using something like cargo insta to manage it. I especially didn't like the current structure where you have to both write an endpoint (.api_handler(…)) and then verify that those endpoints were hit (ctx.events.assert_eq(…)) since it essentially duplicates all that. I think it is going to be challenging to make that completely automated, but it might be worth it.

@ehuss force-pushed the gh-test branch 2 times, most recently from dd97c88 to 1617f0d on February 5, 2023
.configure(self)
.build()
.unwrap(),
)
.await?;
rate_resp.error_for_status_ref()?;
Contributor Author

Note: this call to error_for_status_ref() is a small unrelated fix to fail early if the /rate_limit URL fails. Otherwise this was ignoring the status code, which can cause a confusing error.

@ehuss (Contributor Author) commented Feb 5, 2023

OK, I pushed a new version that takes a different approach of just using recorded playbacks. It is substantially easier to write tests, though I'm a little concerned about the possible fragility.

The PR is a little large, so let me recap the changes:

  • Lots of small changes scattered throughout to ensure that the remote URLs are configurable.
  • Lots of small changes to GithubClient to make it easier to record requests.
  • Added src/test_record.rs. These are functions for recording test data as JSON and saving them to disk when the TRIAGEBOT_TEST_RECORD environment variable is set.
  • There are two main styles of tests: tests/github_client/mod.rs, which focuses only on GithubClient testing and is much lighter weight, and tests/server_test/mod.rs, which focuses on webhook tests and is much heavier weight. See those respective files for an introduction on how to write those tests.
    • I stuck the recorded JSON files in a separate directory for each test.
  • Both of those tests are compiled in the same integration test called tests/testsuite.rs.
  • The HTTP requests are recorded as JSON in a structure called an Activity. When the test runs, the test framework cycles through the activities in order, injecting webhooks, and waiting for the expected HTTP request to come in.

I wanted to do a snapshot at this point and see what you think of this new approach.

I'm not sure how reliable these tests are going to be. If in the long run they end up being too fussy, then I think they can be removed. However, I should be around in case there are issues or people need help.
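As a rough illustration of the playback mechanism described above, the framework cycles through recorded activities, injecting webhooks and collecting the outbound requests it should wait for. All names here are hypothetical sketches, not triagebot's actual types:

```rust
// Hypothetical shape of a recorded Activity stream (illustrative only).
#[derive(Debug)]
enum Activity {
    /// A webhook payload to inject into triagebot.
    Webhook { event: String, payload: String },
    /// An outbound HTTP request triagebot is expected to make,
    /// together with the canned response to return to it.
    Request { method: String, path: String, response: String },
}

/// Replay recorded activities in order: inject webhooks, and return
/// the requests the framework should now expect triagebot to make.
fn replay(activities: &[Activity]) -> Vec<String> {
    let mut expected = Vec::new();
    for act in activities {
        match act {
            Activity::Webhook { event, .. } => {
                // The real harness would POST the payload to the
                // triagebot webhook endpoint here.
                println!("injecting webhook: {event}");
            }
            Activity::Request { method, path, .. } => {
                expected.push(format!("{method} {path}"));
            }
        }
    }
    expected
}
```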

@apiraino (Contributor) left a comment

@ehuss left a few comments; I think nothing but nits.

I would agree with @Mark-Simulacrum to investigate using a sqlite DB for the use case of testing.

src/main.rs Outdated
Comment on lines 245 to 247
if env::var_os("TRIAGEBOT_TEST_DISABLE_JOBS").is_some() {
return;
}
Contributor

The purpose of TRIAGEBOT_TEST_DISABLE_JOBS is explained in ./tests/server_test/mod.rs. Perhaps also write here the same bit of documentation? I would also add a short note in the README.md.

Also, would it change anything if the env var check stays outside of the task spawning block (since tasks are disabled at all when running tests)? Would that spare some cycles or change anything in the compiled code?

Contributor Author

I shuffled the spawning code to separate functions to hopefully make the structure a little clearer (the run_server function was starting to get a little long in my opinion).

I'd rather not mention TRIAGEBOT_TEST_DISABLE_JOBS in README.md, since users should never need to know about it specifically. I did add some documentation to server_test/mod.rs discussing scheduled jobs.

///
/// Returns `None` if recording is disabled.
fn record_dir() -> Option<PathBuf> {
let Some(test_record) = std::env::var_os("TRIAGEBOT_TEST_RECORD") else { return None };
Contributor

It seems that record_dir() is only checked in the initial main(). Could checking for TRIAGEBOT_TEST_RECORD be moved there instead of checking it in init() and record_event()? Would that help it being a bit more visible?

Also I would mention it in the README.md (for increased awareness when someone wants to run tests)

Contributor Author

I'm not entirely clear on this suggestion. Each time an event comes in, it needs to record it to disk. In order for it to know where to record it, it needs to read the TRIAGEBOT_TEST_RECORD. It can't be checked in one place. Unless this is perhaps suggesting to store the value somewhere and read that directly? It seems like that would make it a bit more difficult to use since there would need to be some global state tracked somewhere.

I added a brief mention of recording in README.md

Comment on lines 146 to 148
// TODO: This is a poor way to choose a TCP port, as it could already
// be in use by something else.
let triagebot_port = NEXT_TCP_PORT.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
Contributor

Does it test if the port is already taken?

Another approach could be like Wiremock does: allocate a pool of free ports at startup.

Contributor Author

I'm not sure I'm clear on the suggestion about setting up a pool. Each triagebot process is independent, and wouldn't be able to easily "take" a port from a pool (sending file descriptors over sockets exists, but not something I would want to bother with).

I'm not too worried about using up a lot of ports. I would only expect for there to ever be a few dozen tests, and there should be over 15,000 ports available starting at 50,000.

I added some code to pick a free port, though it may not be very reliable.
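One common way to pick a free port is to bind to port 0 and let the OS assign one. This is a minimal sketch of that approach, and it is still slightly racy: another process could grab the port between the listener being dropped and the triagebot process binding it.

```rust
use std::net::TcpListener;

/// Ask the kernel for an available ephemeral port by binding to port 0.
fn pick_free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    // The listener is dropped here, freeing the port for reuse by the
    // process under test. A race with other processes remains possible.
    Ok(port)
}
```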

@@ -4,7 +4,8 @@ use rust_team_data::v1::{Teams, ZulipMapping, BASE_URL};
use serde::de::DeserializeOwned;

async fn by_url<T: DeserializeOwned>(client: &GithubClient, path: &str) -> anyhow::Result<T> {
let url = format!("{}{}", BASE_URL, path);
let base = std::env::var("TEAMS_API_URL").unwrap_or(BASE_URL.to_string());
Contributor

Mention TEAMS_API_URL in the README with its purpose and a suggested default value?

Contributor Author

Can you say what should be mentioned? These environment variables (GITHUB_API_URL, GITHUB_RAW_URL, etc.) are intended only for testing, and should only ever be set automatically. Users shouldn't ever need to know about them.

Contributor

Yes, documenting for developers the env vars needed to run the triagebot (new one: TEAMS_API_URL). I only half agree but probably just unimportant nits, I guess :-)

This helps keep the `run_server` function a little smaller, and makes
the disabling code a little clearer. This also adds a check if test
recording is enabled, to help avoid jobs interfering with recording.
@ehuss (Contributor Author) commented Feb 6, 2023

I'd be happy to investigate adding sqlite support. What do you think about using something like sqlx instead of recreating the pooling and migration stuff? It seems like it already implements everything that would be needed. Or would you prefer to just copy the code over from rustc-perf?

@apiraino (Contributor) commented Feb 6, 2023

> What do you think about using something like sqlx

I am not familiar with how rustc-perf handles sqlite DB connections, but in my experience sqlx is a hefty dependency to pull in (in terms of compile time and maintenance chores). FWIW, past comments from Mark point toward keeping external dependencies to a minimum for the triagebot. Just my two cents.

@ehuss (Contributor Author) commented Feb 6, 2023

Oh yea, I strongly agree with keeping deps at a minimum. I haven't used sqlx and was mostly curious about it, since it supports async and handles pooling and migrations among other things (and seems well maintained and often recommended). Copying from rustc-perf means pulling in the pooling code and the migration code, and needing to write every DB query twice. It's not huge to copy in, so I can take a crack at it to see how difficult it is.

@Mark-Simulacrum (Member) commented Feb 6, 2023

I have no direct objections to sqlx, but when I wrote the rustc-perf logic I believe they encouraged having a database around for schema checking or similar at build time, which I really didn't want. It may also have been the case that they had a stale version of rusqlite in their dependencies and we needed something newer.

In part I also believe I ended up feeling like the pooling and other abstraction layer was less work than learning sqlx and then writing atop that some amount of abstraction (since there were still incompatibilities between SQLite and postgres backends, so just raw sqlx wasn't enough).

It's quite possible these problems have been solved since then.

Not all API responses are JSON, which can make it a bit awkward to
differentiate which kinds of responses are JSON. This unifies the
two Activity variants so that there is only one, which simplifies
things a bit. Non-JSON responses are stored as a JSON string, under
the assumption that GitHub never responds with a JSON string.
@ehuss (Contributor Author) commented Feb 21, 2023

I have pushed an update that adds sqlite support. I essentially copied the code from rustc-perf. There is a Connection trait which abstracts the backends. The postgres side should be essentially the same. The tests will use postgres if it is available, and fall back to sqlite if not. There are also coverage tests for the Connection API which exercise both postgres and sqlite.

The way sqlite handles concurrency and locking is a bit different from Postgres, but for testing it should be fine.
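The backend-abstraction pattern described here can be sketched as a trait with one implementation per database. The method names below are illustrative, not triagebot's actual Connection API; an in-memory type stands in for the real `PostgresConnection`/`SqliteConnection` implementations:

```rust
// Illustrative sketch of abstracting database backends behind a trait.
trait Connection {
    fn insert_job(&mut self, name: &str);
    fn job_names(&self) -> Vec<String>;
}

/// In-memory stand-in for a real backend, useful for demonstrating
/// that callers only depend on the trait, not the concrete database.
struct MemoryConnection {
    jobs: Vec<String>,
}

impl Connection for MemoryConnection {
    fn insert_job(&mut self, name: &str) {
        self.jobs.push(name.to_string());
    }

    fn job_names(&self) -> Vec<String> {
        self.jobs.clone()
    }
}
```

In the real code each trait method would issue a query in its backend's SQL dialect, which is where the SQLite/Postgres incompatibilities mentioned above get absorbed.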

This can help avoid accidentally deleting something that hasn't been
checked in yet.
Apparently some environments roundtrip datetimes in Postgres
with different precision.