Add DB seed and tests for coverart in API #248

Open
wants to merge 1 commit into
base: master
from

Conversation

notartom
Member

Following up on PR #245, add a database seeder that uses our scrubbed database dump
(https://github.com/LibriVox/librivox-ansible/blob/master/roles/db_import/files/librivox_catalog_scrubbed.sql.bz2), and start using it to test coverart in the API.

@notartom notartom force-pushed the db_seed_and_coverart_tests branch 4 times, most recently from 75be88a to 8939e51 Compare December 15, 2024 01:32
notartom added a commit to LibriVox/librivox-ansible that referenced this pull request Dec 15, 2024
@notartom notartom force-pushed the db_seed_and_coverart_tests branch 23 times, most recently from ac6476d to 29d1c97 Compare December 16, 2024 01:52
@notartom notartom force-pushed the db_seed_and_coverart_tests branch 5 times, most recently from a71c265 to 2ee5d29 Compare December 17, 2024 02:17
@notartom notartom force-pushed the db_seed_and_coverart_tests branch from 2ee5d29 to 200ac9f Compare December 17, 2024 03:00
@notartom notartom changed the title WIP: Add DB seed and tests for coverart in API Add DB seed and tests for coverart in API Dec 17, 2024
@notartom
Member Author

@garethsime @redrun45 thoughts?

@garethsime
Contributor

I'll take a look tomorrow evening! 🙂

@redrun45
Collaborator

redrun45 commented Dec 18, 2024

What I can tell about what it should do:
If I'm reading this right, we could add as many of these DbTestCase subclasses as we care to. If I'm understanding the runtime behavior correctly, each of these test cases will cause the database to be overwritten with the scrubbed copy again. From there, we can modify the data with endpoint calls, with direct DB imports, or by importing additional seeds, and it all gets reset again for the next test class.

That sounds great! The import time would add up if we added many classes, but we could cover quite a bit from a single class.

(As for this particular API test, I'd add a check asserting that the new elements are not returned unless 'coverart' is specified. But that's a far simpler thing. 😉)
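A minimal sketch of that negative check, assuming the response has the `{"books":[...]}` shape shown in the test output in this thread. The helper name and the three coverart field names are assumptions for illustration; substitute whatever keys the API actually adds when 'coverart' is requested.

```php
<?php
// Assumed field names; replace with the keys the API really emits.
const COVERART_FIELDS = ['coverart_jpg', 'coverart_pdf', 'coverart_thumbnail'];

// Hypothetical helper: return the coverart keys found on any book
// in a JSON response body. An empty array means none were present.
function coverartFieldsPresent(string $json): array
{
    $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $found = [];
    foreach ($decoded['books'] ?? [] as $book) {
        foreach (COVERART_FIELDS as $field) {
            if (array_key_exists($field, $book)) {
                $found[$field] = true;
            }
        }
    }
    return array_keys($found);
}
```

Inside a test method, the body of a request made without the 'coverart' parameter could then be checked with something like `$this->assertSame([], coverartFieldsPresent($output));`.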

Trying it out:
I don't think it's running correctly on my dev box. I've pulled the latest 'librivox-ansible', and deployed from that to get the bzip lib. I don't get any error messages, but I also don't get the usual progress indicator ('.....') or test results report ("OK (44 tests, 188 assertions)," on current master). Instead, I get what must be the JSON response to that HTTP query.

I did some smoke testing, including adding a bad assert to Author_test.php. If that test is being run, then the final results report and the error message about the bad assert are both being swallowed up behind that JSON. It looks like this new test should be running silently, so I've not figured out what's going on yet.

Contributor

@garethsime garethsime left a comment

Thanks for this, I like the Seeder idea. I'm having the same problems running the test locally as @redrun45, though. Here's the output I get:

/librivox/www/librivox.org/catalog# XDEBUG_MODE=coverage ./vendor/bin/phpunit -c application/tests/
PHPUnit 10.5.10 by Sebastian Bergmann and contributors.

Runtime:       PHP 8.1.2-1ubuntu2.20 with Xdebug 3.1.2
Configuration: /librivox/www/librivox.org/catalog/application/tests/phpunit.xml

STARTING DUMP
FINISHING DUMP 38s
{"books":[{"id":"52","title":"Letters of Two Brides","description":"Letters of Two Brides is an epistolary novel. The two brides are Louise de Chaulieu (Madame Gaston) and Ren\u00e9e de Maucombe (Madame l'Estorade). The women became friends during their education at a convent and upon leaving began a life-long correspondence. For a 17 year period, they exchange letters describing their lives.<br \/><br \/>Michelle Crandall reads Renee\u2019s letters, and Kara Shallenberg reads Louise\u2019s. Letters from the men in their lives are read by Peter Yearsley, David Barnes, Denny Sayers, and Sean McKinley","url_text_source":"https:\/\/www.gutenberg.org\/etext\/1941","language":"English","copyright_year":"1902","num_sections":"57","url_rss":"https:\/\/librivox.org\/rss\/52","url_zip_file":"https:\/\/www.archive.org\/download\/letters_brides_0709_librivox\/letters_brides_0709_librivox_64kb_mp3.zip","url_project":"","url_librivox":"https:\/\/librivox.org\/letters-of-two-brides-by-honore-de-balzac\/","url_other":"","totaltime":"9:09:20","totaltimesecs":32960,"authors":[{"id":"86","first_name":"Honor\u00e9 de","last_name":"Balzac","dob":"1799","dod":"1850"}]}]}

(Ignore the STARTING DUMP and FINISHING DUMP, I was just trying to time how long it took to do the reload.)

I had a brief look at what might be going wrong, but it wasn't obvious 🙂 I don't know how available I will be over the Christmas period, but I'll see what I can do in terms of debugging it on my machine if it's all working fine for you @notartom!

Contributor

I wasn't sure if this was going to work for multiple test classes, but the scrubbed data has a DROP TABLE before each section, so it does actually tidy itself up nicely when the seeder reruns.
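One way to double-check that property is to scan the decompressed dump for tables that are created without a matching drop. The sketch below is a hypothetical helper, not part of this PR, and it assumes mysqldump-style statements with backtick-quoted table names at the start of a line. It only checks that a drop exists for each created table, not that it comes first.

```php
<?php
// Hypothetical helper: list tables a SQL dump creates but never drops.
// Assumes statements like "DROP TABLE IF EXISTS `t`;" and "CREATE TABLE `t` (".
function tablesCreatedWithoutDrop(string $sql): array
{
    preg_match_all('/^DROP TABLE(?: IF EXISTS)? `([^`]+)`/mi', $sql, $drops);
    preg_match_all('/^CREATE TABLE(?: IF NOT EXISTS)? `([^`]+)`/mi', $sql, $creates);

    // Anything created but not dropped would survive (or collide on) a re-seed.
    return array_values(array_diff($creates[1], $drops[1]));
}
```

An empty result suggests the seeder can be rerun against the same database without leftovers from a previous run.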

One downside of this is that, since it uses the same database for testing and local dev, it will obliterate any data that people have manually crafted to use on their machines.

That said, maybe we could combine it with this branch that hits a separate test database. (The application/config/testing/database.php config is what you'd take from that branch, and people would have to have the user/database already set up similar to what the commands in Bootstrap.php are doing, but maybe that's something we can put in a setup script or in the CONTRIBUTING guide.)
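For reference, a testing-only database config in the usual CodeIgniter 3 layout might look roughly like this. It is a sketch, not the file from that branch: the user name, password, and database name are placeholders, and the other settings are common defaults rather than values taken from this project.

```php
<?php
// application/config/testing/database.php (sketch; all values are placeholders)
defined('BASEPATH') OR exit('No direct script access allowed');

$active_group = 'default';
$query_builder = TRUE;

$db['default'] = array(
    'dsn'      => '',
    'hostname' => 'localhost',
    'username' => 'librivox_test',          // hypothetical test-only user
    'password' => 'change-me',              // placeholder
    'database' => 'librivox_catalog_test',  // hypothetical separate test database
    'dbdriver' => 'mysqli',
    'dbprefix' => '',
    'pconnect' => FALSE,
    'db_debug' => TRUE,
    'char_set' => 'utf8',
    'dbcollat' => 'utf8_general_ci',
);
```

With the environment set to testing, CodeIgniter loads this file instead of the default one, so the seeder would overwrite only the test database and leave local dev data alone.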

@@ -2,4 +2,12 @@

class DbTestCase extends CIPHPUnitTestDbTestCase
{
public static function setUpBeforeClass(): void
Contributor

Normally, my preference is to reset the data between each test rather than between each test class, so that there's no leakage between tests, but it seems like setting up such a large database is quite expensive. (On my humble machine, the reload takes between 30 and 40s.)

I think there are a few options we could explore for speeding this up:

  • Use a reduced-size data set for the tests - Less production-like, but fast and you get some data set up for free
  • Use a schema-only data set for the tests - Even less production-like, but you get full control of the data being set up and it should reload really fast
  • Maybe: Benchmark the DatabaseSeeder and see if we can get that to run faster somehow, though it's probably just the data size making it slow
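
For the benchmarking option, a small timing wrapper would be enough to compare approaches. This is a sketch only: `timeIt` is a hypothetical helper, and the `usleep` call is a stand-in for whatever actually invokes the seeder.

```php
<?php
// Hypothetical helper: run a callable and return elapsed wall-clock seconds.
function timeIt(callable $work): float
{
    $start = hrtime(true);                    // monotonic clock, nanoseconds
    $work();
    return (hrtime(true) - $start) / 1e9;
}

$seconds = timeIt(function () {
    usleep(50000);                            // stand-in for the seeder call
});
printf("reload took %.2fs\n", $seconds);
```

Timing the decompression and the SQL import separately would show whether the cost is in reading the dump or in loading it.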
