Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#13 partial. Added geomath::ang_diff test and benchmark #14

Closed
wants to merge 6 commits into from

Conversation

stonylohr
Copy link
Contributor

Currently uses a static truncated dat file. I hope to improve on that in later pull requests.

The current draft makes geomath public. This is required for testing its operations using integration tests, but wouldn't be required if I created the test as a unit test instead. I think we'll want it to be public eventually, some of my non-CI geomath suggestions (e.g. #10) include breaking changes, so it might be preferable to avoid making geomath public in master until after those breaking changes are either applied or rejected.

… and benchmark, currently using static dat file
Copy link
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @stonylohr - nice to see some proposed code integration!

Currently, this seems a bit more complicated than I was hoping for. Is the format in this .dat something being written by your code or is this a format we've inherited from somewhere?

If it's something you've created, I'd prefer it to be much simpler - e.g. where we don't have to escape/unescape things, parse comments, have separate sections.

Specifically, I was expecting the entire "parser" to be something as simple as:

fn parse(foo) -> Iter<Item=&[f32]> {
 lines.iter().map(|l| l.split(" ")).map(|s| s.parse::<f32>().ok());
}

Otherwise, the geomath_benchmark and test_ang_diff look pretty good to me.

// If both values are nan, consider the difference to be 0.
pub fn calc_delta(x: f64, y: f64) -> (f64, f64) {
if x.is_nan() && y.is_nan() {
return (0f64, 0f64);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with most things NaN, I have no idea if this is reasonable.

Glad to see the introduction of relative eq check. I mostly use it via the approx crate, but I don't think I knew about it when I started working on this crate.

Copy link
Contributor Author

@stonylohr stonylohr Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely want to consider a pair of NaNs to be equal in most cases, even though f64::NAN==f64::NAN returns false, and I'm pretty sure f64::NAN-f64::NAN returns f64::NAN, because we want to consider the C++ and rust results to match if they return the same value, even when that value is NaN.

Similarly, we want the option to consider 0.0 and -0.0 to not match, even though I'm pretty sure the equality operator returns true, and their difference is 0.

The NAN vs -NAN case is sort of mix of the two above. We might sometimes want to let it slide, but we definitely want the option to reject.

I'll make a note to explore the approx crate. It looks pretty promising, but I'm not yet clear whether it can be easily coaxed into the desired handling for all the cases above. If it can, it would be great to retire my custom comparison logic.

Note that I don't plan to explore this in my current pass, unless we actually identify an issue with my current comparison code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for elaborating. I think it's totally fine to keep this snippet here - no need to spend time digging into approx.

@stonylohr
Copy link
Contributor Author

The dat file format is my making. The character escaping is something I could leave out for now. The reason it's there is that some later geographiclib classes use string inputs and outputs that can contain spaces, colons, single and double quotes, etc. But yes, I can drop that logic for now and wait to restore it until we get to things that need it.

I can arrange to drop most dat file comments, and some content, without much issue. Specifically, I now think we'll mostly be relying on the criterion framework for time comparisons, so there's less value in the SYSTEMTAG, LIBRARYTAG, and TESTSTATUS lines than I'd originally been thinking. The same is true for the time section on individual operation lines, and the statistics section. I'll plan to drop all of those.

There should be no issue removing section comments. I'll plan to drop those as well.

This still leaves the operation code line (e.g. Math_AngDiff_x_y_e), and the format comment. While we could infer the operation from the file name, or have the knowledge hard-coded in the tests, I think there's value in keeping it. Similarly, because the line interpretation varies from one operation to another, and there are so many operations, I think there's value in including it in the dat file explicitly, rather than only having it in the test code. I propose that I combine these two items into a single line, which would now always be the first line of the file.

Would that be an acceptable compromise?

@stonylohr
Copy link
Contributor Author

I've also given some more thought to the sections within lines, and I think I'm satisfied that we can drop them. As I mentioned in my previous comment, I think we can drop the time section.

I originally separated the error section from outputs in part to retain the option to report ending reference or state values even in error cases, and in part to avoid ambiguity for functions that return non-numeric outputs. In practice, I'm not reporting any outputs in error cases, and it's probably fine to assume that no non-error case will result in an actual output value of "ERROR". So I think dropping that separation should be ok.

In terms of removing the separation between inputs and outputs, I think that in order to do this safely while also retaining support for a variable number of inputs (as will come up in some later array-based cases), I may need to make the outputs required rather than optional. That shouldn't have any effect on the rust side of things, where those outputs are of central interest, but it may require some more work on the C++ side in my case creation logic. I currently use a 2-stage approach where some special value cases are initially created as input-only lines, but I think I can avoid that.

Anyway, I'll proceed with trying to remove all of the line-level section separators.

As I mentioned in my previous comment, I still favor including a single header line in each data file that specifies the operation code and (as an implicit comment) the line format.

In my current pass, I also don't plan to change formatting of NaN and infinity values to something that rust's f64::parse will accept. The current values use the default C++ standard library formatting. I think it makes the most sense to stick with the C++ convention, since I think the change would be pretty coding-time intensive, and no single convention will match every language's preferred approach. We can revisit that topic after this pass if you wish.

I'm attaching a couple of heavily truncated sample files illustrating a format I think I can get us to at the end of my current pass. I've renamed the files, so github will let me attach them unzipped.

Geodesic_C3f.dat.txt
Math_AngDiff_x_y_e.dat.txt

@michaelkirk
Copy link
Member

Thanks for your response!

The dat file format is my making. The character escaping is something I could leave out for now. The reason it's there is that some later geographiclib classes use string inputs and outputs that can contain spaces, colons, single and double quotes, etc. But yes, I can drop that logic for now and wait to restore it until we get to things that need it.

Yes let's keep it as simple as possible for now.

I can arrange to drop most dat file comments, and some content, without much issue. Specifically, I now think we'll mostly be relying on the criterion framework for time comparisons, so there's less value in the SYSTEMTAG, LIBRARYTAG, and TESTSTATUS lines than I'd originally been thinking. The same is true for the time section on individual operation lines, and the statistics section. I'll plan to drop all of those.

Great - yep I think criterion is a good way to go.

There should be no issue removing section comments. I'll plan to drop those as well.

This still leaves the operation code line (e.g. Math_AngDiff_x_y_e), and the format comment. While we could infer the operation from the file name, or have the knowledge hard-coded in the tests, I think there's value in keeping it. Similarly, because the line interpretation varies from one operation to another, and there are so many operations, I think there's value in including it in the dat file explicitly, rather than only having it in the test code. I propose that I combine these two items into a single line, which would now always be the first line of the file.

I was most nervous about lines like this:

 #STATS-SECTION
 #COUNT_TESTS 300
 #COUNT_USED 300
 #TOTAL_TEST_SECONDS 2.4075000000000033e-05
 #MIN_TEST_SECONDS 6.5999999999999995e-08
 #MAX_TEST_SECONDS 1.1119999999999999e-06
 #MEDIAN_TEST_SECONDS 6.8e-08
 #MEAN_TEST_SECONDS 8.0250000000000113e-08

Which to me looked less like "comments" and more like structured data written and parseable by a machine.

I'm ok with a single comment at the top.

Another thought, if you're feeling bold, we could just do csv/tsv where a top row with field names is a common convention.

@michaelkirk
Copy link
Member

I originally separated the error section from outputs in part to retain the option to report ending reference or state values even in error cases

I'm not sure what this means - what are "error"s in this context? Are we trying to match geographic lib crash for crash? 😆

@stonylohr
Copy link
Contributor Author

I originally separated the error section from outputs in part to retain the option to report ending reference or state values even in error cases

I'm not sure what this means - what are "error"s in this context? Are we trying to match geographic lib crash for crash? laughing

I know it sounds funny, but that's pretty close to the truth. When a geographiclib call errors out, it's not usually because of a coding error -- it's usually because the inputs fall into a category that's explicitly rejected by the geographiclib code. In fact, many of Karney's geographiclib test cases are specifically confirming that a function rejects certain inputs. By default, I think the rust code should use the same behaviors. For both this and other variations, we're likely to encounter some cases where the rust version has a legitimate reason to behave differently from the C++, but I think it will be the exception rather than the rule. We can decide how to address those cases when we encounter them.

@michaelkirk
Copy link
Member

michaelkirk commented Dec 10, 2020 via email

@stonylohr
Copy link
Contributor Author

I'll give the csv/tsv option some more thought. If nothing else, tabs might be less likely to show up in string values than spaces are, when we eventually get to that point. Unfortunately, I don't see how to make the header line a proper csv/tsv header line, in part because the operation code is something that doesn't correspond to any particular item on the later lines, and in part because some items in the header line correspond to a variable number of items on the later lines. More frequently, they refer to a fixed number of items in an abbreviated form, but I'm not excited to bend over backwards for a format change that I know is going to fall apart eventually anyway.

@michaelkirk
Copy link
Member

For now I'd be fine sticking with the .dat as you have it if that's your preference - especially since, as I understand it, you're saying there are other examples where something like a tsv won't work well.

Later if we can find a way to easily use a format which doesn't require reading parser code to understand, that could be a future improvement.

@stonylohr
Copy link
Contributor Author

@michaelkirk I've committed a new set of changes that I think should address your concerns with my previous draft. It also adds a fairly complete set of tests and benchmarks for public Geodesic and geomath functions.

I also tried to simplify the process of getting logged c++ data: After cloning, run "git submodule update --remote". The test and benchmark code should handle everything else.

Let me know if you'd like further adjustments or discussion.

@stonylohr
Copy link
Contributor Author

@michaelkirk Since my earlier draft, I've also started leaning toward switching my tests from being integration tests to unit tests. As I noted previously, their mechanics aren't ideal for being categorized as unit tests, but I'm coming to think that the increased access available to unit tests probably outweighs the "not ideal unit tests" consideration. I'd be curious to hear whether you have any thoughts on the topic in either direction.

Copy link
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @stonylohr - I took a quick look at things. There's a lot of commented out stuff and things that seem incomplete so it's sort of hard for me to evaluate.

src/lib.rs Outdated
@@ -5,7 +5,7 @@ pub mod geodesiccapability;
pub use geodesiccapability as capability;

mod geodesicline;
mod geomath;
pub mod geomath;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By making this public we're committing to some kind of stability for it, which I'd prefer to avoid. Does it need to be public just to benchmark it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Most of the commenting out falls into two categories.

  1. Things that the tests can't access because they're currently structured as integration tests, but that I didn't want to just let fall between the cracks, as I think is likely to happen if I omit things without comment.
  2. Assertions that would fail except that I've either added checks to skip them in certain cases, or loosened their criteria (sometimes dramatically), to avoid failing the existing behavior. This is based on my thought that it's preferable not to add an active test that fails for the current geographiclib-rs code. Each of these loosened tests then comes with a set of todo comments, again to avoid things falling between the cracks.

I agree that it would be preferable to avoid making geomath public right away. Both integration tests and benchmarks are limited to accessing public items, and I really wanted to include geomath in both, because it's such a fundamental building block for other functionality. One piece of good news, by the way, is that geomath stood up quite well.

I have a few thoughts on strategies to address all of the concerns that come up here. I'll finish going through your other comments first, though, so I'm working with the latest information.

// Benchmarks using logged cpp values as inputs

// Note that using Arc and Mutex to access a shared dataset results
// in much lower overhead than cloning the dataset in each iteration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


#[test]
fn test_gen_inverse_azi_vs_cpp() {
// Format: this-in[_a _f] lat1 lon1 lat2 lon2 outmask result=a12 s12-out azi1-out azi2-out m12-out M12-out M21-out S12-out
Copy link
Member

@michaelkirk michaelkirk Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this describing the format of items in the block below?

something like:

let lat1 = items[0];
let lon1 = items[1];

would be more self documenting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only indirectly. Everything after the colon on that line is copy-pasted from the header line of the data file that test is reading from. I thought it might be helpful context, but I could certainly drop it, and/or add intermediate variables that match the Rust parameter names.

}

// #[test]
// fn test_cosd_vs_cpp() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are operations that are implemented in the C++ Math class, but are not implemented in Rust geomath. As you may have picked up, I think the ideal goal for geographiclib-rs is to be a feature-complete counterpart to the C++ library, and I think this is an entirely reachable goal. That said, I could collapse those to single-line placeholders, or drop them entirely for now, if you prefer.

let items = utilities::read_consts_basic("Math_consts", 8);
let line_num = 2; // todo: remove once assert_delta stops requiring a line number
assert_delta!(items[0], geomath::DIGITS as f64, 0.0, false, "DIGITS", line_num);
// assert_delta!(items[1], geomath::DIGITS10 as f64, 0.0, false, "DIGITS10", line_num);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of those are commented out because they would require access to non-public values in the current Rust implementation, and that's not available to the current integration-test formulation of the test. Some of the constants may also not exist in the Rust version -- I haven't dug to make that distinction.

}
}

// Extract a the first value from each line, and return as a vector.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "extract the first value" you mean get rid of it right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's referring to fetching the first value of each line, and ignoring the rest. Data lines are structured to list inputs first, and then outputs, even if the function uses a more mixed order. For a benchmark, we don't even care what the outputs are, so I use either as_vecs_basic or as_vec_basic to extract just the inputs. I decided there were enough operations that use a single input that it was worth having as_vec_basic for that special case, where it returns a vector of just the first value on each line, rather than a vector of vectors.


// We can't use f64 in the key directly because it implements PartialEq,
// but not Eq or Hash, because of the complication that NaN != NaN.
pub fn get_geodesic_lookup(data: &Vec<Vec<f64>>) -> HashMap<(u64, u64), Geodesic> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't see where this is used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in many geodesic_benchmark functions, including this example snippet:

fn benchmark_geodesic_gen_direct(c: &mut Criterion) {
    // Format: this-in[_a _f] lat1 lon1 azi1 arcmode s12_a12 outmask result=a12 lat2-out lon2-out azi2-out s12-out m12-out M12-out M21-out S12-out
    let data = utilities::as_vecs_basic("Geodesic_GenDirect", 8);
    let geods = utilities::get_geodesic_lookup(&data);

As I mentioned in my previous comment, each data line starts with the operation's inputs. That also includes any relevant state information. For operations in the context of a geodesic instance, the first two values are typically the a (equatorial radius) and f (flattening) values of the geodesic. This is enough information to unambiguously identify or reconstruct the geodesic. We could simply construct a fresh geodesic for every data line in a benchmark, but that would result in spending a lot of time on something that isn't the operation of interest to the benchmark. Instead, I have such benchmarks start by making a sweep through the data in non-timed code and construct each unique geodesic, and store it in a lookup keyed by the a and f values. I think it usually ends up being 20-ish geodesics. During the timed portion, it can then just get the appropriate geodesic from the lookup, which is much cheaper than constructing a geodesic.

Ok(result)
}

pub fn refresh_unzipped(paths: &DataPathPair) -> std::io::Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My instrumented C++ repo currently stores zipped copies of the data files, to cut down on volume. Over time, I've gotten a little less sure whether this zipping is actually very helpful, because I'm pretty sure that git compresses data before sending and storing. It's possible that I just got unlucky with sourceforge hanging up on me the first few times I tried storing the raw data files.

In any event, those zipped data files are pulled from my c++ repo as part of a "submodule" under the test_fixtures folder. refresh_unzipped takes a file base name, looks for the zipped and unzipped copies of the file, compares their timestamps, and does a fresh unzip if the unzipped version either doesn't exist or is older than the zipped version. I can certainly beef up the function-level comments in the utilities lib file to discuss this sort of thing.

let result = g._gen_direct(items[2], items[3], items[4], items[5] != 0.0, items[6], outmask);
// let mut entries = delta_entries.lock().unwrap();

// record_delta(items[8], result.0, line_num, &mut entries[0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in an earlier comment, I watered down the criteria that many of my checks use to decide whether an operation succeeded, to avoid calling any current behavior a fail. Over time, we'll want to look into these, see why they differ, and probably re-tighten the criteria. One complication is that the assertions will only tell us if something gets worse. So I've added a record_delta function and a few related structures that we can use to check the largest absolute and relative difference for each value in an operation, to help tell whether code changes are successfully bringing things more into line. I wasn't sure whether to include it in all cases, and I think settled on showing a single commented-out example. Since then, I've been leaning toward including it for all checks that are currently seeing any differences (there are some that are already perfectly aligned). You would only see its output if you run tests using the "nocapture" option.




// fn unescape_all(s: &String) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's for later operations that use string arguments that might include spaces. I meant to stash that somewhere before my last commit, and remove it from committed files, but apparently forgot to do that.

@michaelkirk
Copy link
Member

I've also started leaning toward switching my tests from being integration tests to unit tests. As I noted previously, their mechanics aren't ideal for being categorized as unit tests, but I'm coming to think that the increased access available to unit tests probably outweighs the "not ideal unit tests" consideration. I'd be curious to hear whether you have any thoughts on the topic in either direction.

re: unit vs. integration - I'm not really sure what you mean in this context.

I highly value the testing of the higher level public methods and think there's less value in, e.g. testing each individual method in geomath.

The overall "flow" of data in your branch makes sense to me. My biggest concern right now is that you're relying on a instrumented version of geoagraphic lib to generate the test input. I'd much prefer to use Karney's published (and maintained) test data so we're not in a situation down the road where we have to remember how this works and pray the patches still apply. Conversely it'd be relatively easy to download the updated test data from Karney.

@stonylohr
Copy link
Contributor Author

I've also started leaning toward switching my tests from being integration tests to unit tests. As I noted previously, their mechanics aren't ideal for being categorized as unit tests, but I'm coming to think that the increased access available to unit tests probably outweighs the "not ideal unit tests" consideration. I'd be curious to hear whether you have any thoughts on the topic in either direction.

re: unit vs. integration - I'm not really sure what you mean in this context.

I highly value the testing of the higher level public methods and think there's less value in, e.g. testing each individual method in geomath.

The overall "flow" of data in your branch makes sense to me. My biggest concern right now is that you're relying on a instrumented version of geoagraphic lib to generate the test input. I'd much prefer to use Karney's published (and maintained) test data so we're not in a situation down the road where we have to remember how this works and pray the patches still apply. Conversely it'd be relatively easy to download the updated test data from Karney.

For a general discussion of unit testing vs integration testing in Rust, this may be helpful:
https://doc.rust-lang.org/book/ch11-03-test-organization.html

I agree that Karney's dataset is a useful testing tool, but the results in that dataset are specific to WGS84, so can't be relied on to catch bugs in the handling of other ellipsoids. Also, I don't see the GeodTest-100.dat file on the geographiclib data page (https://sourceforge.net/projects/geographiclib/files/testdata/). Depending on how it's created and how the original files are structured, I'm not sure that it can be relied on to test all the categories of cases that the original GeodTest files are designed for. If it's just the first 100 lines, that's fine if Karney's case categories are evenly distributed, but less so if they're clumped.

I also agree that it may eventually make sense to abandon my instrumentation-based tests entirely, once the effort of maintaining them exceeds the value they provide. I think they're likely to be helpful for a while yet, though.

While geomath operations are currently non-public, and likely to see less outside use even if that changes, their C++ counterparts are publicly accessible, and I see them as something that is likely to make sense to expose at some point. I think there is also value in testing these lower level functions, in terms of helping to pinpoint where issues may be originating in the higher level functions. This is value that is likely to fade as geographiclib-rs becomes more stable, and could be dropped at some point in the future, but I think there's a bit of work before we get to that point.

Anyway, I propose the following as a broad strokes version of next steps:

  1. I'll replace my current integration tests with corresponding unit tests, and enable additional checks in them that need access to non-public items. However, I'll mark these tests using #[ignore], so they'll only run if specifically requested.
  2. I'll add unit tests based on Karney's C++ tests, and not mark these as ignored.
  3. I'll add unit tests that use GeodTest-short as their basis, initially marked as ignored.

@stonylohr
Copy link
Contributor Author

I just committed changes addressing many of the items we've discussed, but not all:

  1. Commented out integration test code. Assuming you approve of the direction I'm taking things, I expect to remove those files entirely soon.
  2. Added unit tests corresponding to the now-disabled integration tests. These tests are flagged as ignored, so they only run if specifically requested. With that in place, I've tightened their expectations, so many of them will fail if run. I expect those failures to eventually be resolved through a combination of bug fixes and re-loosening of test criteria, though not as part of this PR.
  3. Added geodesic unit tests corresponding to Karney's test cases found in various versions of geographiclib. Some of these also fail, so I've marked the failing ones as ignore for now as well. Some parts are also commented out because they test operations that don't appear to be present or public in geographiclib-rs (mostly things related to GeodesicLine). I'd prefer to leave sorting that out as a task for after the current PR as well.
  4. Made geomath non-public again. This breaks the geomath benchmarks, so I've commented those out for now. I figured I'd check your preferences before removing those benchmarks entirely.
  5. Moderately expanded comments in my code.
  6. Made some changes to "utilities" code. It compiles, but I haven't tested it yet.
  7. Probably a few other things I'm forgetting...

@stonylohr
Copy link
Contributor Author

I've been thinking about test naming conventions and would be interested in your thoughts on the topic. There are several different pieces I think might have value, and I think that using a consistent ordering to the pieces would be preferable. Being consistent in test (and benchmark) naming conventions will make it easier to run just a subset of test or bench routines. I'll list pieces in the order I would currently favor, which doesn't match the order used in my commit:

  1. "test": This is kind of redundant with the test tag, but may still have some value. I could go either way on including it at all.
  2. The test's group: Examples in my most recent commit include "vs_python" (which I applied to the pre-existing tests, since that's what they appeared to be), "std" (applied to Karney's unit tests), "vs_cpp" for my tests against instrumented c++. Later this weekend, I hope to add some "geodtest" tests that use a GeodTest*.dat file as the basis.
  3. The name of the test's parent module (or something equivalent for integration tests): For example, all tests in geodesic.rs would use "geodesic".
  4. An identifier for the specific test within the categories above: For example, each of the "vs_cpp" tests uses an operation name for this piece, and each of the "std" tests uses an all-lower-case version of the name of the test it's based on.

@stonylohr
Copy link
Contributor Author

I've now added tests based on GeodTest.dat, tagged "ignore". As with other tests, it will take me some time to pin down a good level of tolerance to allow.

…er. Enabled ArcDirect test. Improved test tolerance and logging. Improved test utilities.
@stonylohr
Copy link
Contributor Author

stonylohr commented Jan 15, 2021

@michaelkirk I just made a commit that I think is a good candidate for fully resolving #13. I would be interested in your feedback.

A few highlights include:

  1. I standardized naming for all my new tests and benches, but restored all pre-existing tests and benches to their original names.
  2. I added benches that use the full GeodTest.dat file.
  3. I set all slow and/or failing tests ignored. Currently failing tests should later be un-ignored as the underlying bugs are resolved, but that doesn't fall under add integrated CI testing vs geographiclib C++ implementation #13.
  4. I enabled a previously-commented-out test from java geographiclib for arc-mode direct, but set it ignored because it fails.
  5. I improved my testing infrastructure in the utilities mod, and used it to improve the tolerances used in some tests
  6. I removed the tests folder. I've concluded that I'm happier with all current tests in their current "unit test" formulation in the src folder than with my previous "integration test" formulation in the tests folder.

@stonylohr
Copy link
Contributor Author

I did think of a few remaining rough edges that I'll work on polishing, but the changes should be more contained than other recent commits. I hope to make the changes over this weekend.

@stonylohr
Copy link
Contributor Author

I've now committed the changes I alluded to yesterday.

I think the main point of interest is that I've removed use of a submodule and zip logic to handle the C++ instrumented data files. Instead, created an approach more similar to that used for GeodTest.dat: I've made zip files available for download on my sourceforge wiki page (https://sourceforge.net/u/alephknot/wiki/Home/), and leave it to the user to download and unzip the files to the appropriate location. If a file isn't found, the error message tries to include both the expected data file location, and instructions on how to find zipped files, much like the approach for GeodTest.dat. Recall that all of the affected tests are set ignored, so do not run by default.

Note that I had hoped to use a single zip file for all of my data files using a kind of fakey release, but sourceforge isn't letting me create a release (perhaps because of the project's forked status). The full data file zip is too large to do as a wiki attachment. Instead, I've broken it into a separate zip file for each of the 3 largest class outputs (Geodesic, GeodesicLine, and Math), and a fourth zip file for all others (none of which matter yet to geographiclib-rs).

@stonylohr
Copy link
Contributor Author

While looking around validate-geographiclib and the core geographiclib some more, I realized that geographiclib has a few different categories of tests in different places, and my tests so far missed some. I'll look into adding those as well.

@stonylohr
Copy link
Contributor Author

After a bit of offline discussion, it's become clear that my branch is way more involved than @michaelkirk is looking for. I'll create a new branch and pull request with the much more limited change I now think he'd like. Other items from this branch may be resubmitted at points in the future under separate issues.

@stonylohr stonylohr closed this Jan 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants