Hough peak matching prototype support executables
The Hough peak matching prototype has a few executables that do not figure directly in the algorithm but are used for testing and support. Some of these are added piecemeal as the need arises, so their interfaces and functions are less stable than those of the regular executables.
equivalent_db database_1 database_2
Reads the two given peak database files and reports whether they describe equivalent databases, that is, databases that describe the same real-world information but with file-level differences like changes in line ordering or in object id numbers.
In more formal terms, two databases are equivalent when there is at least one bijection between object keys and ordering of parameters such that corresponding objects under the bijection have an identical representation after the bijection is applied to their foreign keys and the ordering is applied to their parameters. They are not equivalent if and only if there is no such ordering-bijection pair.
There are three groups of id numbers: peak_id, sample_id, and peak_group_id. Although the same integer can appear in more than one group, keys from different groups are always distinct. For example, sample_id 5 and peak_group_id 5 are different keys for the purposes of the bijection.
An exhaustive search over parameter orderings and bijections would take exponential time. In practice, however, each item has only a small number (usually 1) of potential candidate matches in the other database, so a back-tracking search runs quickly.
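The definition above can be illustrated with a small back-tracking search. This is only a sketch under stated assumptions, not the algorithm used by equivalent_db: the in-memory layout (a dict from `(group, id)` keys to `(params, refs)` pairs), the function names, and the rule that every object carries either no parameters or exactly `n_params` of them are all hypothetical. It also assumes every foreign key points at an object that exists in the same database.

```python
from itertools import permutations

# Hypothetical model: db maps (group, id) -> (params, refs), where params is
# a tuple of numbers and refs is a tuple of (group, id) foreign keys.

def _same_object(key_a, key_b, mapping, db_a, db_b, order):
    """Compare two objects under a possibly partial key mapping."""
    params_a, refs_a = db_a[key_a]
    params_b, refs_b = db_b[key_b]
    if key_a[0] != key_b[0]:  # keys from different groups never match
        return False
    if len(params_a) != len(params_b) or len(refs_a) != len(refs_b):
        return False
    reordered = tuple(params_a[i] for i in order) if params_a else ()
    if reordered != params_b:
        return False
    # Foreign keys that are not mapped yet are accepted here and re-checked
    # once the mapping is complete.
    return all(mapping.get(ra, rb) == rb for ra, rb in zip(refs_a, refs_b))

def _search(todo, mapping, used, db_a, db_b, order):
    if not todo:
        # Full mapping found: verify every object, including deferred refs.
        return all(_same_object(a, b, mapping, db_a, db_b, order)
                   for a, b in mapping.items())
    key_a, rest = todo[0], todo[1:]
    for key_b in db_b:
        if key_b in used or not _same_object(key_a, key_b, mapping,
                                             db_a, db_b, order):
            continue
        mapping[key_a] = key_b
        used.add(key_b)
        if _search(rest, mapping, used, db_a, db_b, order):
            return True
        del mapping[key_a]  # back-track and try the next candidate
        used.discard(key_b)
    return False

def equivalent(db_a, db_b, n_params):
    """True if some parameter ordering and key bijection match db_a to db_b."""
    if len(db_a) != len(db_b):
        return False
    return any(_search(list(db_a), {}, set(), db_a, db_b, order)
               for order in permutations(range(n_params)))

# Same information, with renumbered ids, reordered objects, swapped parameters.
db_a = {
    ("sample_id", 1): ((0.5, 2.0), ()),
    ("peak_group_id", 1): ((), ()),
    ("peak_id", 1): ((1.0, 3.0), (("sample_id", 1), ("peak_group_id", 1))),
}
db_b = {
    ("peak_id", 7): ((3.0, 1.0), (("sample_id", 5), ("peak_group_id", 5))),
    ("sample_id", 5): ((2.0, 0.5), ()),
    ("peak_group_id", 5): ((), ()),
}
```

Because each object usually has a single viable candidate, the inner loop rejects almost every pairing immediately and the search rarely has to back-track.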
Interested users can view a first cut at designing an algorithm to implement this definition. There is no guarantee that it is the same algorithm used in the current version, however.
Prints one of the following to standard output:
- "Databases ARE equivalent" if the databases are equivalent
- "Databases ARE NOT equivalent" if the databases are not equivalent
valid_db db_file
Checks whether the given file parses as a valid peak-matching database.
Prints one of the following to standard output:
- "Valid" if the file parses as a valid database
- "Invalid" if the file does not parse as a valid database
duplicate_peak_match_db [--remove-sample-params] < input > output
Takes a peak-matching database from standard input, parses it and writes an equivalent database to standard output. Used for testing the reading/writing routines.
- --remove-sample-params: removes all sample-params objects before writing the database back out
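A round-trip check of the reading/writing routines could be scripted roughly as follows. This is a sketch, not part of the prototype: the helper names are hypothetical, both executables are assumed to be on the PATH, and exit codes are ignored because they are not documented here. The check compares only against the documented "Databases ARE equivalent" message.

```python
import subprocess

def duplicate_command(remove_sample_params=False,
                      exe="duplicate_peak_match_db"):
    """Build the argument list for duplicate_peak_match_db."""
    cmd = [exe]
    if remove_sample_params:
        cmd.append("--remove-sample-params")
    return cmd

def round_trip_is_equivalent(db_path, copy_path,
                             equivalent_exe="equivalent_db"):
    """Copy db_path through duplicate_peak_match_db, then compare the copy."""
    # The tool reads standard input and writes standard output, so the
    # files are attached as stdin and stdout.
    with open(db_path, "rb") as src, open(copy_path, "wb") as dst:
        subprocess.run(duplicate_command(), stdin=src, stdout=dst)
    report = subprocess.run([equivalent_exe, db_path, copy_path],
                            capture_output=True, text=True)
    return report.stdout.strip() == "Databases ARE equivalent"
```

The round trip is run without --remove-sample-params, since a copy with its sample-params objects removed would presumably no longer be equivalent to the input.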
generateTestData.rb numPeaks numSamples numParams
Writes a test database in which all peaks are known and the set of peak groups is minimal. Each parameter, in both the peaks and the samples, is drawn from a distribution with unit standard deviation, and the peaks themselves are generated with no noise.
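The parameter-drawing step might look like the sketch below. It is an illustration only, not a port of generateTestData.rb: the function name is hypothetical, and a zero-mean normal distribution is an assumption, since the text above specifies only the unit standard deviation. The sketch does not write the database file or derive peak positions, neither of which is described here.

```python
import random

def draw_parameters(num_peaks, num_samples, num_params, seed=None):
    """Draw one parameter vector per peak and per sample."""
    rng = random.Random(seed)

    def draw():
        # Assumed distribution: normal, mean 0, standard deviation 1.
        return [rng.gauss(0.0, 1.0) for _ in range(num_params)]

    peaks = [draw() for _ in range(num_peaks)]
    samples = [draw() for _ in range(num_samples)]
    return peaks, samples
```

Passing a fixed `seed` makes the generated values reproducible between runs.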
NOTE: this application will probably change, and the changes may be slow to reach the website. Run generateTestData.rb with no arguments to get the latest usage information.