This repository was archived by the owner on Aug 14, 2021. It is now read-only.

Hough peak matching prototype support executables

RadixSeven edited this page Jun 1, 2011 · 9 revisions

The Hough peak matching prototype has a few executables that do not figure directly in the algorithm but are used for testing and support. Some of these are added piecemeal as the need arises, so their interfaces and functions are less stable than those of the regular executables.

equivalent_db

Synopsis

equivalent_db database_1 database_2

Description

Reads the two given peak database files and reports whether they describe equivalent databases, that is, databases that describe the same real-world information but with file-level differences like changes in line ordering or in object id numbers.

Semi-formal definition of equivalence

In more formal terms, two databases are equivalent when there is at least one pair consisting of a bijection between object keys and an ordering of parameters such that corresponding objects under the bijection have identical representations after the bijection is applied to their foreign keys and the ordering is applied to their parameters. They are not equivalent if and only if there is no such ordering-bijection pair.

There are 3 groups of id numbers: peak_id, sample_id, and peak_group_id. Though the same integer can appear in more than one group, keys from different groups are distinct. For example, sample_id 5 and peak_group_id 5 are different keys for the purposes of the bijection.

An exhaustive search over parameter orderings and bijections would take exponential time. In practice, however, each item has only a small number (usually 1) of potential candidate matches in the other database, so a back-tracking search should cause no real problem.

Interested users can view a first cut at designing an algorithm to implement this definition. There is no guarantee that it is the same algorithm used in the current version, however.
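
As an illustration of the definition above, and not necessarily the algorithm the executable uses, here is a minimal back-tracking sketch in Python. The in-memory representation is an assumption made for this example only: each database is a dict mapping a key `(id_group, id_number)` to a pair `(foreign_keys, params)`, every object is assumed to carry the same number of parameters, and parameters are compared for exact equality. The names `equivalent`, `search`, and `refs_plausible` are hypothetical.

```python
from itertools import permutations


def refs_plausible(refs_a, refs_b, mapping):
    """Prune: can foreign keys refs_a still map onto refs_b under `mapping`?"""
    if len(refs_a) != len(refs_b):
        return False
    used = set(mapping.values())
    for ra, rb in zip(refs_a, refs_b):
        if ra[0] != rb[0]:  # keys in different id groups never correspond
            return False
        if ra in mapping:
            if mapping[ra] != rb:
                return False
        elif rb in used:  # rb is already the image of some other key
            return False
    return True


def search(db_a, db_b, order, pending, mapping):
    """Extend `mapping` (key in A -> key in B) one object at a time."""
    if not pending:
        # Complete bijection: every foreign key must translate exactly.
        return all(
            tuple(mapping.get(r) for r in refs) == db_b[mapping[key]][0]
            for key, (refs, _) in db_a.items()
        )
    key_a = pending[0]
    refs_a, params_a = db_a[key_a]
    if len(params_a) != len(order):
        return False
    reordered = tuple(params_a[i] for i in order)
    for key_b, (refs_b, params_b) in db_b.items():
        if key_b[0] != key_a[0] or key_b in mapping.values():
            continue
        if params_b != reordered:
            continue
        mapping[key_a] = key_b
        if refs_plausible(refs_a, refs_b, mapping) and search(
            db_a, db_b, order, pending[1:], mapping
        ):
            return True
        del mapping[key_a]  # back-track and try the next candidate
    return False


def equivalent(db_a, db_b):
    """True if some parameter ordering and key bijection make A identical to B."""
    if len(db_a) != len(db_b):
        return False
    width = len(next(iter(db_a.values()))[1]) if db_a else 0
    return any(
        search(db_a, db_b, order, list(db_a), {})
        for order in permutations(range(width))
    )


# Same content, but different id numbers, object order, and parameter order.
db_1 = {
    ("sample_id", 1): ((), (0.5, 2.0)),
    ("peak_id", 7): ((("sample_id", 1),), (1.0, 3.0)),
}
db_2 = {
    ("peak_id", 2): ((("sample_id", 9),), (3.0, 1.0)),
    ("sample_id", 9): ((), (2.0, 0.5)),
}
print("Databases ARE equivalent" if equivalent(db_1, db_2)
      else "Databases ARE NOT equivalent")
```

Trying every parameter ordering is still factorial in the number of parameters; the sketch relies on that number being small and on the parameter comparison rejecting most candidate matches early.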

Output

Prints one of the following to standard output:

  • "Databases ARE equivalent" if the databases are equivalent
  • "Databases ARE NOT equivalent" if the databases are not equivalent
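
For use in test scripts, the two messages can be turned into a boolean. The following Python sketch assumes that `equivalent_db` is on the `PATH` and that each message appears on a line of its own; it relies only on the documented messages because the exit status is not documented here. The helper names `interpret_output` and `databases_equivalent` are hypothetical.

```python
import subprocess

EQUIVALENT = "Databases ARE equivalent"
NOT_EQUIVALENT = "Databases ARE NOT equivalent"


def interpret_output(text):
    """Map the documented messages to True/False; reject anything else."""
    lines = [line.strip() for line in text.splitlines()]
    if EQUIVALENT in lines:
        return True
    if NOT_EQUIVALENT in lines:
        return False
    raise ValueError("unrecognized equivalent_db output: %r" % text)


def databases_equivalent(path_1, path_2, exe="equivalent_db"):
    """Run the tool on two database files and interpret its standard output."""
    result = subprocess.run(
        [exe, path_1, path_2],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode standard output as text
    )
    return interpret_output(result.stdout)
```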

valid_db

Synopsis

valid_db db_file

Description

Checks whether the given file parses as a valid peak-matching database.

Output

Prints one of the following to standard output:

  • "Valid" if the file parses as a valid database
  • "Invalid" if the file does not parse as a valid database

duplicate_peak_match_db

Synopsis

duplicate_peak_match_db [--remove-sample-params] < input > output

Description

Takes a peak-matching database from standard input, parses it and writes an equivalent database to standard output. Used for testing the reading/writing routines.

Options

  • --remove-sample-params Removes all sample-params objects before writing the database back out
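
Because the synopsis uses shell redirection, a test harness written in another language has to connect the files to standard input and standard output itself. A minimal Python sketch follows, assuming `duplicate_peak_match_db` is on the `PATH`. The helper names `duplicate_argv` and `duplicate_db` are hypothetical, and the exit status is deliberately not checked because it is not documented here.

```python
import subprocess


def duplicate_argv(remove_sample_params=False, exe="duplicate_peak_match_db"):
    """Build the argument list, adding the documented option when requested."""
    argv = [exe]
    if remove_sample_params:
        argv.append("--remove-sample-params")
    return argv


def duplicate_db(input_path, output_path, remove_sample_params=False):
    """Same effect as: duplicate_peak_match_db [--remove-sample-params] < input > output"""
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        # The files stand in for the shell's "<" and ">" redirections.
        subprocess.run(duplicate_argv(remove_sample_params), stdin=src, stdout=dst)
```

Without `--remove-sample-params`, running `equivalent_db` on the input and output files should report that they are equivalent, which makes the pair a simple round-trip check of the reading and writing routines.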

generateTestData.rb

Synopsis

generateTestData.rb numPeaks numSamples numParams

Description

Writes a test database in which all peaks are known and the set of peak groups is minimal. Each parameter, in both the peaks and the samples, is drawn from a distribution with unit standard deviation, and the peaks themselves are generated with no noise.

NOTE: This application will probably change, and it is very possible that the changes will be slow to propagate to this page. Run generateTestData.rb with no arguments to get the latest usage information.
