-
Notifications
You must be signed in to change notification settings - Fork 24
G. Synthesis of Steps 1–4
This multi-sheet Excel workbook (100142_G6147_SBM_LinReg_SQUIDvCalamari.xls) contains the results of a direct comparison between a SQUID workbook and the corresponding Calamari reports (SQUID_01 to SQUID_04 inclusive). The focus here is to prove that Calamari replicates the SQUID calculations as faithfully as possible. The relevant parameters are:
- Example XML file = 100142…
- SBM normalised = YES
- Ratio Calculation Method = LinReg throughout
- Calamari version = 1.1.0
- SQUID version = 2.50.11.02.03 June 2015 (i.e. with double-deadtime bug fixed)
Four different tab-colours have been used in the XLS to explain the different data-sources:
Bright yellow = worksheets generated directly by SQUID, and which have not been altered. On the StandardData and SampleData sheets, the traditional SQUID column-header cells (row 6) have been coloured to give an idea of overall progress (columns replicated vs columns not yet addressed): black text on pale grey background = calculations replicated in SQUID_02, white text on black background = calculations replicated in SQUID_04.
Aqua-green (‘SQ_...Array’) = 2 worksheets (CondensedXMLArray and WithinSpotArray) which are SQUID output, but with rows and columns manipulated so the calculated data forms a rectangular matrix without gaps or repeated headers. These manipulations were to facilitate comparisons with the Calamari reports (which all comprise rectangular matrices without gaps). In both cases, the corresponding unaltered SQUID-sheet is immediately to the left of the green-coloured one.
Hot pink/magenta (‘Calamari_...) = Calamari output, Copy-Pasted from the report CSVs. On each sheet, the row immediately above that containing the column-headers (“source”) denotes, for each column, the Calamari CSV report the column was sourced from. The sheet should be compared to the sheet immediately to its left. There has been some manipulation of the pasted Calamari data to match the column-order to that defined by the corresponding SQUID-sheet (and drop MinIndex data from SQUID_04, as it has no place in this specific comparison exercise), but no alteration of numeric data.
Dark blue (‘…_Diff’) = Cell-by-cell comparisons of the SQUID(-derived) worksheet and the corresponding Calamari worksheet, which are in each case, the two sheets immediately to the left. The cell-by-cell comparison takes the simple form [SQUID value minus Calamari value], hereafter called “difference value”, as per the formula in each cell. On each sheet, the row immediately above that containing the column-headers (“source”) denotes, for each column, the Calamari CSV report the column was sourced from. Above that, “max” and “min” denote the maximum and minimum values of the “difference value” for that columns, and at extreme left (top of column B), the “maximum of maximums” and “minimum of minimums” have been collated.
There is systematic variation in “difference values” from column to column, according to the magnitude of the data values (difference values might be as high as 1E-10 where data values are 1E+6, and as low as 1E-20 where data values are as small as 1E-4).
Bad agreement is evident in isolated cells within column 204/206 %err on ‘StdData_Diff’ and ‘SamData_Diff’. The affected cells are coloured green. The issue here is that SQUID imposes a ceiling of 9999 on %err values when writing them to its StandardData and SampleData sheets: unfortunately, this ceiling is imposed as part of a cell-formatting subroutine (associated with writing the worksheet, rather than performing arithmetic), and Calamari has not yet traversed any cell-formatting subroutines. Calamari calculates and reports the true %err value, which is sometimes much larger than 9999. This is not a bug in either software package: we know what is going on, we just haven’t yet addressed it.
Visual inspection shows that even the biggest apparent differences between SQUID 2.50 and Calamari (disregarding 204/206 %err as explained above), are almost entirely confined to the last decimal place of the output. About 99% of the time, the Calamari value is marginally smaller than the SQUID value (i.e. the “difference value” is positive): this is because importing the Calamari output into Excel truncates the values at 15 significant figures (the double-precision values generated in Java generally have 16 or 17 significant figures). Future iterations of this comparative exercise will ensure that SQUID 2.50 and Calamari output are both rounded to 15 (or fewer) significant figures.
But other sources of discrepancy exist inside Excel. We have been able to replicate SQUID’s VBA SpotAv calculations perfectly (via Excel spreadsheet) to 15 significant figures, but not SQUID’s VBA LinReg (and of course, this file showcases a LinReg comparison). The full detail of the VBA vs Excel worksheet comparison is shown in the (separate) detailed-single-analysis file 100142_OG1.7_SpotAvg_LinReg.xls, at the base of sheet ‘LinReg_CorrelErrors’. We have not been able to account for the intra-Excel difference in LinReg values, although no coding error has yet been identified (and when such errors exist, our experience thus far suggests they tend to affect sequential calculations to a far greater degree than just the last decimal place).