-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structured Typesetting (STS) generation #9
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you add the makefile? The readme instructions note several ways to compile metamath, including gcc m*.c -o metamath, and using automake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metamath.c line 3032 duplicated code from line 2393
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style, readability command line option STS: Norm used always verbose forms, consider expanding it to something like STRUCTURED_TYPESETTING or so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
built-in help does not cover new option
Thank you Wolf for your review!
It actually does, see changes in
That's very verbose, though. Maybe just "STRUCTURED" ?
Sure, I can remove the makefile. |
How do you render this "structured typesetting"? Is this MathML, MathJax or something like this ? For texts, and in particular help texts, Norm used the double-space convention between sentences, and I admit I like that. |
Yes, the
Your wish has come true with the last commit! |
mmhlpb.c
Outdated
H("Syntax: VERIFY STS <format>"); | ||
H(""); | ||
H("This command error-checks that the STS rules definition covers all syntax"); | ||
H("defined in the Metamath source file loaded. It runs through all non"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double space here ! And I would write "non-definitional" with a hyphen, or even "nondefinitional", since "non" is not a word.
Thanks. So maybe you can use the option "/ MATHML" ? I'm fine with "/ STRUCTURED" too. I cannot approve this PR since I am not fluent in C :-( |
The option STS or /STRUCTURED is the second alternative to /HTML and /ALT_HTML. Are you able to roughly tell in what way these options differ by just looking at the name? If not, people will have to guess at what the result of invocations are. Do you expect this option to be typed by users of metamath directly? Or will it be likely be part of a script rarely changing? If you type the option frequently, a short name is handy, otherwise even a very long option name will hardly annoy anyone, but is easily understandable to casual readers of the script. I am still busy with Xmas, have just skimmed your PR. I currently cannot look into this further. |
I wanted to keep it generic because one could generate anything with it, not just MathML. |
Yes, and it is itself followed by the name of the production to use. A command would typically be: If you had, say, a
Yes, that's roughly the process. So if I follow you, we could use e.g. /STS_HTML as an option tag? |
What about EXPAND_HTML? Or HTML_EXPAND? |
When running metamath, it is enough to type any non-ambiguous prefix, so here, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.gitignore: OK, see here (https://stackoverflow.com/questions/6626136/best-practice-for-adding-gitignore-to-repo)
Maybe one should put this info into the README?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metamath.c line 60 suggest a version number and add a changelog entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming consistency mmwtex.h line 33: stsFlag (and possibly following) should be prefixed with g_ (g_stsFlag, matching g_altHtmlFlag for example), see changelog 0.187 metamath.c, line 88
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optimization metamath.c line 2381: The function switchPos("/ STS") is called three times more or less in succession. Consider evaluating the result once and use a local variable to recall the result within the if conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parsetSTSRules type: why parset...? Looks like a typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meamath.c line 2977: Suspicious value of i. The other options / TIME or / NO_VERSIONING add 2 to the number of options, seemingly counting the slash and the tag as different entries. Why is this not done for / STS? If the count is correct, IMO a comment should clarify the underlying logic.
to be continued...
@wlammen I'm going to copy your remarks directly into a code review, it creates a sub-thread per remark and I think it's much easier to follow like that. |
Ok, I've updated the history and proposed a version 0.199. |
.gitignore
Outdated
@@ -0,0 +1,2 @@ | |||
*.o | |||
metamath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wlammen writes:
OK, see here (https://stackoverflow.com/questions/6626136/best-practice-for-adding-gitignore-to-repo)
Maybe one should put this info into the README?
I'm not sure which info you would like to add to the README.
There is a standard .gitignore file for C projects here, we could give it a try.
Maybe better in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I added this comment is, because the metamath sources are distributed via a tar archive as well. Obviously you wont need the gitignore in this case. I think, some kind of distribution how-to is finally called for. That are my thoughts. I was curious why you added the file. Separate PR is fine with me.
mmwtex.h
Outdated
/* 19-Jul-2017 tar Added for STS/MathML output */ | ||
extern flag stsFlag; /* STS output (for "structural typesetting") */ | ||
extern vstring stsOutput; /* output mode chosen for STS (follows STS flag) */ | ||
extern vstring postProcess; /* command to pipe the output into (used for MathJax prerendering) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wlammen wrote:
stsFlag (and possibly following) should be prefixed with g_ (g_stsFlag, matching g_altHtmlFlag for example), see changelog 0.187 metamath.c, line 88
Yes, this global variable naming convention has been introduced after I programmed the STS module. As an improvement, I could retrofit this to follow it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the current rules.
if (switchPos("/ ALT_HTML") != 0 || switchPos("/ STS") != 0 ) { | ||
print2("?Please specify only one of / HTML , / ALT_HTML and / STS.\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wlammen wrote:
optimization: The function switchPos("/ STS") is called three times more or less in succession. Consider evaluating the result once and use a local variable to recall the result within the if conditions.
I don't think this is called 3 times. There are 3 else..if branches, and it is called one time in each, to ensure only one HTML output formatting option is chosen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. It still looks awkward since the call and the parameter is coded several times.
mmwsts.h
Outdated
#include "mmdata.h" | ||
|
||
/* Parse an STS file */ | ||
int parsetSTSRules(vstring format); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wlammen wrote:
parsetSTSRules type: why parset...? Looks like a typo
Yes, it's a typo. I'll fix that!
metamath.c
Outdated
/* 7-Jul-2017 added MathML/STS */ | ||
if (switchPos("/ STS")) i = i + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wlammen wrote:
Suspicious value of i. The other options / TIME or / NO_VERSIONING add 2 to the number of options, seemingly counting the slash and the tag as different entries. Why is this not done for / STS? If the count is correct, IMO a comment should clarify the underlying logic.
Interesting. So, a typical SHOW STATEMENT
command looks like this:
SHOW STATEMENT syl / HTML
That would be g_rawArgs = 5, which already includes 2 arguments for the / HTML
.
As you correctly guessed, for / NO_VERSIONING
and / TIME
, there are two more arguments, therefore the +2.
In the case of STS, the / STS
would already be accounted for in the 5 (taking the place of the / HTML
), so that's not why the +1 is for. Rather, in the case of STS, there is one more argument, namely the name of the output processing, e.g. mathml
. That is what the +1 is for.
I'll add a comment to make that clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am subject to a learning curve as well while reviewing, wrt both to your source code and the review style.
if (lastArgMatches("STS")) { | ||
i++; | ||
if (strlen(stsOutput)) { | ||
if (!getFullArg(i,cat("* Using which output mode <",stsOutput,">? ",NULL))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect indentation
print2("?No source file has been read in. Use READ first.\n"); | ||
goto pclbad; | ||
} | ||
if (strlen(stsOutput)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated code from line 586 (with i == 2)
mmcmds.c
Outdated
@@ -371,7 +373,9 @@ void typeStatement(long showStmt, | |||
} | |||
htmlDistinctVarsCommaFlag = 1; | |||
let(&str2, ""); | |||
str2 = tokenToTex(g_MathToken[nmbrTmpPtr2[k]].tokenName, showStmt); | |||
/* 27 Jul 2017 tar For MathML/STS */ | |||
if(stsFlag) str2 = stsToken(nmbrTmpPtr2[k], showStmt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated code from line 357
mmcmds.c
Outdated
/* tokenToTex allocates str2; we must deallocate it */ | ||
let(&str1, cat(str1, " ", str2, NULL)); | ||
let(&str2, ""); | ||
str2 = tokenToTex(g_MathToken[nmbrTmpPtr2[k]].tokenName, showStmt); | ||
/* 27 Jul 2017 tar For MathML/STS */ | ||
if(stsFlag) str2 = stsToken(nmbrTmpPtr2[k], showStmt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated code from line 377
mmdata.c
Outdated
long i; | ||
long hash = 0; | ||
i = -1; | ||
while (i < 13 && s[i] != -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suspicious i in first loop: -1, which is at least out of bounds for salt, likely for s, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll answer this one now as it is interesting, useful for the rest of the review... and definitely deserves more information in the comments!
Actually, indices -1, -2 and -3 are valid and used in nmbrString
:
- Index
-1
is the length of the number string. SeenmbrLen
(mmdata.c line 1023) - Index
-2
is the allocated length, i.e. how many numbers are available totally (could be more than the actual current length of the string). SeenmbrAllocLen
(mmdata.c line 1032) - Index
-3
is the location inmemUsedPool
(for memory management)
In this specific case, I wanted to include the length of the string in the hash, which I think makes sense but shall have been explained.
In any case, you clearly spotted a mistake here because in the salt, index -1 is clearly invalid!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any use of negative indices should be wrapped behind a macro to provide more safety and clarity. (That is, there should be a macro like #define nmbrLen(p) p[-1]
where p
has type nmbrString
which is a typedef for int*
or what have you.
mmdata.c
Outdated
static long salt[] = { 4938, 48977, 6897, 7293, 2663, 7925, 2999, 12238, | ||
40033, 14038, 10699, 29746, 56108, 34526, 63576, 52053, 61949, 41177, 43740, 22822 | ||
}; | ||
long i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declaration is Initialization rule (section 4.5 in https://docs.oracle.com/cd/E17984_01/doc.898/e14699/variables_data_structs.htm) Initialize with value from line 1113
mmdata.c
Outdated
/* This simply computes a XOR of the first numbers */ | ||
int nmbrHash(nmbrString *s) | ||
{ | ||
static long salt[] = { 4938, 48977, 6897, 7293, 2663, 7925, 2999, 12238, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only 13 (?) values actually needed. Remove/comment out unneeded ones.
Where is the hash algorithm explained? Add a comment/link.
to be continued...
mmdata.c
Outdated
long i; | ||
if (sstart - 1 + len > nmbrLen(s) | ||
|| tstart - 1 + len > nmbrLen(t)) return 0; | ||
for (i = 0; s[sstart-1+i] == t[tstart-1+i] && i<len; i++); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suspicious index in first loop: -1, if sstart == 0. (1) Out of bounds access to s and t, or (2) Parameters sstart and tstart must be > 0.
Either provide parameter checks (cf. line 1211), or (minimum) state limitations in comment line 1122
mmdata.c
Outdated
long nmbrInstrN(long start_position, long occ, nmbrString *string1, | ||
nmbrString *string2, long start2, long length2) | ||
{ | ||
if (start_position < 1) start_position = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means: If garbage is provided in parameter start_position, then I sanitize it to something more useful, hoping for the best. This kind of fault tolerance supports a sloppy programming on the caller's side. Better throw an exception, or call bug()
mmdata.c
Outdated
@@ -1170,6 +1203,34 @@ nmbrString *nmbrSpace(long n) | |||
return (sout); | |||
} | |||
|
|||
/* Search for the nth occurrence of string2 in string1 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain parameters (semantics and limitations) in a comment. For example, what does occ mean? n would match the functions name.
mmdata.c
Outdated
start_position--; | ||
for(; occ > 0;occ--) { | ||
long ls1, i, j; | ||
ls1 = nmbrLen(string1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull constant evaluations out of the loop.
mmdata.c
Outdated
if (start_position < 1) start_position = 1; | ||
start_position--; | ||
for(; occ > 0;occ--) { | ||
long ls1, i, j; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
declare i and j in for commands
mmdata.c
Outdated
} | ||
} | ||
if (found) { | ||
start_position = i+1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incorrect indentation
mmdata.c
Outdated
for (i = start_position - 1; i <= ls1 - length2; i++) { | ||
flag found = 1; | ||
for (j = 0; j < length2; j++) { | ||
if (string1[i+j] != string2[start2-1+j]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition is usually part of the for-command.
mmdata.c
Outdated
/* Add a single number to start of a nmbrString - faster than nmbrCat */ | ||
nmbrString *nmbrUnshiftElement(nmbrString *g, long element) | ||
{ | ||
long length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
declaration should be initialization
mmdata.c
Outdated
@@ -1548,6 +1609,21 @@ nmbrString *nmbrAddElement(nmbrString *g, long element) | |||
} | |||
|
|||
|
|||
/* Add a single number to start of a nmbrString - faster than nmbrCat */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return value is pushed on an internal variable stack with an implicit memory management. This important detail is not mentioned in the comment.
mmdata.h
Outdated
@@ -281,6 +281,10 @@ long nmbrLen(nmbrString *s); | |||
long nmbrAllocLen(nmbrString *s); | |||
void nmbrZapLen(nmbrString *s, long length); | |||
|
|||
/* Search for the nth occurrence of string2 in string1 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should explain parameters and specify their limitations.
mmhtbl.c
Outdated
#define NO_LINKEDITEM -1 | ||
linked *linkedItems; | ||
int free_linkedItem; | ||
flag htinit_done = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is private to htinit() so declare it there as a static variable
mmhtbl.c
Outdated
/* Static buffer for the linked lists */ | ||
#define NB_LINKEDITEMS 50000L | ||
#define NO_LINKEDITEM -1 | ||
linked *linkedItems; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is safer to initialize this to NULL and 0
mmhtbl.c
Outdated
|
||
/* Create and fill the structure */ | ||
hashtable ht; | ||
ht.name = name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that you copy a pointer, not its value, from a parameter to the result. The caller handles the parameter's contents with a series of let() instructions, finally freeing it. All these operations are done without ht.name in mind. This opens up all kinds of memory allocation/access failures. Even if you know, this won't happen in the foreseeable future, you need a guarantee, or a contract, here to decouple caller and callee.
mmhtbl.c
Outdated
/* Found it, free the object and remove it from the chain */ | ||
hashtable->freeFunc(&linkedItems[*pli].key, &linkedItems[*pli].object); | ||
int old_free = free_linkedItem; | ||
free_linkedItem = *pli; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DRY duplicated code from line 76
to be continued...
@wlammen thank you very much for your careful review! |
New Year's Eve is closing in. I need some time for it. |
mmwsts.c
Outdated
/* The structure containing information about the STS variable tokens */ | ||
struct stsVar_struct { | ||
long stsType; /* type of the token in the STS (must be a constant tokenId) */ | ||
long stsSchemeId; /* number of the schemed in which this variable is defined + 1. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"schemed" typo
mmwsts.c
Outdated
}; | ||
|
||
/* Current output format for STS */ | ||
vstring stsFormat = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
g_ prefix missing from global variables (here and elsewhere in this file)
mmwsts.c
Outdated
/* Math symbol comparison for bsearch */ | ||
/* Here, key is pointer to a character string. */ | ||
/* Here we search only the global tokens, those | ||
* which endStatement is the last statement */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
English: where endStatement...
mmwsts.c
Outdated
/* Here, key is pointer to a character string. */ | ||
/* Here we search only the global tokens, those | ||
* which endStatement is the last statement */ | ||
int mathSrchGlbCmp(const void *key, const void *data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason to avoid specific C types in parameter declarations? And so dodge C type checking?
key: char const *
data: mathToken_struct const *
mmwsts.c
Outdated
if(g_MathToken[ *((long *)data) ].endStatement == g_statements) return 0; | ||
|
||
/* Find the direction in which the target token is */ | ||
for(long *ptr = (long*)data; !strcmp(key, g_MathToken[ *ptr ].tokenName); ptr++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimization: The first loop seems to check element data again, and the outcome is known to be 0. So I suggest to initialize with long* ptr = (long*) data + 1;
mmwsts.c
Outdated
/* Cache to speed up conversions */ | ||
hashtable stsCache; | ||
|
||
/* Math symbol comparison for bsearch */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment should explain the result, that is limited to -1, 0 and +1, and what is delivered when.
It seems possible that the token list contains the same key multiple times in succession. Since this is part of a binary search, data may point somewhere in the middle of such a series. Is it guaranteed to hold just one global element (there is a suspicious active flag in the structure, that may allow a disabled and enabled element in the same series)? Is it guaranteed there is always a global element present? If either assumption is missed, the binary search may fail.
Clarify the so-called pre-/postconditions in a comment.
mmwsts.c
Outdated
long i; | ||
char *fbPtr; | ||
long textLen, tokenLen_; | ||
long *g_mathKeyPtr; /* bsearch returned value */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incorrect use of prefix g_
mmwsts.c
Outdated
/* Make sure that g_mathTokens has been initialized */ | ||
if (!g_mathTokens) bug(1717); | ||
|
||
textLen = (long)strlen(text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definition is initialization long textLen = (long) strlen(text); same for wrklen etc.
mmwsts.c
Outdated
#include "mmvstr.h" | ||
#include "mmdata.h" | ||
#include "mminou.h" | ||
#include "mmpars.h" /* For rawSourceError and mathSrchCmp and lookupLabel */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and whiteSpaceLen
mmwsts.c
Outdated
wrkNmbrPtr[mathStringLen] = *g_mathKeyPtr; | ||
mathStringLen++; | ||
fbPtr = fbPtr + tokenLen_ + 1; /* Move on to next token */ | ||
if(fbPtr >= text + textLen) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the while condition
mmwsts.c
Outdated
return NULL_NMBRSTRING; | ||
} | ||
wrkNmbrPtr[mathStringLen] = *g_mathKeyPtr; | ||
mathStringLen++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tip: wrkNmbrPtr[mathStringLen++] = ... saves the following line and can reduce code size
same for fbPtr += tokenLen_ + 1;
mmwsts.c
Outdated
return mathString; | ||
} | ||
|
||
/* Store a couple key/object into the cache */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document Pre/Postconditions
mmwsts.c
Outdated
} | ||
|
||
/* Dump a couple key/object from the cache */ | ||
void stsDumpCache(nmbrString *key, vstring object) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suspicious cast to eqFunc in line 235: The signature of this function does not match that of eqFunc: int (eqFunc)(void *, void *)
mmwsts.c
Outdated
if(stsUseCache) { | ||
stsCache = htcreate(format, STS_CACHE_BUCKETS, "", (hashFunc*)&nmbrHash, (eqFunc*)&nmbrEq, | ||
(letFunc*)&stsStoreCache, (freeFunc*)&stsFreeCache, | ||
(eqFunc*)&stsDumpCache); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suspicious cast: signature of stsDumpCache does not match eqFunc.
mmwsts.c
Outdated
} | ||
|
||
/* Parse a file containing the structured typesetting rules. */ | ||
int parseSTSRules(vstring format) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coding style: This function is way too long. It covers more than 300 lines. This exceeds the recommended max length of 20 lines by far, and its code is easily broken down into steps that can be moved into helper functions. https://stackoverflow.com/questions/475675/when-is-a-function-too-long
document pre/postconditions: example: stsFormat is both an input and an output variable, but that is not easily seen.
mmwsts.c
Outdated
g_outputToString = 0; | ||
|
||
/* If the same format was already parsed, nothing to do. */ | ||
if(strcmp(stsFormat, format) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the early out code to the beginning of the function where parameter checks usually take place
…amath-exe into tirix-structured-typesetting
mmhtbl.c
Outdated
/* Dumpts the whole table */ | ||
void htdump(hashtable *hashtable) { | ||
print2("Hashtable %s:\n", hashtable->name); | ||
//for(int bucket=0;bucket<hashtable->bucket_count;bucket++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of comment is not ANSI C compatible (see section 3.1.9). Use /* ... */ instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were agreed on C99 now? I don't mind seeing //
creep in, even if we don't do any bulk conversions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we? Can you point me to where the decision took place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of #8 (comment) . In short: we already compile only on C99, even before taking into account the recent refactors. (I'm not opposed to pushing the minimum beyond C99 (i.e. C11), but I don't think we should consider ANSI C (C89) any more.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In C99 this issue is not relevant and can be ignored.
mmdata.c
Outdated
{ | ||
if (start_position < 1) start_position = 1; | ||
start_position--; | ||
for(; occ > 0;occ--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimization: Use a dedicated loop variable in for loops. A compiler will allocate a CPU register for that.
for (long o = occ + 1; --o > 0;) {... (replace occ with o in loop)...}
is how I would write it.
Info: There is a tiny semantic change in my example: If the memory model of the computer supports signed magnitude instead of two's complement AND occ is MAX_LONG then (occ + 1)-1 == occ may not hold. We can safely ignore this nowadays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long
is a signed type, and signed overflow is UB, so I think that this change is legal by the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@digama0 Exactly. Because occ+1 is UB when occ==MAX_LONG, --o may contain anything, and the loop starts with a random value. This cannot happen with the original code. We can safely ignore this semantic difference here.
mmdata.c
Outdated
@@ -1098,6 +1131,34 @@ nmbrString *nmbrSpace(long n) | |||
return sout; | |||
} | |||
|
|||
/* Search for the nth occurrence of string2 in string1 */ | |||
long nmbrInstrN(long start_position, long occ, nmbrString *string1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is occ defined as long? Do you really expect more than two billion occurrences (or even 32000 should int be only 16 bit wide) of a substring in a string?
mmdata.c
Outdated
} | ||
} | ||
if (found) { | ||
start_position = i+1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check indentation
mmhtbl.c
Outdated
#define NO_LINKEDITEM -1 | ||
linked *linkedItems; | ||
int free_linkedItem; | ||
flag htinit_done = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this a static variable within htinit(). It is completely private implementation detail in this function.
mmwsts.c
Outdated
return mmlLine; | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this to
#if 0
test code
#endif
to comply with ANSI C
I think there are already lots of ideas and issues for refactoring the source. In particular extracting code from long functions into auxiliary sub-function and documenting pre/postconditions can help with further review, since the source becomes a lot more readable then. In addition merge conflicts have to be resolved. |
Thank you very much @wlammen for your efforts reviewing my code! Indeed now, before this can be merged, I'll have to solve the conflicts with all the refactoring that's going on. |
I took care of merging this with master. @tirix , you should double check the last commit, which fixes a few warnings I was getting with the original version. |
Thank you Mario! |
I've tried to make as few changes as possible. The changes are in the following source files:
metamath.c
for the changes to theSHOW STATEMENT
command and the newVERIFY STS
command, as well as the output post-processing,mmcmdl.c
for the newHELP
command options, and the new commands,mmcmds.c
for the changes to theSHOW STATEMENT
command at top level (distinct and dummy variables, syntax hints),mmdata.c
for some utility functions,mmhlpa.c
andmmhlpb.c
for the new built-inHELP
options,mmhtbl.c
, a new file with a hash table implementation,mmwsts.c
, a new file with the main STS implementation,mmwtex.c
, for the main hook into the STS formula output and for in-line comment formulas.This also adds a
.gitignore
file to ignore object files.