Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve minimize performance #94

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

savask
Copy link

@savask savask commented Jul 17, 2022

This is an implementation of the idea presented in #10 . One precomputes the set of math constants used in the proof of the minimized statement (it is stored in .tmp field of g_MathToken in the code), and for each statement we check if all its constants lie in this set. Statements which pass this validation step are marked with 1 in the boolean array usefulStatements. Then the usual back-and-forth minimize_with loop is run, but now it checks only "useful statements".

If minimize_with is supplied with the /VERBOSE option, it will output the number of potentially useful theorems at the beginning.

This optimization reduces the minimization time twofold. For example, on my computer smfsupmpt requires 94s to minimize with the vanilla version, and 42s with this optimization; ssmapsn requires 32s without and 13s with the optimization.

One downside of my code is the allocation of the usefulStatements array, I would prefer it to be an expandable vector (not a boolean array), but as far as I understand metamath codebase doesn't have a vector implementation.

@benjub
Copy link
Collaborator

benjub commented Jul 17, 2022

I'm not sure I understand. It may be the case that a proof is shortened by statements using constants not appearing in the current proof. Maybe you know in advance that such shortenings will not be caught by the current minimize_with program, so you're saying we might as well discard them from the beginning ?

@savask
Copy link
Author

savask commented Jul 17, 2022

Maybe you know in advance that such shortenings will not be caught by the current minimize_with program, so you're saying we might as well discard them from the beginning ?

Yes, minimize_with X tries to use X at one of the steps of the proof, so naturally all constants used in X must be already present in the proof of our statement. I can imagine a multi-step subproof using some new constants yet shortening the proof of the original statement, but minimize_with can't find those, unfortunately.

src/metamath.c Outdated

/* Fill the usefulStatements array */
for (k = 1; k < g_proveStatement; k++) {
usefulStatements[k] = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to preset it to 0, so you dont need all the usefulStatements[k] = 0 instructions for the early-out-of-loopbody instructions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for (i = 0; i < mlen; i++) {
if (g_MathToken[mString[i]].tokenType == (char)con_) {
if (!g_MathToken[mString[i]].tmp)
usefulStatements[k] = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be followed by a break; instruction

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@david-a-wheeler
Copy link
Member

metamath codebase doesn't have a vector implementation.

It's certainly possible to add one. The usual approach is a simple data structure with a count of the "number of elements" and a pointer to the (beginning of the) array elements; if not NULL it needs have been allocated (e.g., with malloc). To resize, call realloc() to resize the array, move things as necessary, and change the count. I did a quick Google search and found this one, implemented as OSS: https://github.com/eteran/c-vector

@wlammen
Copy link
Contributor

wlammen commented Jul 17, 2022

https://github.com/eteran/c-vector
And it is C89 (ANSI C https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) compatible. I wonder whether we should give up on this and at least advance to C11.

@david-a-wheeler
Copy link
Member

I wonder whether we should give up on this and at least advance to C11.

We could move to C11 if that's helpful. The key is compiler support in deployed systems. GCC seems to have good C11 support.

However, I don't think that helps. C11 doesn't have a vector type as far as I know, and https://github.com/eteran/c-vector doesn't require C11. We should write code that is compatible with C11, but I think we should only require C11 if there's advantage to it. I didn't see any advantages in this list of C11 additions

@wlammen
Copy link
Contributor

wlammen commented Jul 17, 2022

However, I don't think that helps.

See for example https://en.cppreference.com/w/c/io/fprintf to see how much C has developed since 1989. Functions like snprintf really make sense. Or // style comments.

gcc -O3 -funroll-loops -finline-functions -fomit-frame-pointer -DINLINE=inline -g -Wall -Wextra -o metamath metamath.o mmcmdl.o mmcmds.o mmdata.o mmfatl.o mmhlpa.o mmhlpb.o mminou.o mmpars.o mmpfas.o mmunif.o mmveri.o mmvstr.o mmword.o mmwtex.o

This is the command executed by the build.sh script. It does not check for ANSI C. The option -ansi should be added to enforce this standard. By the way, this can be done by changing line 4567 of configure to new_CFLAGS="-Wall -Wextra -ansi".
This reveals a lot of ANSI violations like for (long i = 0; i < len; i++):
error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
in mmcmdl.c for example. In short, the current state is not ANSI at all.

@david-a-wheeler
Copy link
Member

I agree snprintf and // are useful, but I those are C99 additions, not C11.

I'm not opposed to switching to a later C spec, I just want to make sure that a spec move is justified and that it is widely implemented by commonly-available & distributed compilers.

We should at least move up to C99, no question in my mind!!! C99 is very widely supported & I think we're already using its facilities. The only question is if we should move to C11 or not.

@digama0
Copy link
Member

digama0 commented Jul 18, 2022

We are already on C11 C99. The build script uses it and // can be spotted in the source code.

@savask
Copy link
Author

savask commented Jul 18, 2022

It's certainly possible to add one.

It is, but is it worth doing so for one application? Also, metamath-exe uses its own memory management system, for example, in code I used poolMalloc instead of malloc and poolFree instead of free. I was wondering if nmbrString can be used as vector of numbers, but I haven't been able to find a push_back equivalent in mmdata.c.

@wlammen
Copy link
Contributor

wlammen commented Jul 18, 2022

We are already on C99.

Indeed: CONTRIBUTIONS.md states this literally.
I added the -std=c99 flag to configure as indicated in my previous post, and the compiling performed flawlessly, even no warning.
To avoid using the results of a previous build without this flag, run build.sh with option -c

@benjub benjub mentioned this pull request Oct 4, 2022
Comment on lines +5171 to +5174
/* usefulStatements is a boolean array, where usefulStatements[k] == 1
iff statement k might be useful in minimization */
char *usefulStatements;
usefulStatements = poolMalloc(g_proveStatement*sizeof(char));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use bool, true, false as advised in #17 (comment) ? Then a few statements below have minor modifications, e.g., usefulCount += usefulStatements[k]; has to be replaced by something like if (usefulStatements[k]) {usefulCount += 1}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think originally I used char for consistency, for example, the .tmp field uses 0 and 1 instead of bool. As a positive side effect, usefulCount can be updated without branching (although I don't think it makes any great difference in this case).

@benjub
Copy link
Collaborator

benjub commented Oct 4, 2022

Having @wlammen's approval is already a pretty good guarentee ! If @digama0 can find some time to review and if ok merge this, this would be nice and this would also unblock #93.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants