Improve minimize performance #94

savask · 2022-07-17T12:08:02Z

This implements the idea presented in #10. First, the set of math constants used in the proof of the statement being minimized is precomputed (it is stored in the .tmp field of g_MathToken in the code). Then, for each candidate statement, we check whether all of its constants lie in this set. Statements that pass this check are marked with 1 in the boolean array usefulStatements. The usual back-and-forth minimize_with loop then runs as before, but it tries only the "useful statements".
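A minimal, self-contained sketch of this filtering step, for readers who want to see its shape. The `Token` and `Statement` types and the function name are hypothetical simplifications, not the actual metamath-exe data structures; only the `tmp` flag and the `usefulStatements` array mirror names from the PR.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the token and statement tables;
   the real metamath-exe structures are different. */
typedef struct {
  int isConstant; /* 1 if the token is a math constant */
  char tmp;       /* 1 if the constant occurs in the proof being minimized */
} Token;

typedef struct {
  const int *mathString; /* token indices making up the statement */
  size_t mathStringLen;
} Statement;

/* Sets usefulStatements[k] = 1 iff every constant of statement k has its
   tmp flag set. Entries 1 .. stmtCount-1 are written; the return value is
   the number of statements marked useful. */
size_t markUsefulStatements(const Token *tokens, const Statement *stmts,
                            size_t stmtCount, char *usefulStatements) {
  size_t k, i, usefulCount = 0;
  for (k = 1; k < stmtCount; k++) {
    usefulStatements[k] = 1;
    for (i = 0; i < stmts[k].mathStringLen; i++) {
      const Token *t = &tokens[stmts[k].mathString[i]];
      if (t->isConstant && !t->tmp) {
        usefulStatements[k] = 0; /* uses a constant absent from the proof */
        break;
      }
    }
    usefulCount += (size_t)usefulStatements[k];
  }
  return usefulCount;
}
```

The returned count corresponds to the figure that /VERBOSE is described as printing; how the real code computes it may differ.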

If minimize_with is supplied with the /VERBOSE option, it will output the number of potentially useful theorems at the beginning.

This optimization roughly halves the minimization time. For example, on my computer smfsupmpt takes 94s to minimize with the vanilla version and 42s with this optimization; ssmapsn takes 32s without and 13s with it.

One downside of my code is the allocation of the usefulStatements array. I would prefer it to be an expandable vector (not a boolean array), but as far as I understand the metamath codebase doesn't have a vector implementation.

benjub · 2022-07-17T14:01:19Z

I'm not sure I understand. It may be the case that a proof is shortened by statements using constants not appearing in the current proof. Maybe you know in advance that such shortenings will not be caught by the current minimize_with program, so you're saying we might as well discard them from the beginning?

savask · 2022-07-17T14:08:15Z

Maybe you know in advance that such shortenings will not be caught by the current minimize_with program, so you're saying we might as well discard them from the beginning?

Yes, minimize_with X tries to use X at one of the steps of the proof, so naturally all constants used in X must be already present in the proof of our statement. I can imagine a multi-step subproof using some new constants yet shortening the proof of the original statement, but minimize_with can't find those, unfortunately.

wlammen · 2022-07-17T16:26:41Z

src/metamath.c

+
+      /* Fill the usefulStatements array */
+      for (k = 1; k < g_proveStatement; k++) {
+        usefulStatements[k] = 1;


I suggest presetting it to 0, so you don't need all the usefulStatements[k] = 0 assignments before the early exits from the loop body.

wlammen · 2022-07-17T16:28:55Z

src/metamath.c

+        for (i = 0; i < mlen; i++) {
+          if (g_MathToken[mString[i]].tokenType == (char)con_) {
+            if (!g_MathToken[mString[i]].tmp)
+              usefulStatements[k] = 0;


This should be followed by a break; instruction, since one missing constant already disqualifies the statement.
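The two review suggestions (default each entry to 0, and stop scanning at the first constant missing from the proof) might combine as in the sketch below. The flag arrays `isConstant`, `inProof` and `skip`, and the function name, are hypothetical stand-ins: `isConstant` for the `tokenType == con_` test, `inProof` for the `.tmp` flag, and `skip` for whatever early-out conditions the real loop has.

```c
#include <stddef.h>

/* Fills useful[1 .. count-1]. mStrings[k] holds the token indices of
   statement k and mlens[k] its length. All parameter names are
   hypothetical, chosen for this illustration only. */
void fillUsefulStatements(char *useful, size_t count,
                          const int *const *mStrings, const size_t *mlens,
                          const char *skip, const char *isConstant,
                          const char *inProof) {
  size_t k, i;
  for (k = 1; k < count; k++) {
    useful[k] = 0;         /* preset: early exits need no extra reset */
    if (skip[k]) continue; /* placeholder for the real early-out checks */
    for (i = 0; i < mlens[k]; i++) {
      int tok = mStrings[k][i];
      if (isConstant[tok] && !inProof[tok]) {
        break;             /* one missing constant is enough to reject */
      }
    }
    if (i == mlens[k]) {
      useful[k] = 1;       /* loop ran to completion: all constants found */
    }
  }
}
```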

david-a-wheeler · 2022-07-17T17:17:46Z

metamath codebase doesn't have a vector implementation.

It's certainly possible to add one. The usual approach is a simple data structure with a count of the "number of elements" and a pointer to the (beginning of the) array elements; if the pointer is not NULL, it must have been allocated (e.g., with malloc). To resize, call realloc() to resize the array, move things as necessary, and change the count. I did a quick Google search and found this one, implemented as OSS: https://github.com/eteran/c-vector
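A minimal sketch of such a structure, assuming the standard library allocator. The type and function names are made up for illustration, and the separate `capacity` field is an addition to the description above so that the array can grow geometrically instead of calling realloc() on every append.

```c
#include <stdlib.h>

/* Hypothetical minimal growable vector of long values. */
typedef struct {
  long *data;      /* NULL, or a block obtained from realloc */
  size_t count;    /* number of elements in use */
  size_t capacity; /* number of elements allocated */
} LongVec;

/* Appends value. Returns 0 on success, or -1 if allocation fails, in
   which case the vector is left unchanged. */
int longVecPush(LongVec *v, long value) {
  if (v->count == v->capacity) {
    size_t newCap = v->capacity ? v->capacity * 2 : 8;
    long *p = realloc(v->data, newCap * sizeof *p); /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL) return -1;
    v->data = p;
    v->capacity = newCap;
  }
  v->data[v->count++] = value;
  return 0;
}

/* Releases the storage and resets the vector to the empty state. */
void longVecFree(LongVec *v) {
  free(v->data);
  v->data = NULL;
  v->count = v->capacity = 0;
}
```

A vector starts out as `LongVec v = {NULL, 0, 0};` and is valid C89 as written.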

wlammen · 2022-07-17T17:26:14Z

https://github.com/eteran/c-vector
And it is C89 (ANSI C https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) compatible. I wonder whether we should give up on this and at least advance to C11.

david-a-wheeler · 2022-07-17T19:09:17Z

I wonder whether we should give up on this and at least advance to C11.

We could move to C11 if that's helpful. The key is compiler support in deployed systems. GCC seems to have good C11 support.

However, I don't think that helps. C11 doesn't have a vector type as far as I know, and https://github.com/eteran/c-vector doesn't require C11. We should write code that is compatible with C11, but I think we should only require C11 if there's an advantage to it. I didn't see any advantages in this list of C11 additions.

wlammen · 2022-07-17T20:14:10Z

However, I don't think that helps.

See for example https://en.cppreference.com/w/c/io/fprintf to see how much C has developed since 1989. Functions like snprintf really make sense. Or // style comments.
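As a small illustration of why snprintf is attractive, here is a sketch of bounded formatting (snprintf is standard since C99). The function name and the message text are invented for this example and do not come from metamath-exe.

```c
#include <stdio.h>

/* Formats "<label>: <count> candidates" into buf, never writing more than
   bufSize bytes. Returns 1 if the whole text fit, 0 if it was truncated
   or an encoding error occurred. */
int formatCount(char *buf, size_t bufSize, const char *label, long count) {
  int n = snprintf(buf, bufSize, "%s: %ld candidates", label, count);
  return n >= 0 && (size_t)n < bufSize;
}
```

Unlike sprintf, the call cannot overrun `buf`, and the return value tells the caller whether truncation happened.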

gcc -O3 -funroll-loops -finline-functions -fomit-frame-pointer -DINLINE=inline -g -Wall -Wextra -o metamath metamath.o mmcmdl.o mmcmds.o mmdata.o mmfatl.o mmhlpa.o mmhlpb.o mminou.o mmpars.o mmpfas.o mmunif.o mmveri.o mmvstr.o mmword.o mmwtex.o

This is the command executed by the build.sh script. It does not check for ANSI C. The option -ansi should be added to enforce this standard. By the way, this can be done by changing line 4567 of configure to new_CFLAGS="-Wall -Wextra -ansi".
This reveals a lot of ANSI violations like for (long i = 0; i < len; i++):
error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
in mmcmdl.c for example. In short, the current state is not ANSI at all.

david-a-wheeler · 2022-07-17T23:07:50Z

I agree snprintf and // are useful, but those are C99 additions, not C11.

I'm not opposed to switching to a later C spec, I just want to make sure that a spec move is justified and that it is widely implemented by commonly-available & distributed compilers.

We should at least move up to C99, no question in my mind!!! C99 is very widely supported & I think we're already using its facilities. The only question is if we should move to C11 or not.

digama0 · 2022-07-18T04:21:53Z

We are already on ~~C11~~ C99. The build script uses it and // can be spotted in the source code.

savask · 2022-07-18T07:09:27Z

It's certainly possible to add one.

It is, but is it worth doing so for one application? Also, metamath-exe uses its own memory management system; for example, in my code I used poolMalloc instead of malloc and poolFree instead of free. I was wondering if nmbrString can be used as a vector of numbers, but I haven't been able to find a push_back equivalent in mmdata.c.

wlammen · 2022-07-18T08:40:28Z

We are already on C99.

Indeed: CONTRIBUTIONS.md states this literally.
I added the -std=c99 flag to configure as indicated in my previous post, and compilation succeeded without even a warning.
To avoid reusing the results of a previous build made without this flag, run build.sh with the -c option.

benjub · 2022-10-04T19:52:21Z

src/metamath.c

+      /* usefulStatements is a boolean array, where usefulStatements[k] == 1
+         iff statement k might be useful in minimization */
+      char *usefulStatements;
+      usefulStatements = poolMalloc(g_proveStatement*sizeof(char));


Why not use bool, true, false as advised in #17 (comment)? Then a few statements below need minor modifications, e.g., usefulCount += usefulStatements[k]; has to be replaced by something like if (usefulStatements[k]) { usefulCount += 1; }

savask · 2022-10-05

I think originally I used char for consistency, for example, the .tmp field uses 0 and 1 instead of bool. As a positive side effect, usefulCount can be updated without branching (although I don't think it makes any great difference in this case).
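The two counting styles discussed here can be put side by side in a short sketch. The function names are hypothetical, and the `char` version assumes the array holds only 0 or 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counts set flags in a char array holding only 0 or 1, by summation. */
size_t countCharFlags(const char *flags, size_t n) {
  size_t i, count = 0;
  for (i = 0; i < n; i++) {
    count += (size_t)flags[i];
  }
  return count;
}

/* Counts set flags in a bool array with an explicit test. */
size_t countBoolFlags(const bool *flags, size_t n) {
  size_t i, count = 0;
  for (i = 0; i < n; i++) {
    if (flags[i]) {
      count += 1;
    }
  }
  return count;
}
```

In C99 a `bool` converts to 0 or 1 in arithmetic, so the summation form would also compile for a `bool` array; the explicit test is a readability choice.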

benjub · 2022-10-04T19:56:17Z

Having @wlammen's approval is already a pretty good guarantee! If @digama0 can find some time to review this and, if it is OK, merge it, that would be nice, and it would also unblock #93.

Improve minimize * speed. See issue metamath#10.

1edf9e3

wlammen suggested changes Jul 17, 2022

Preset usefulStatements to 0 at the start of the loop.

5b796ee

wlammen approved these changes Jul 18, 2022

benjub mentioned this pull request Oct 4, 2022

Formatting bug #93

Open

benjub reviewed Oct 4, 2022

GinoGiotto mentioned this pull request May 16, 2023

Shortened proofs 12 metamath/set.mm#3180

Merged

savask mentioned this pull request Feb 4, 2024

Add minimizer functionality metamath/metamath-knife#156

Open

8 tasks
