Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Combined addition and multiplication for fmpz_mpoly #994

Open
wants to merge 54 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
54 commits
Select commit Hold shift + click to select a range
fcc7250
Initial version of the addmul code; only does a single, multi-ary
BrentBaccala Jul 1, 2021
0b493f3
minor cleanup; improve variable names
BrentBaccala Jul 1, 2021
10c50d6
addmul: add triple multiplication test case
BrentBaccala Jul 5, 2021
fd3aca0
Bug fixes exposed by triple multiplication test case:
BrentBaccala Jul 5, 2021
02613ca
addmul: remove the special case code for small coefficients and small…
BrentBaccala Jul 11, 2021
6fcd257
addmul: convert to new calling convention
BrentBaccala Jul 13, 2021
92a8ca3
name change fmpz_mpoly_addmul -> fmpz_mpoly_addmul_multi
BrentBaccala Jul 13, 2021
70e0c1a
addmul_multi - enhance code to do addition of multiple terms.
BrentBaccala Jul 13, 2021
83232ba
addmul_multi: first multi-term product f*g+f*h passes test
BrentBaccala Jul 13, 2021
c9f39d9
addmul_multi - add aliasing tests
BrentBaccala Jul 15, 2021
a4ae9c6
addmul_multi - change first test case to more closely mimic "mul" tes…
BrentBaccala Jul 15, 2021
b328b3a
addmul_multi - remove lingering variables from unused small coefficie…
BrentBaccala Jul 15, 2021
6022f27
addmul_multi: BUG FIX make sure largest polynomial is always first
BrentBaccala Jul 26, 2021
98f60d3
addmul_multi: make heap dynamically sized
BrentBaccala Jul 26, 2021
2951c0b
addmul_multi: make exponents dynamically sized
BrentBaccala Jul 27, 2021
ed1d09a
addmul_multi: dynamically size chain
BrentBaccala Jul 27, 2021
f222c87
addmul_multi: make Q dynamically sized
BrentBaccala Jul 27, 2021
4f30b7b
addmul_multi - use unsigned short for hind array
BrentBaccala Jul 27, 2021
87f6041
addmul_multi: allocate chain and heap in large enough blocks to avoid…
BrentBaccala Jul 27, 2021
22b8776
addmul_multi: remove limitation on size of input sequences
BrentBaccala Jul 27, 2021
cf8c34a
addmul_multi_threaded: add a multi-threaded version of addmul_multi
BrentBaccala Aug 10, 2021
3b77482
addmul_multi_threaded - free malloc'ed data structures
BrentBaccala Aug 10, 2021
28efc70
addmul_multi_threaded - add new test case f*g + f*g = 2*f*g
BrentBaccala Aug 11, 2021
04a1aa3
addmul_multi (single and multi threaded version) - handle zero length…
BrentBaccala Aug 11, 2021
7e811e0
addmul_multi_threaded - add 4 term test
BrentBaccala Aug 11, 2021
b78d5c4
addmul_multi_threaded - bug fixes uncovered by 4 term test
BrentBaccala Aug 11, 2021
b1a90e5
addmul_multi_theaded - add more test cases to improve code coverage
BrentBaccala Aug 11, 2021
3aabcbe
fmpz_mpoly_addmul_multi - bug fix with test case to exercise it
BrentBaccala Aug 12, 2021
c879701
addmul_multi_threaded - minor typo on a pointer type
BrentBaccala Aug 12, 2021
a5b06e4
addmul_multi_threaded - add a condition variable to pause and restart…
BrentBaccala Aug 12, 2021
4bab56d
fmpz_mpoly_addmul_threaded - use slong instead of short in heap struc…
BrentBaccala Aug 12, 2021
5872000
fmpz_mpoly_addmul_multi: another bug, another test case
BrentBaccala Aug 12, 2021
99e471f
addmul_multi_threaded: add a subroutine to print status messages as w…
BrentBaccala Aug 16, 2021
44e3411
add an "output_function" to polynomials, and use it in addmul_multi_t…
BrentBaccala Aug 16, 2021
bd81ccb
addmul_multi: add context argument to output_function, and some bug f…
BrentBaccala Aug 19, 2021
4cd5aa1
fmpz_mpoly_addmul_multi: bug fix with test case: stall-and-fill logic…
BrentBaccala Aug 19, 2021
d11b781
add some FLINT_ASSERTs that I added when trying to find the last bug
BrentBaccala Aug 19, 2021
be3e9b7
fmpz_mpoly_addmul_multi: add a 5-term sum test case
BrentBaccala Aug 19, 2021
3ea194e
addmul_multi: revert adding a field to the fmpz_mpoly structure,
BrentBaccala Aug 20, 2021
4389dcc
fmpz_mpoly_addmul_multi: print status is now smarter about putting sp…
BrentBaccala Aug 20, 2021
346b869
new function fmpz_mpoly_abstract_add
BrentBaccala Aug 23, 2021
5409b7a
forgot to commit include file declaration for abstract add
BrentBaccala Aug 23, 2021
b6a449d
addmul_multi_threaded: add a 'bits' argument to output_function
BrentBaccala Aug 23, 2021
fcf4c21
bug fix in abstract add
BrentBaccala Aug 23, 2021
edd4af9
bug fix in abstract add
BrentBaccala Aug 23, 2021
97cf6c8
abstract_add: add a test harnass and fix some bugs exposed by it
BrentBaccala Aug 24, 2021
db864f1
add another test case to addmul_multi_threaded that passes with no pr…
BrentBaccala Aug 24, 2021
54159c5
abstract_add and addmul_multi_threaded: call output_function a final
BrentBaccala Aug 24, 2021
6367666
addmul_multi_threaded - make sure we don't do anything with 'A' if ou…
BrentBaccala Aug 25, 2021
ce19d03
abstract_add: allow Blist to be NULL
BrentBaccala Aug 25, 2021
cb2cdc3
rename 'abstract' files to be 'abstract_add'
BrentBaccala Aug 25, 2021
b86b7a2
adjust addmul_multi_threaded_abstract function signature to take a vo…
BrentBaccala Sep 11, 2021
e873a32
silence some compiler warnings
BrentBaccala Sep 12, 2021
e91bf0f
fmpz_mpoly_abstract_add's input_function now takes ulong index instea…
BrentBaccala Oct 13, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions fmpz_mpoly.h
Original file line number Diff line number Diff line change
Expand Up @@ -809,6 +809,32 @@ FLINT_DLL int _fmpz_mpoly_mul_dense(fmpz_mpoly_t P,
const fmpz_mpoly_t B, fmpz * maxBfields,
const fmpz_mpoly_ctx_t ctx);

/* Combined Addition and Multiplication **************************************/

FLINT_DLL void fmpz_mpoly_addmul_multi(fmpz_mpoly_t A,
const fmpz_mpoly_struct ** Blist, const slong * Blengths, const slong Bnumseq, const fmpz_mpoly_ctx_t ctx);

FLINT_DLL void fmpz_mpoly_addmul_multi_threaded(fmpz_mpoly_t A,
const fmpz_mpoly_struct ** Blist, const slong * Blengths, const slong Bnumseq, const fmpz_mpoly_ctx_t ctx);

FLINT_DLL void fmpz_mpoly_addmul_multi_threaded_abstract(void * A,
const fmpz_mpoly_struct ** Blist, const slong * Blengths, const slong Bnumseq, const fmpz_mpoly_ctx_t ctx,
const char * (* output_function)(void * A, slong index, const flint_bitcnt_t bits,
ulong * exp, fmpz_t coeff, const fmpz_mpoly_ctx_t ctx));

/* Abstract Addition *********************************************************/

void fmpz_mpoly_abstract_add(
void * A,
void ** Blist,
const slong Blen,
const flint_bitcnt_t bits,
const fmpz_mpoly_ctx_t ctx,
void (* input_function)(void * poly, ulong index, const flint_bitcnt_t bits,
ulong * exp, fmpz_t coeff, const fmpz_mpoly_ctx_t ctx),
void (* output_function)(void * poly, slong index, const flint_bitcnt_t bits,
ulong * exp, fmpz_t coeff, const fmpz_mpoly_ctx_t ctx));

/* Powering ******************************************************************/

FLINT_DLL int fmpz_mpoly_pow_fmpz(fmpz_mpoly_t A, const fmpz_mpoly_t B,
Expand Down
242 changes: 242 additions & 0 deletions fmpz_mpoly/abstract_add.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,242 @@
/*
Copyright (C) 2021 Brent Baccala

This file is part of FLINT.

FLINT is free software: you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License (LGPL) as published
by the Free Software Foundation; either version 2.1 of the License, or
(at your option) any later version. See <https://www.gnu.org/licenses/>.
*/

#include <gmp.h>
#include <stdlib.h>
#include "flint.h"
#include "fmpz.h"
#include "fmpz_mpoly.h"

/* fmpz_mpoly_abstract_add adds a set of polynomials by pulling from an input
* function into a heap structure and writing an output polynomial by calling
* an output function.
*
* output_function is called every time we output a term
* it is called with an index of -1 at the end of the polynomial
*
* input_function
* copies an exponent vector and a coefficient into the provided pointer arguments
* sets a coefficient of zero to indicate the end of the polynomial
*
* The default input and output functions (used if the function pointers are NULL)
* read and write to/from a standard FLINT polynomial.
*/

void _default_output_function(void * poly, slong index, const flint_bitcnt_t bits,
ulong * exp, fmpz_t coeff, const fmpz_mpoly_ctx_t ctx)
{
slong N = mpoly_words_per_exp(bits, ctx->minfo);
fmpz_mpoly_struct * A = (fmpz_mpoly_struct *) poly;

if (index == -WORD(1))
return;

fmpz_mpoly_fit_bits(A, bits, ctx);
fmpz_mpoly_fit_length(A, index + 1, ctx);
mpoly_monomial_set(A->exps + index*N, exp, N);
fmpz_swap(A->coeffs + index, coeff);
_fmpz_mpoly_set_length(A, index + 1, ctx);
A->bits = bits;
}

void _default_input_function(void * poly, ulong index, const flint_bitcnt_t bits,
ulong * exp, fmpz_t coeff, const fmpz_mpoly_ctx_t ctx)
{
slong N = mpoly_words_per_exp(bits, ctx->minfo);
fmpz_mpoly_struct * B = (fmpz_mpoly_struct *) poly;
slong NB = mpoly_words_per_exp(B->bits, ctx->minfo);

if (index < B->length)
{
if (bits == B->bits)
mpoly_monomial_set(exp, B->exps + index*N, N);
else
mpoly_repack_monomials(exp, bits, B->exps + index*NB, B->bits, 1, ctx->minfo);
fmpz_set(coeff, B->coeffs + index);
} else
fmpz_zero(coeff);
}

void fmpz_mpoly_abstract_add(
void * A,
void ** Blist,
const slong Blen,
const flint_bitcnt_t bits,
const fmpz_mpoly_ctx_t ctx,
void (* input_function)(void * poly, ulong index, const flint_bitcnt_t bits,
ulong * exp, fmpz_t coeff, const fmpz_mpoly_ctx_t ctx),
void (* output_function)(void * poly, slong index, const flint_bitcnt_t bits,
ulong * exp, fmpz_t coeff, const fmpz_mpoly_ctx_t ctx))
{
slong i,j,k;
slong N;
ulong * cmpmask;
slong heaplen;
ulong * exps;
fmpz * coeffs;
ulong * Blength;
int first;
TMP_INIT;

TMP_START;

if (input_function == NULL)
input_function = _default_input_function;
if (output_function == NULL)
output_function = _default_output_function;

N = mpoly_words_per_exp(bits, ctx->minfo);
cmpmask = (ulong *) TMP_ALLOC(N*sizeof(ulong));
mpoly_get_cmpmask(cmpmask, N, bits, ctx->minfo);

for (k = 0; ; k++) /* round up numterms to a power of 2 */
{
if (Blen <= (1<<k))
break;
}

/* (1<<k) is the number of input slots we need at the end of the heap */
/* (2<<k) is the required size of the heap */

heaplen = 2<<k;

coeffs = (fmpz *) TMP_ALLOC(heaplen * sizeof(fmpz));
exps = (ulong *) TMP_ALLOC(heaplen * N * sizeof(ulong));

for (i=0; i<heaplen; i++)
{
fmpz_init(coeffs + i);
}

Blength = (ulong *) TMP_ALLOC(Blen * sizeof(ulong));

/* start by filling in the back half of the heap with data */

for (i=0; i < Blen; i++)
{
Blength[i] = 0;
input_function(Blist ? Blist[i] : NULL, Blength[i] ++, bits, exps + N*(heaplen/2 + i), coeffs + (heaplen/2 + i), ctx);
}

/* now run the heap algorithm on the front half to fill it in */

for (k = heaplen/2 - 1; k > 0; k--)
{
i = k;
while ((j = HEAP_LEFT(i)) < heaplen)
{
if (fmpz_is_zero(coeffs + j)
|| (!fmpz_is_zero(coeffs + j + 1)
&& !mpoly_monomial_gt(exps + N*j, exps + N*(j+1), N, cmpmask)))
j ++;
fmpz_set(coeffs + i, coeffs + j);
mpoly_monomial_set(exps + N*i, exps + N*j, N);
i = j;

if ((j >= heaplen/2) && (j - heaplen/2 < Blen) && !fmpz_is_zero(coeffs + j))
input_function(Blist ? Blist[j - heaplen/2] : NULL, Blength[j - heaplen/2] ++, bits, exps + N*j, coeffs + j, ctx);
}
}

/* output poly index starts at -1, will be immediately updated to 0 */
k = -WORD(1);

while (! fmpz_is_zero(coeffs + 1))
{
k ++;

mpoly_monomial_set(exps + N*0, exps + N*1, N);

first = 1;

while (!fmpz_is_zero(coeffs + 1) && mpoly_monomial_equal(exps + N*0, exps + N*1, N))
{
if (first)
fmpz_set(coeffs + 0, coeffs + 1);
else
fmpz_add(coeffs + 0, coeffs + 0, coeffs + 1);

first = 0;

/* pop lead item from heap and propagate items through the heap */
i = 1;

while ((j = HEAP_LEFT(i)) < heaplen)
{
if (fmpz_is_zero(coeffs + j)
|| (!fmpz_is_zero(coeffs + j + 1)
&& !mpoly_monomial_gt(exps + N*j, exps + N*(j+1), N, cmpmask)))
j ++;
fmpz_set(coeffs + i, coeffs + j);
mpoly_monomial_set(exps + N*i, exps + N*j, N);
i = j;
}

FLINT_ASSERT((i >= heaplen/2) && (i < heaplen));
if (! fmpz_is_zero(coeffs + i))
input_function(Blist ? Blist[i - heaplen/2] : NULL, Blength[i - heaplen/2] ++, bits, exps + N*i, coeffs + i, ctx);
}

if (fmpz_is_zero(coeffs + 0))
k --;
else
output_function(A, k, bits, exps + 0, coeffs + 0, ctx);
}

output_function(A, -WORD(1), bits, NULL, NULL, ctx);

for (i=0; i<heaplen; i++)
{
fmpz_clear(coeffs + i);
}

TMP_END;
}

/*
* output_function is called every time we're ready to output a term
*
* what does it do if the output buffer is full?
*
* input_function
*
* sets pointers to a block of exponents and coefficients and returns the length
*
* what does it do if the input buffer is empty?
* block?
* return 0?
*
* how do we know when a block of exponents and coefficients can be freed? who frees it?
*
* simplified input_function
*
* copies an exponent vector and a coefficient into the provided pointer arguments
*
* chain of abstract operations
*
* [mul] -\
* \
* [mul] ---+- [add] -- [buffer] -+- [mul] -- [write]
* / /
* [mul] -/ [poly] -/
*
*
* highest priority is the final write; always run it if we're able
* then run final mul if it's got valid input
* then run add if it's got valid input
* then run first mul functions until they've buffered enough to run the add,
* prioritizing the ones with the least data in their output buffers
*
* array of structures with block processing functions and pointers to their input and output buffers
*
* another possibility it to use threading at the Python level. Once the add's output buffer is sufficiently
* full, start another thread to execute the final mul
*/
Loading