Add limits to the size of the string repetition multiplier#23561
Add limits to the size of the string repetition multiplier#23561richardleach merged 1 commit intoPerl:bleadfrom
Conversation
953cf3f to
3725af9
Compare
|
One of the mottos of perl, at least when I picked up the book, was "no internal limits". Just to get them out of the way, I'm going to front load these:
More constructively:
|
|
Thanks for the feedback, @guest20. I can see that the discussion might have to balance what people could conceivably do with improving the guardrails around the patterns of usage we can observe on CPAN / other public code repositories.
More than
Yes, the user doing something funky with magic is definitely worth considering. This PR is checking for a right operand that is Perhaps the code should check for a |
Care to share details of the platform you're running on to help me understand use cases better?
Ah, not magic in the Last time I grepped CPAN for this, there was definitely usage where it seemed like people would expect the whole string - e.g. for preparing a buffer or some kind of initialization - and not some iterator behaviour. Maybe that would be more of a feature request for a separate operator?
That's what i use my RAM for. |
What would you be looking for to resolve those tickets or declare them "wontfix"? Both of those tickets are about constant folding producing huge strings, possibly in rarely-taken or even never-taken branches, and that memory use being undesirable. The options suggested there seemed to be:
|
|
(I've pushed a commit changing the DIE to a warning, in case that's helpful to the discussion.) |
fdc3bd8 to
8fa7f8e
Compare
Well, to maintain back-compat it'd have to be full of tie-magic so the caller got the corresponding face full of bytes when they |
#20586 was already addressed by #20595. I think the OP of #13793 is addressed generally by copy on write as modified by #20595. This change does address the point @iabyn makes in #20586 (comment) where constant folding of the repetition operator can result in large SVs that last the lifetime of the program, even if they're unused (something I've had to workaround occasionally). The original error from #13324 has been addressed by integer overflow checks added over the years: and by the Out of memory you get when trying to allocate beyond the memory available: though the error reporting could probably be better. |
|
CW: my knowledge of guts is all from rumour and hearsay (possibly also from heresy) @tonycoz What issue does the thing being in ... constant folded space... cause that the same large string being in regular-old scalar wouldn't cause? |
I still like the idea of at least warning at compile time that the repetition is likely to be fatal. |
op.c
Outdated
| } | ||
| } else { | ||
| NV rhs = 0.0; rhs = SvNV_nomg(constsv); | ||
| if (rhs >= (NV)((SIZE_MAX >> 2) +1) ) { |
There was a problem hiding this comment.
should this be a -1.0 for safety? It can really questionable what the "53rd digit" to the right side of the . is, and what CC, what CPU, what OS, which security or spectre patch for your OS and CC, and make month and year of all 4 please.
I don't trust C's float/double keyword's rounding modes at all, and were constant folding intermediate values done in FP CPU real instructions or C abstract machine instructions, calculations done at 32, 64, or 80 bit or 128 bit intermediate floating point precision?
the goose has been cooked at malloc(2 GB) or malloc(2GB-1 byte) either way. thats not a future bug ticket.
There was a problem hiding this comment.
Also don't forget that Intel 64/AMD 64 in 64 bit mode CPUs are incapable of doing 80 bit floating pointer intermediate math unlike 32 bit mode. So >= 2^53 or >= 2^52 starts introducing more and more "error" or rounding into the math formula, and we have a 64 bit memory space on paper (more like 48 bits unless your a rack server of brand new Xeons, which I think finally took another chomp at the AMD64 ISA's central address space gap).
There was a problem hiding this comment.
I'm absolutely open to suggestions for alternative comparison values that seem portable and sensible.
t/op/repeat.t
Outdated
|
|
||
| # [GH #13324] Perl croaks if a string repetition seems unsupportable | ||
| fresh_perl_like( | ||
| 'use warnings; my $x = "A" x (2**99)', |
There was a problem hiding this comment.
can we try some maximum toxic 0.0 NV literals here? maybe creating them with PP pack and or unpack.
There was a problem hiding this comment.
my $x = "A" x 0.0; passes this into S_fold_constants:
SV = NV(0x5648c688e0a8) at 0x5648c688e0c0
REFCNT = 1
FLAGS = (NOK,READONLY,PROTECT,pNOK)
NV = 0
That statement constant folds to my $x = '';. Is that the behaviour you wanted to check? If not, please clarify.
The problem is if I have code like: The "abc" x 50_000_000 is constant folded into a 150MB SV and kept in the OP[1] tree even though that SV might never be used. Without constant folding the 150MB SV would only be created when that else runs. It is possible to workaround this, since perl's optimiser doesn't propagate assignments into constant folding[2]: I expect this type of very large constant folded string to be rare, but it can be a surprise if you're monitoring your process memory usage (or trying to diagnose memory issues once they happen.) An alternative might be to warn on large constant folds but that seems like a "it hurts when I do that (constant fold to large strings)" rather than a solution (don't do that). [1] later moved to the pad in threaded builds, and duplicated on thread creation |
Is there any API rule that would not allow rewriting threaded perl's Why not constant fold it the 1st time it executes, then leave it forever in the OP tree? But if its not I think it might be technically impossible to really delete I've done experiments with It has to be done at the |
I think it's a possibility. But for the large SVs we're talking about here it leads to a related problem. If the code executes once, and the user cleans up their copy of the SV (eg [1] compare |
dc446e5 to
8e65013
Compare
|
There seemed very little enthusiasm for warning users about "unrealistically" large multipliers at compile time and questions remained on how to safely/portably check for those anyway, so I won't pursue that part further. This PR has been reworked to just prevent constant folding when the multiplier is larger than a threshold baked into |
ab3d9c7 to
ad76ae7
Compare
ad76ae7 to
de76b67
Compare
0569fb9 to
bdc20c3
Compare
Historically, given a statement like `my $x = "A" x SOMECONSTANT;`, no examination of the size of the multiplier (`SOMECONSTANT` in this example) was done at compile time. Depending upon the constant folding behaviour, this might mean: * The buffer allocation needed at runtime could be clearly bigger than the system can support, but Perl would happily compile the statement and let the author find this out at runtime. * Constants resulting from folding could be very large and the memory taken up undesirable, especially in cases where the constant resides in cold code. This commit addresses the second case by adding a folding threshold for constant multipliers.
bdc20c3 to
3a3b618
Compare

Historically, given a statement like
my $x = "A" x SOMECONSTANT;, no examination of the size of the multiplier (SOMECONSTANTin this example) was done at compile time. Depending upon the constant folding behaviour, this might mean:This commit adds some compile time checking such that:
Things could obviously still go bad at runtime when the multiplier isn't a simple
constant, but that's not for this PR.
Closes #13324, closes #13793, closes #20586.
Besides general correctness checking, the arbitrary cut-off numbers are up for discussion, and please could reviewers suggest any improvements to the_perldiag.pod_ that come to mind.