Skip to content

Conversation

khwilliamson
Copy link
Contributor

This option has been broken for a long time, since at least when we started having inline functions. This p.r. gets it working again, but further efforts are needed for it to be useful.

In part, I have adapated an idea from Tony Cook that is a new angle on an approach I had previously considered and found unworkable.

It works by creating a new file long_names.c that is automatically generated by regen/embed.pl. This file contains the API elements that normally are macros that don't have a separate long name. This replaces the use of mathoms.c for the same purpose, but which was of use only in limited circumstances.

If someone wants to use a long name, and it isn't already in the source, embed.pl makes sure it is here, and hence can be linked to.

The more interesting case is when NO_SHORT_NAMES is defined, this file serves as the place for the linker to find the long names.

perl.h is changed to #include "embed.h" in two places. The first one works as previously. For the second #include, it defines a symbol that embed.h looks for. If it finds it works the opposite of the other way. It undefines the symbols that it previously defined, provided PERL_NO_SHORT_NAMES is defined. It it isn't defined, the second include is a no-op.

When not a no-op, after the second include those symbols are inaccessible to the code.

The trick is that the second #include comes just after the includes of the inline functions headers. That means the inlined functions always have full access to the short names. But those definitions are gone for code that comes after perl.h finishes.

This p.r. addresses a comment from @Leont in #23458 (comment)

This p.r., however hides too many short named functions. Right now, you have to say Perl_newSV(aTHX) instead of newSV(). There has to be some core of old API elements that everyone wants to continue using as short names. A new embed.fnc flag, say 'L', could be added to their entries in that file to indicate to not hide its short name.

The other thing it doesn't address is short names that have to be macros because of problematic parameters. hv_stores() is one such, because one of its parameters is a literal string, which just doesn't work with a function prototype. There are ways to automatically handle these cases, but this p.r. doesn't do that.

A future direction would be to parse all the top-level header files and to cause all the definitions that are not part of the public API to be removed from the user's namespace regardless of PERL_NO_SHORT_NAMES. Thus we would stop inadvertently polluting it.

  • This set of changes requires a perldelta entry, and I need help writing it.

@khwilliamson khwilliamson added the Use merge commit Don't merge this p.r. from github It contains multiple related commits. Instructions in perlgit label Oct 15, 2025
@khwilliamson khwilliamson force-pushed the longnames branch 2 times, most recently from 5ffadde to 7de52ee Compare October 16, 2025 23:50
This simple change makes the perl core compile even with this flag.
Prior to this commit, a commented-out prototype was created for all
macros that we have argument and return type information for.  This
might be useful information for a reader of proto.h.

This commits stops doing this for private macros.  It makes a future
commit slightly easier, and I'm unsure of the usefulness of this anyway.
But I could be persuaded otherwise.
@khwilliamson khwilliamson force-pushed the longnames branch 2 times, most recently from af6e2f4 to 19d002c Compare October 18, 2025 04:08
Where 'weird' is defined as meaning something where the normal rules
don't apply, so something we generate is unlikely to be correct.

This only affects one element, and it uses aTHX in a way that is
incompatible with it being automated.
These keywords all need another word to indicate the parameter type.
Previously only 'struct' was considered to have.

This changed showed an error in one entry embed.fnc, which is also
corrected in this commit.
These macros are not for external use, so don't need a Perl_ prefix
My Linux man page says the arguments to this function should be const,
but the compiler refuses to compile it when so.  Easiest to just cast
them away; the function isn't going to change them.
This is required for the next few commits that start automatically
creating long Perl_name functions for the elements in embed.fnc that are
macros and don't already have them in the source.

Only macros can take a parameter that has to be a literal string, so
don't fit with the next few commits.  This is the only case in embed.fnc
like that, so I'm deferring dealing with it for now.
A function can't return something of that type, but this has always been
a macro, so this hasn't been caught.
These are macros which will have functions created for them in a future
commit
This option has been broken for a long time, since at least when we
started having inline functions.  This commit gets it working again, but
further efforts are needed for it to be useful.

In part, I have adapated an idea from Tony Cook that is a new angle on
an approach I had previously considered and found unworkable.  But his
idea solves the problems I had found.

It has two main components.

1) Prior to this p.r., long names for macros were #defined to just
   expand to that macro (plus potential compensating for the thread
   context parameter).  That meant these would completely fail under
   NO_SHORT_NAMES, as pointed out in
   Perl#23458 (comment)

   Now, a new file 'long_names.c' is automatically generated by
   regen/embed.pl.  This file contains the API elements that
   are macros that don't have a separate long name.  (mathoms.c had
   before 5.43 been used for this purpose, but only for a limited number
   of macros.)

   As a result, there is a long-name function automatically generated
   for every API macro.  It becomes much easier to convert a macro to a
   function and vice versa.

   This also means that when NO_SHORT_NAMES is defined, there already is
   a long name available for every API element.

2) To handle NO_SHORT_NAMES, perl.h is changed to #include "embed.h" in
   an extra place.  The existing #include works as previously.

   The new #include is wrapped with #defining a symbol that embed.h has
   been changed to look for, as well as for PERL_NO_SHORT_NAMES.

   If the new symbol isn't defined, embed.h works just as before.  This
   is what happens in the first #include.

   If the new symbol is defined, but PERL_NO_SHORT_NAMES isn't, embed.h
   expands to nothing.  This is the usual scenario, and effectively it
   means this behaves as if there were just a single #include of
   embed.h, as has always been the case.

   But if someone has #defined PERL_NO_SHORT_NAMES in their XS code
   before #including perl.h, this expansion of embed.h undefines all the
   symbols the first expansion had defined.  Thus the XS code has no
   access to those short names, and they don't pollute its name space.

   The XS code needs to #include perl.h very early in its source, before
   it starts #defining its own symbols.

   The trick that makes this work for inline functions is that the
   second #include in perl.h comes just after the #includes of the
   inline functions headers.  (Those #includes continue to come after
   the #includes of everything they need to have defined.)  That means
   the inlined functions always have full access to the short names.
   But those definitions are gone for code that #defines
   PERL_NO_SHORT_NAMES and which comes after perl.h finishes.

This p.r. however hides too many short named functions and macros.
Traditionally, for example, newSV() is always called with the short name
version, and is not considered to be a name space pollutant.  A list of
such API elements should be generated to serve as exceptions.  A new
embed.fnc flag, say 'L', could be added to their entries in that file to
indicate to not hide their short names.

The other thing it doesn't address is short names that have to be macros
because of problematic parameters.  hv_stores() is one such, because one
of its parameters is a literal string, which just doesn't work with a
function prototype.  There are ways to automatically handle these cases,
but that is for the future.

The next step after this would be to take the non-API elements listed in
embed.fnc, and #undef them unconditionally outside core.  We would then
be able to add whatever names we want for internal use, without fear of
clashing with outside uses.

A future direction would be to parse all the top-level header files and
to add an #undef for every macro #defined in them that aren't API.  That
would allow us to have shorter, more readable, macro names for our use,
without having to consider name space pollution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Use merge commit Don't merge this p.r. from github It contains multiple related commits. Instructions in perlgit

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant