Allow JITServer AOT cache stores and loads without a local SCC #18301

Merged
merged 16 commits into from
Feb 20, 2024

cjjdespres
Contributor

@cjjdespres cjjdespres commented Oct 18, 2023

These changes finish resolving #16721 and allow clients of a JITServer to request JITServer AOT cache loads or stores regardless of whether a client has a local SCC available. The new implementation (which bypasses the local SCC entirely) is controlled by a new option, -XX:[+|-]JITServerAOTCacheIgnoreLocalSCC, which is off by default.

For a high-level overview of the changes in their entirety, see #16721 (comment). In a future PR I will expand that comment into a description of the JITServer AOT cache for the doc/ folder.

@cjjdespres
Contributor Author

Attn @mpirvu.

@mpirvu mpirvu self-assigned this Oct 18, 2023
@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Oct 18, 2023
@mpirvu
Contributor

mpirvu commented Oct 18, 2023

> Indeed, clients requesting JITServer AOT loads for particular methods will ignore their local SCC entirely during these requests.

Is this only temporary? Loading from local AOT is faster than loading from the server.

@cjjdespres
Contributor Author

I meant that they would ignore it while attempting the JITServer AOT load and relocation. The decision to load from the local SCC happens before all of this, I think, and I shouldn't need to change that.

* ROMClasses.
*/
if (context->isJITServerAOTCacheRequested()) {
fixReturnBytecodes(_portLibrary, (J9ROMClass *)romClassBuffer);
Contributor Author

@pshipton It was suggested that you might know more about fixReturnBytecodes. Here I want to use it even when a local SCC is not available. Is this safe to do, or is fixReturnBytecodes not applied to non-shared classes for reasons other than efficiency?

Member

I'm a bit fuzzy on the details. I think it happens anyway but when storing a class into the shared cache it needs to happen up front because the classes become read-only. @gacholio or @TobiAjila ?

Member

@pshipton pshipton Oct 25, 2023

Calling fixReturnBytecodes() does have the cost of evaluating all the bytecodes to figure out what the return type should be. I think outside the shared cache it just modifies it on the fly.

Contributor

Classes in the SCC are read-only, so they must have fixReturnBytecodes done before storing. It's safe to call it when classes are not going into the SCC, but it's unnecessary - the return bytecodes are fixed up on the fly (we assume the bytecodes are "correct" due to verification).

@cjjdespres
Contributor Author

Just to review the changes needed in the deserializer (which I'll put in the design document too):

Currently, when serving a cached JITServer AOT method, the server will send the method itself (including all of the normal relocation records generated for it) and all the JITServer AOT cache records associated with the method. These JITServer AOT cache records are effectively used to "relocate the relocation records" (as Irwin put it). After the client receives this data, the deserializer at the client will:

  1. Take JITServer AOT cache records and look up the entities that they refer to. Thus a ClassSerializationRecord will be used to find a J9Class, a ClassLoaderSerializationRecord will be used to find a J9ClassLoader, and so on. This step contains all of the JITServer AOT cache record validation checks, making sure that, e.g., the J9Class has a ROMClass hash and loader that match the server's records.
  2. Find (or store) those entities in the local SCC and retrieve their offsets. These offsets are effectively opaque uintptr_t keys that identify these entities within the local SCC.
  3. Update the normal relocation records sent along with the cached method so that the offsets they contain are the ones retrieved in (2). This is the relocation-relocation part. (Nothing else in the relocation records needs to be relocated).

After this, the relocation runtime can relocate the method as usual, as the offsets in the record will now be valid with respect to the local SCC.

(A side note at this point: Alexey pointed out that we could just use the JITServer AOT cache record ids as offsets when compiling a method at the server for storage in the JITServer AOT cache. This would let us skip step (3) above in conjunction with the deserializer changes I talk about below, once I get to modifying the code to allow for fresh JITServer AOT cache compilations without a local SCC.)

What we instead need to do in the deserializer is:

  1. Repeat step 1 above
  2. Assign these entities "deserializer offsets" that will be valid for the duration of a client-server connection, and will not be valid with respect to the local SCC. I think the idAndType of the JITServer AOT cache records themselves can be used as offsets at this point, anticipating the future changes to fresh JITServer AOT cache compilations. Previous suggestions included having the offsets come from some incrementing counter, or using pointers to the resolved entities directly as their offsets.
  3. Repeat step 3 with these assigned offsets

Once that is done, the relocation work can proceed, which will involve creating a class derived from TR_J9SharedCache that we can pass to the relocation runtime. This shared cache interface will consult the deserializer and so be able to convert between the deserializer offsets and normal runtime entities.
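As a rough illustration of the second list above (not the code in this PR), the deserializer offsets could be modelled as a map from an idAndType value to the entity resolved in step 1. The `DeserializerOffsetTable` class, the `makeIdAndType` helper, the 3-bit type tag, and the stand-in `J9Class` struct are all assumptions made for this sketch; the real record layout may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the VM type named in the discussion.
struct J9Class { const char *name; };

// Assumed record kinds and bit packing; the real idAndType layout may differ.
enum class RecordType : uintptr_t { ClassLoader = 0, Class = 1, Method = 2 };
constexpr unsigned kTypeBits = 3;

constexpr uintptr_t makeIdAndType(uintptr_t id, RecordType type)
   {
   return (id << kTypeBits) | static_cast<uintptr_t>(type);
   }

// Maps deserializer offsets to the entities resolved and validated in step 1.
class DeserializerOffsetTable
   {
public:
   // Step 2: cache the entity under its idAndType and hand that value back as the offset.
   uintptr_t cacheClass(uintptr_t id, J9Class *clazz)
      {
      assert(clazz != nullptr);
      uintptr_t offset = makeIdAndType(id, RecordType::Class);
      _classes[offset] = clazz;
      return offset;
      }

   // Queried during relocation; returns nullptr for offsets that were never cached.
   J9Class *classFromOffset(uintptr_t offset) const
      {
      auto it = _classes.find(offset);
      return (it != _classes.end()) ? it->second : nullptr;
      }

   // A reset (client disconnecting from the server) invalidates every offset.
   void reset() { _classes.clear(); }

private:
   std::unordered_map<uintptr_t, J9Class *> _classes;
   };
```

A shared cache interface derived from TR_J9SharedCache could then answer relocation queries by calling something like `classFromOffset` instead of consulting the local SCC.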


- if (offset)
+ auto *vmInfo = TR::compInfoPT->getClientData()->getOrCacheVMInfo(stream);
+ if (offset || (aotCacheLoad && !vmInfo->_hasSharedClassCache))
Contributor Author

This aotCacheLoad parameter that I use here should only be temporary - when we ignore the local SCC during fresh JITServer AOT cache compilations this check will need to be updated, because there will be no local offset to store.

@cjjdespres
Contributor Author

One problem I've observed with my solution to the differing ROM classes: if a local SCC is created without -XX:+JITServerUseAOTCache, the ROM classes in it will be represented differently than in the no-SCC case or the case in which the local SCC is created while using the JITServer AOT cache. That's because my changes only take effect if we're a client intending to use the JITServer AOT cache. Thus ROMClasses loaded from the SCC will still differ in hash, and we will get JITServer AOT cache misses for any class that was loaded from the local SCC.

I haven't checked exactly how they differ, but I assume it's the debug information again. This isn't immediately a problem, I suppose - the client just has to run with a local SCC that was created with some option that forces a consistent ROM class representation (as -XX:+JITServerUseAOTCache now does) if it wants to use existing cached methods. Of course that's still an annoying requirement.

If the debug info being inlined or not is irrelevant for correctness, then we could solve this by converting ROM classes to a known, consistent representation during comparison so we don't have to deal with this issue. There's already JITServerHelpers::packROMClass that does this to an extent, by removing intermediate class data and adding interned strings to the end of the ROM class. Maybe the debug info could be moved to a consistent location in the ROM class at this point as well?

If the debug info being inlined or not is relevant (and so we don't want ROM classes that differ only in this respect to be equivalent to each other) then we either have to tolerate this problem or find a way of allowing debug info to be stored out-of-line when no local SCC exists. We would do this so that the ROM classes would still have a uniform representation, just one with debug info that's always out of line when using the AOT cache (instead of always inline as I have it in the draft PR). I did see in the code a comment mentioning that this can be done "when the allocation strategy [of the ROMClassCreationContext] permits it", so that might be a possibility as well.

I should note that packROMClass does not inline any out-of-line debug info for comparison at the moment; it merely zeroes out all SRPs associated with such debug info. At the very least that suggests that the actual content of the debug info is irrelevant for correctness.

@cjjdespres
Contributor Author

cjjdespres commented Nov 8, 2023

Force pushed to rebase onto master to incorporate the changes in #18344. I also changed a few things so that the PR works when a readonly cache is present:

  • The class loader table now tracks the names of first-loaded classes even if a chain cannot be found - before it would only track names without chains if a local SCC didn't exist (commit e0aaad9f2abc71d2670f8688c930be67821f5b28)
  • AOT loads will be requested from the server even if a particular method couldn't be compiled locally with AOT (commit a715e17ba56ed3396963616cded7aabc24ba340e)
  • I simplified a check in getClassRecord and removed the aotCacheLoad tracking that I had previously introduced - after looking into it I think what I had before was unnecessary (commit 8645b46268b03076260cbff2e6aba02dbd53ed88)
  • During testing I found that the SharedCache_getClassChainOffsetIdentifyingLoader message can be sent to the client during an AOT cache load without a sharedCache() in the client compilation's frontend, so I had to handle that case (commit eb02219018d337a4b4b204e2d8befed68af58b6f)

@cjjdespres
Contributor Author

Force pushed to remove one instance of aotCacheLoad I had forgotten about.

TR::Monitor *const _methodMonitor;

- PersistentUnorderedMap<uintptr_t/*ID*/, uintptr_t/*SCC offset*/> _classChainMap;
+ PersistentUnorderedMap<uintptr_t/*ID*/, uintptr_t * /*deserializer chain*/> _classChainMap;
Contributor Author

These chains (and the ones in the well-known classes map) could just be vectors, I suppose.

@cjjdespres
Contributor Author

I've pushed the deserializer changes, and the start of the JITServer AOT cache documentation in doc/.

@AlexeyKhrabrov The actual AOT deserializer changes were more or less as discussed in the related issue, including (perhaps temporarily) removing support for reloading of classes and such.

throw std::bad_alloc();

deserializerChain[0] = chainSize;
memcpy(deserializerChain + 1, record->list().ids(), record->list().length());
Contributor Author

This is incorrect, incidentally. The idAndType should be written to the chain, not the IDs themselves.
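A hedged sketch of what writing idAndType values into the chain might look like, rather than copying the raw IDs. The `idAndType` packing, the `kClassRecordType` tag, and the convention that the first slot holds the number of entries that follow are all assumptions for illustration, not the PR's actual encoding. A `std::vector` is used here, in line with the earlier remark that the chains could be vectors. (Separately, `memcpy` takes a byte count, so a raw copy would also need an element count multiplied by `sizeof(uintptr_t)`; whether `length()` returns elements or bytes is not stated in this thread.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed packing of a record ID with its type tag; the real layout may differ.
constexpr unsigned kTypeBits = 3;
constexpr uintptr_t kClassRecordType = 1; // hypothetical tag value

constexpr uintptr_t idAndType(uintptr_t id, uintptr_t type)
   {
   return (id << kTypeBits) | type;
   }

// Builds a chain whose first slot is the number of entries that follow (assumed convention),
// and whose remaining slots are idAndType values rather than bare IDs.
std::vector<uintptr_t> buildDeserializerChain(const uintptr_t *classIds, size_t numIds)
   {
   assert(classIds != nullptr || numIds == 0);
   std::vector<uintptr_t> chain;
   chain.reserve(numIds + 1);
   chain.push_back(numIds);
   for (size_t i = 0; i < numIds; ++i)
      chain.push_back(idAndType(classIds[i], kClassRecordType));
   return chain;
   }
```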

@cjjdespres
Contributor Author

The latest changes implement the alternate SCC frontend that ignores the local SCC (if present) and instead looks up the information cached by the deserializer in response to particular queries.

The last thing that potentially needs to be done before this is ready is to deal with the ROMClass problem that I mentioned in this comment.

@cjjdespres
Contributor Author

There is one other thing - a deserializer reset (which happens when a client disconnects from a server) can happen concurrently. We have to make sure that such a reset cannot occur during relocation, or we have to detect during relocation that such a reset happened and abort relocation if it did.

@cjjdespres
Contributor Author

cjjdespres commented Nov 22, 2023

I'm not especially happy with AOTCache.md, though it is preliminary. What I want to convey is something like:

  1. The JITServer AOT cache is a reimplementation of what the local SCC tries to do, which is to use its records to identify RAM classes and RAM methods across JVM invocations, and to encode this information in offsets (opaque uintptr_t keys valid only with respect to a particular local SCC) inside a method's relocation records. The relocation runtime can then use these offsets to materialize the relevant entities during relocation (if they can be found in the current JVM).
  2. The old deserializer implementation does all the work of materializing matching RAM classes and methods, but then uses them to store appropriate information (ROM classes, local SCC class chains, etc.) in the local SCC and update the offsets in the method's relocation records to refer to local SCC offsets. This allows the relocation runtime to find the information it needs locally.
  3. The new deserializer implementation doesn't touch the local SCC. Instead, it caches the entities it materializes, and updates the offsets in a method's relocation records with its own "deserializer offsets". We then pass in a new TR_J9SharedCache subclass during relocation of these methods that ignores the local SCC, and instead uses those offsets to look up the cached entities in the deserializer directly.

Once I finish with the fresh compilation portion of this issue, the deserializer won't have to update the offsets in a method at all, as the server should simply be able to use the idAndType of the relevant AOT cache records as offsets directly in a method's relocation records. (The idAndType is what I'm currently using for "deserializer offsets").

Contributor

@AlexeyKhrabrov AlexeyKhrabrov left a comment

Not a proper review, just some notes and a few things I noticed in the code.

> The actual AOT deserializer changes were more or less as discussed in the related issue, including (perhaps temporarily) removing support for reloading of classes and such.

I haven't had a chance to look at the code in detail (and won't have time for it for a while). At a high level, how does the new implementation behave when a class is unloaded and later loaded again? By the way, I've seen applications (e.g. some workloads in the Renaissance benchmark suite) where such class re-loading does seem to happen (although I'm not sure how much of the overall performance the affected methods are responsible for).

> There is one other thing - a deserializer reset (which happens when a client disconnects from a server) can happen concurrently. We have to make sure that such a reset cannot occur during relocation, or we have to detect during relocation that such a reset happened and abort relocation if it did.

This existing mechanism and its API turned out to be rather awkward when building other things on top of the deserializer (in my current work on early JIT compilation). This PR might be a good opportunity to get rid of it rather than support it in new code. One likely better mechanism I can think of is a reader-writer lock, where threads performing AOT deserialization+load are readers and threads doing a reset are writers, although I haven't figured out the details. If we do keep the current reset detection mechanism, throwing an exception instead of propagating the wasReset flag would probably be a better API for it.
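To make the reader-writer idea concrete, here is a minimal sketch using the C++17 `std::shared_mutex`. It is not OpenJ9 code: `ResettableCache`, `withEntry`, and the `uintptr_t`-to-`uintptr_t` map are hypothetical names chosen for illustration, and OpenJ9 may well prefer its own monitor primitives over the standard library. The point it shows is that a load holds the shared lock for its whole duration, so a reset (which needs the exclusive lock) cannot interleave with it.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical cache guarded by a reader-writer lock.
class ResettableCache
   {
public:
   // Simplification: population takes the exclusive lock in this sketch.
   void store(uintptr_t id, uintptr_t value)
      {
      std::unique_lock<std::shared_mutex> lock(_mutex);
      _entries[id] = value;
      }

   // Reader side: the shared lock is held while fn runs, so no reset can occur midway.
   // Returns false if the entry does not exist (for example, after a reset).
   template <typename Fn>
   bool withEntry(uintptr_t id, Fn &&fn) const
      {
      std::shared_lock<std::shared_mutex> lock(_mutex);
      auto it = _entries.find(id);
      if (it == _entries.end())
         return false;
      fn(it->second);
      return true;
      }

   // Writer side: waits for all in-flight readers to finish before clearing.
   void reset()
      {
      std::unique_lock<std::shared_mutex> lock(_mutex);
      _entries.clear();
      }

private:
   mutable std::shared_mutex _mutex;
   std::unordered_map<uintptr_t, uintptr_t> _entries;
   };
```

With this shape there is no wasReset flag to propagate: a reader either sees a consistent cache for the whole operation or finds the entry missing.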



#if (defined(J9VM_ARCH_X86) || defined(J9VM_ARCH_S390) || defined(J9VM_ARCH_POWER) || defined(J9VM_ARCH_AARCH64)) && defined(J9VM_OPT_JITSERVER)
// Regardless of whether or not the local SCC is enabled, we still need to enable portable AOT if
Contributor

Why? One of the advantages of JITServer AOT cache is transparent handling of different client environments without sacrificing throughput for portability. I believe this should still be at least configurable.

Contributor Author

I think this change was @mpirvu's suggestion. It was in the context of a discussion we were having about the TR_AOTHeader differing depending on whether -Xshareclasses was enabled.

Contributor Author

This is also still in. I've made a note to revisit this decision. In future we could at least add a test of whether the new implementation is being used, and enable the portable shared cache flag only if it is. I can't remember exactly what relo header differences I was getting, but I do remember that J9_EXTENDED_RUNTIME2_ENABLE_PORTABLE_SHARED_CACHE wasn't necessary to eliminate them all - it was only sufficient. So it might not need to be enabled even in the new implementation if forcing portable AOT is considered that bad. But I'd have to look again at what exactly was going wrong.

Contributor

This needs to be fixed eventually, it's a performance degradation. I remember seeing measurable reductions in throughput with portable AOT back when I worked on the original AOT cache implementation. There is also no point in enabling portable AOT by default at all when running in a container without local SCC (but with AOT cache). Could you please open an issue for this?

I still don't understand what could be different in the TR_AOTHeader depending on whether local SCC is enabled. Maybe there is something subtle going wrong with either featureFlags or processDescription. Have you tried testing this both inside and outside a container, and with portable AOT explicitly enabled or disabled?

Contributor Author

Initially I had individual overrides in particular places, because having/not having an SCC would change some VM properties at startup. Those solved all the header incompatibilities. Explicitly setting the relocatable processor/CPU description in rossa.cpp was one of them. Changing how the compressed pointer shift amount was determined was another, as that changes if we're in containers and using an SCC (from what I remember).

Setting the portable AOT flag happened to fix all those individual issues.

Contributor Author

I'll open up an issue

@@ -366,6 +367,14 @@ class ROMClassCreationContext
}

bool canPossiblyStoreDebugInfoOutOfLine() const {
if (isJITServerAOTCacheRequested()) {
/*
* If the JITServer AOT cache might be used then debug information should be forced inline
Contributor

How does this change interact with zeroing out SRPs to debug info entries in JITServerHelpers::packROMClass()? If debug info is always stored inline, may as well keep SRPs to it in serialized ROMClasses. Might also help with the problem described here #18301 (comment).

@@ -34,30 +34,54 @@ enum TableKind { Loader, Chain, Name };
// struct is linked into three linked lists - one for each hash table in TR_PersistentClassLoaderTable.
struct TR_ClassLoaderInfo
{
TR_PERSISTENT_ALLOC(TR_Memory::PersistentCHTable)
// Deleting because these records are variably-sized
TR_ClassLoaderInfo(const AOTSerializationRecord &) = delete;
Contributor

Wrong parameter type.


void *const _loader;
TR_ClassLoaderInfo *_loaderTableNext;
void *const _chain;
TR_ClassLoaderInfo *_chainTableNext;
#if defined(J9VM_OPT_JITSERVER)
TR_ClassLoaderInfo *_nameTableNext;
size_t _nameLength;
Contributor

Could store it as J9UTF8 * instead to avoid the copy when there is a local SCC and the 1st loaded ROMClass with its name string is stored there. Might also simplify some code.

// Delay relocation by default, unless this option is enabled
if (!comp->getOption(TR_DisableDelayRelocationForAOTCompilations))
return false;

#if defined(J9VM_OPT_JITSERVER)
if (comp->isDeserializedAOTMethod() && comp->getPersistentInfo()->getJITServerAOTCacheDelayMethodRelocation())
Contributor

Should the command line option be removed then?

Contributor Author

I've removed it.

Contributor Author

Just to update this - I haven't removed the option in this final version of the PR, because we still have the option of running with the old implementation.

- int32_t numSuperclasses = TR::Compiler->cls.classDepthOf(fe()->convertClassPtrToClassOffset(clazz));
- int32_t numInterfaces = numInterfacesImplemented(clazz);
+ int32_t numSuperclasses = fe()->numSuperclasses(clazz);
+ int32_t numInterfaces = fe()->numInterfacesImplemented(clazz);
Contributor

I don't think this log message is worth iterating through interfaces twice.

Contributor Author

Since numSuperclasses and numInterfaces are used a little later on to fill in the class chain, I got rid of the necessaryClassChainLength call instead.

_chain(chain), _chainTableNext(NULL),
_nameLength(J9UTF8_LENGTH(nameStr)), _nameTableNext(NULL)
{
memcpy(&_nameData, J9UTF8_DATA(nameStr), J9UTF8_LENGTH(nameStr));
Contributor

Not necessary (along with memory allocation) if not using AOT cache.

@@ -127,13 +123,23 @@ class JITServerAOTDeserializer
// looked up using the cached SCC offsets if the class was unloaded.
J9Class *getRAMClass(uintptr_t id, TR::Compilation *comp, bool &wasReset);

std::vector<J9Class *> getRAMClassChain(TR::Compilation *comp, J9Class *clazz);
Contributor

I think this can be a scratch vector or a statically sized buffer (class chain length is limited).

@cjjdespres
Contributor Author

Force-pushed to bring in the changes from #18585 and #18568. I also added a commit that strips the debug info from ROM classes and fixes the bytecodes during ROM class packing, so I was able to drop the changes to the VM and ROM class building that I had before. Stripping the debug info out was one solution to the problem I mentioned in #18301 (comment). The implementation ended up being a little trickier than I anticipated.

@AlexeyKhrabrov I'll be getting to your initial remarks next. About the class unloading and loading - the new implementation still marks the class as unloaded in the _classIdMap (as happens with the current implementation). It just doesn't attempt to find a new matching version of the class if we encounter a cached class entry during deserialization that's been marked as unloaded. The same thing happens with invalidated class loaders and methods.

Moving to a reader-writer lock system sounds fine as well. It wouldn't be hard to write one myself, but is there an implementation already in openj9?

@cjjdespres
Contributor Author

One thing I've run into in the other half of this issue (code generation at the server) is the use of SCC methods like isROMClassInSharedCache(romClass, offsetOutputPtr) and offsetInSharedCacheFromROMClass(romClass), which expect to be able to look up ROM class data in the SCC given just a J9ROMClass *romClass input. During a compilation intended for storage in the AOT cache, the offset will, after my changes, correspond to the idAndType of the server's class record for the ROM class. We have two ways of getting this record at the server, more or less:

  1. The class record map in the AOT cache, with keys being the pair {J9ROMClass *, AOTCacheClassLoaderRecord *}
  2. The client session data, with keys being (client) J9Class * pointers.

Both sources require more information than simply a J9ROMClass *. I'm fairly sure that wherever these functions are used at the server we always have more information on hand than merely a ROM class. Generally when using these kinds of SCC functions we first look up the ROM class associated with a full J9Class *, and then give the resulting J9ROMClass * to these methods. So, I could add functions like offsetInSharedCacheFromClass(clazz) to the SCC (and other classes, like J9::AheadOfTimeCompile) that allow us to communicate this extra information to the actual source of the data. The local SCC can just ignore the extra information, and the server can use it.

A somewhat less invasive change would be to cache some kind of J9ROMClass * -> AOTCacheClassRecord * association. I think that runs into the problem that there could be two loaders with distinct AOTCacheClassLoaderRecords that are responsible for loading distinct RAM classes that have identical underlying ROM classes. The current caches do not have this problem.
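The ambiguity described above can be shown with a small sketch: a map keyed only by `J9ROMClass *` cannot hold two class records for the same ROM class, while a map keyed by the pair {ROM class, loader record} can. The empty structs below are stand-ins for the real types, and `ClassRecordMap` and `ClassKeyHash` are hypothetical names; this is not how the AOT cache is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Stand-ins for the real types; only their addresses matter in this sketch.
struct J9ROMClass {};
struct AOTCacheClassLoaderRecord {};
struct AOTCacheClassRecord { int id; };

using ClassKey = std::pair<const J9ROMClass *, const AOTCacheClassLoaderRecord *>;

// Combines the hashes of both pointers in the key.
struct ClassKeyHash
   {
   size_t operator()(const ClassKey &key) const
      {
      size_t h1 = std::hash<const J9ROMClass *>()(key.first);
      size_t h2 = std::hash<const AOTCacheClassLoaderRecord *>()(key.second);
      return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
      }
   };

// Class records keyed by {ROM class, loader record}, so identical ROM classes
// loaded by different loaders map to distinct records.
class ClassRecordMap
   {
public:
   void add(const J9ROMClass *romClass, const AOTCacheClassLoaderRecord *loader,
            const AOTCacheClassRecord *record)
      {
      assert(romClass != nullptr && loader != nullptr);
      _records[ClassKey(romClass, loader)] = record;
      }

   // Returns nullptr when no record exists for this exact pair.
   const AOTCacheClassRecord *find(const J9ROMClass *romClass,
                                   const AOTCacheClassLoaderRecord *loader) const
      {
      auto it = _records.find(ClassKey(romClass, loader));
      return (it != _records.end()) ? it->second : nullptr;
      }

private:
   std::unordered_map<ClassKey, const AOTCacheClassRecord *, ClassKeyHash> _records;
   };
```

This is why a lookup that receives only a `J9ROMClass *` lacks the information needed to pick the right record at the server.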


This sort of problem will probably arise in the other SCC lookup methods - the server caches generally want more information than what is typically given to these methods. The offsetInSharedCacheFromROMMethod function is another example - the local SCC only needs a ROMClass * to get an offset, whereas the server needs a J9Method * and its defining J9Class *, both of which seem to be available in the only place offsetInSharedCacheFromROMMethod is used.

@mpirvu
Contributor

mpirvu commented Feb 20, 2024

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit,alinux64jit jdk17

@mpirvu
Contributor

mpirvu commented Feb 20, 2024

plinuxjit shows the following crash in compiled code:

22:06:43  Parsing /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../functional/HealthCenter/playlist.xml
22:06:43  Parsing /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../functional/NativeTest/playlist.xml
22:06:43  Unhandled exception
22:06:43  Type=Segmentation error vmState=0x00000000
22:06:43  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
22:06:43  Handler1=00007FFFBD4E0F50 Handler2=00007FFFBD409760
22:06:43  R0=0000000000000000 R1=00007FFFBDC5D7B0 R2=00000000FFE84A00 R3=00000000FFE790E1
22:06:43  R4=00000000FFE84A00 R5=0000000000000000 R6=00000000FFE790D8 R7=0000000000000458
22:06:43  R8=0000000000000009 R9=0000000000135868 R10=00000000FFE790D8 R11=0000000000176EE0
22:06:43  R12=0000000000000000 R13=00007FFFBDC668E0 R14=000000000017B620 R15=0000000000095800
22:06:43  R16=00007FFF95D30038 R17=00000000FFFB8A38 R18=00000000FFFB95F0 R19=0000000000000000
22:06:43  R20=0000000000002000 R21=0000000000000000 R22=00000000FFE84A00 R23=0000000000000000
22:06:43  R24=00000000FFE790D8 R25=0000000000000458 R26=0000000000000009 R27=0000000000000000
22:06:43  R28=00000000FFE790D8 R29=0000000000000009 R30=0000000000000458 R31=00000000FFE84A00
22:06:43  NIP=00007FFF9C3B4D24 MSR=800000000280F033 ORIG_GPR3=00007FFFB765F544 CTR=00007FFF9C3B4C34
22:06:43  LINK=00007FFF9C3B4A40 XER=0000000000000000 CCR=0000000042884840 SOFTE=0000000000000001
22:06:43  TRAP=0000000000000300 DAR=000000000000045C dsisr=0000000040000000 RESULT=0000000000000000
22:06:43  FPR0 000000000017b8d0 (f: 1554640.000000, d: 7.680942e-318)
22:06:43  FPR1 4054a38320000000 (f: 536870912.000000, d: 8.255488e+01)
22:06:43  FPR2 4008000000000000 (f: 0.000000, d: 3.000000e+00)
22:06:43  FPR3 4030000000000000 (f: 0.000000, d: 1.600000e+01)
22:06:43  FPR4 3fe8000000000000 (f: 0.000000, d: 7.500000e-01)
22:06:43  FPR5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR6 3fe62e42fefa39ef (f: 4277811712.000000, d: 6.931472e-01)
22:06:43  FPR7 bfcb31d8a68224e9 (f: 2793547008.000000, d: -2.124587e-01)
22:06:43  FPR8 3b2222206f686365 (f: 1869112192.000000, d: 7.499760e-24)
22:06:43  FPR9 494454524f504552 (f: 1330660736.000000, d: 9.067207e+44)
22:06:43  FPR10 3c07652a40000000 (f: 1073741824.000000, d: 1.585319e-19)
22:06:43  FPR11 000000000002371f (f: 145183.000000, d: 7.172993e-319)
22:06:43  FPR12 000000000017b8d9 (f: 1554649.000000, d: 7.680987e-318)
22:06:43  FPR13 00000000b7a01eea (f: 3080724224.000000, d: 1.522080e-314)
22:06:43  FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:06:43  
22:06:43  Compiled_method=java/lang/Access.encodeASCII([CI[BII)I
22:06:43  Target=2_90_20240219_151 (Linux 4.18.0-535.el8.ppc64le)
22:06:43  CPU=ppc64le (4 logical CPUs) (0x1dbbc0000 RAM)
22:06:43  ----------- Stack Backtrace -----------
22:06:43   (0x00007FFF9C3B4D24 [<unknown>+0x0])
22:06:43  runCallInMethod+0x258 (0x00007FFFBD4BACE8 [libj9vm29.so+0x1ace8])
22:06:43  gpProtectedRunCallInMethod+0x54 (0x00007FFFBD4E5284 [libj9vm29.so+0x45284])
22:06:43  signalProtectAndRunGlue+0x28 (0x00007FFFBD678B68 [libj9vm29.so+0x1d8b68])
22:06:43  omrsig_protect+0x3e4 (0x00007FFFBD40AC14 [libj9prt29.so+0x3ac14])
22:06:43  gpProtectAndRun+0xa8 (0x00007FFFBD678C38 [libj9vm29.so+0x1d8c38])
22:06:43  gpCheckCallin+0xc4 (0x00007FFFBD4E7944 [libj9vm29.so+0x47944])
22:06:43  callStaticVoidMethod+0x48 (0x00007FFFBD4E4888 [libj9vm29.so+0x44888])
22:06:43  JavaMain+0x1210 (0x00007FFFBDF37E60 [libjli.so+0x7e60])
22:06:43  ThreadJavaMain+0x18 (0x00007FFFBDF3E0F8 [libjli.so+0xe0f8])
22:06:43  start_thread+0xf8 (0x00007FFFBDEE9678 [libpthread-2.28.so+0x9678])
22:06:43  clone+0x74 (0x00007FFFBDDD8968 [libc-2.28.so+0x138968])

and

22:04:18  Type=Segmentation error vmState=0x00000000
22:04:18  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
22:04:18  Handler1=00003FFF9A970F50 Handler2=00003FFF9A899760
22:04:18  R0=0000000000000000 R1=00003FFF9B00D810 R2=0000000000000001 R3=00000000FFEDC840
22:04:18  R4=00000000FFF35F30 R5=00000000FFEDC808 R6=0000000000000001 R7=00003FFF9B00D930
22:04:18  R8=00003FFF9A4E78D0 R9=0000000000141998 R10=0000000000000000 R11=000000000017A960
22:04:18  R12=0000000028882828 R13=00003FFF9B016900 R14=000000000017DBC0 R15=0000000000099200
22:04:18  R16=00003FFF70BF0038 R17=00000000FFD949E8 R18=00000000FFEDC7B8 R19=00000000FFEDC808
22:04:18  R20=00000000FFF35F30 R21=0000000000000000 R22=0000000000000045 R23=0000000000000000
22:04:18  R24=00000000FFF35F60 R25=0000000000000000 R26=00000000FFEDC840 R27=0000000000002000
22:04:18  R28=0000000000000045 R29=00000000FFEDC840 R30=0000000000000000 R31=00000000FFF35F60
22:04:18  NIP=00003FFF733E3DF8 MSR=800000010280F033 ORIG_GPR3=00000000000081C8 CTR=00003FFF733B48F0
22:04:18  LINK=00003FFF733B4A0C XER=0000000020000000 CCR=0000000048882822 SOFTE=0000000000000001
22:04:18  TRAP=0000000000000300 DAR=0000000000000049 dsisr=0000000040000000 RESULT=0000000000000000
22:04:18  FPR0 00000000fff35f30 (f: 4294139648.000000, d: 2.121587e-314)
22:04:18  FPR1 405012b640000000 (f: 1073741824.000000, d: 6.429237e+01)
22:04:18  FPR2 4008000000000000 (f: 0.000000, d: 3.000000e+00)
22:04:18  FPR3 889784c8889784a0 (f: 2291631360.000000, d: -2.849160e-267)
22:04:18  FPR4 4014438bc0000000 (f: 3221225472.000000, d: 5.065963e+00)
22:04:18  FPR5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR6 3fe5555560000000 (f: 1610612736.000000, d: 6.666667e-01)
22:04:18  FPR7 3f242ee9c0000000 (f: 3221225472.000000, d: 1.539860e-04)
22:04:18  FPR8 6d6f6320352e3031 (f: 892219456.000000, d: 1.384967e+219)
22:04:18  FPR9 6e6f2064656c6970 (f: 1701603712.000000, d: 9.001140e+223)
22:04:18  FPR10 303120796c754a20 (f: 1819625984.000000, d: 1.479104e-76)
22:04:18  FPR11 3e26064400000000 (f: 0.000000, d: 2.563986e-09)
22:04:18  FPR12 000000000017dd19 (f: 1563929.000000, d: 7.726836e-318)
22:04:18  FPR13 4028000000000000 (f: 0.000000, d: 1.200000e+01)
22:04:18  FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
22:04:18  
22:04:18  Compiled_method=sun/nio/cs/UTF_8$Encoder.encodeArrayLoop(Ljava/nio/CharBuffer;Ljava/nio/ByteBuffer;)Ljava/nio/charset/CoderResult;

On zLinuxjit we have:

00:14:07  Exception in thread "main" java.nio.charset.CoderMalfunctionError: java.lang.NullPointerException
00:14:07  	at java.base/java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:587)

and

00:12:14  BUILD FAILED
00:12:14  /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_jit_Personal_testList_1/aqa-tests/TKG/scripts/build_test.xml:95: The following error occurred while executing this line:
00:12:14  /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_jit_Personal_testList_1/aqa-tests/functional/cmdLineTests/J9security/build.xml:117: The following error occurred while executing this line:
00:12:14  /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_jit_Personal_testList_1/aqa-tests/functional/cmdLineTests/J9security/build.xml:74: java.nio.charset.CoderMalfunctionError: java.lang.NullPointerException
00:12:14  	at java.base/java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:587)
00:12:14  	at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:293)
00:12:14  	at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
00:12:14  	at java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:132)

On xlinuxjit we have:

21:53:51  BUILD FAILED
21:53:51  /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_0/aqa-tests/TKG/scripts/build_test.xml:95: The following error occurred while executing this line:
21:53:51  /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_0/aqa-tests/functional/cmdLineTests/shareClassTests/SCHelperCompatTests/build.xml:157: The following error occurred while executing this line:
21:53:51  /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_0/aqa-tests/functional/cmdLineTests/shareClassTests/SCHelperCompatTests/build.xml:136: java.nio.charset.CoderMalfunctionError: java.lang.NullPointerException
21:53:51  	at java.base/java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:587)
21:53:51  	at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:293)

and

21:51:46  Parsing /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../functional/cmdLineTests/sigabrtHandlingTest/playlist.xml
21:51:46  Exception in thread "main" java.nio.charset.CoderMalfunctionError: java.lang.NullPointerException
21:51:46  	at java.base/java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:587)
21:51:46  	at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:293)

@cjjdespres
Contributor Author

cjjdespres commented Feb 20, 2024

Just to record it here - the test build failures are likely related to the PR that #18981 reverted. Once this PR is rebased to bring in that revert and the ROM class walk fix, we should be good to try PR testing again.

The new reset detection mechanism renders it unnecessary.

Signed-off-by: Christian Despres <[email protected]>
The -XX:[+|-]JITServerAOTCacheIgnoreLocalSCC option explicitly
enables or disables ignoring the local SCC when compiling a method
using the JITServer AOT cache. If the option is disabled, a local SCC
must be present and have sufficient writable space for the JITServer
AOT cache to be used. By default this option is disabled.

Signed-off-by: Christian Despres <[email protected]>
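
The option semantics described in the commit message above can be sketched roughly as follows. This is only an illustrative model: `ClientState` and `canUseJITServerAOTCache` are hypothetical names that do not appear in the PR, and the real checks are spread across the compilation control code.

```cpp
#include <cassert>

// Hypothetical model of the client state; none of these names come from OpenJ9.
struct ClientState
   {
   bool ignoreLocalSCC;    // models -XX:+JITServerAOTCacheIgnoreLocalSCC
   bool localSCCPresent;   // a local shared classes cache exists
   bool localSCCHasSpace;  // the local SCC still has writable space
   };

// Returns true if the JITServer AOT cache may be used for a compilation.
inline bool canUseJITServerAOTCache(const ClientState &s)
   {
   if (s.ignoreLocalSCC)
      return true; // the local SCC is bypassed, so its state does not matter
   // Option disabled (the default): a writable local SCC is required.
   return s.localSCCPresent && s.localSCCHasSpace;
   }
```
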
The persistent class loader table will now track the names of the
first-loaded classes of class loaders without associated chains being
present.

Signed-off-by: Christian Despres <[email protected]>
The J9_EXTENDED_RUNTIME2_ENABLE_PORTABLE_SHARED_CACHE flag causes the
VM to make a number of conservative assumptions during code generation,
including the hardware capabilities available. When the JITServer AOT
cache is used, these assumptions should be made regardless of whether
or not a local SCC exists. This allows JITServer AOT cache sharing
to occur between clients that may or may not have a local SCC; their
headers would otherwise differ and so prevent cache hits.

Signed-off-by: Christian Despres <[email protected]>
A JITServer client will now request an AOT load even if a local SCC is
not present. When the relocation mechanism and deserializer have been
modified appropriately, this will allow JITServer AOT cache loads to
occur in that circumstance. Both the server and client will now
recognize AOT cache load requests even if the method is not eligible for
an AOT cache store.

Signed-off-by: Christian Despres <[email protected]>
When ignoring the local SCC, we now relocate deserialized methods using
the new TR_J9DeserializerSharedCache, which consults the underlying
JITServerNoSCCAOTDeserializer for offset queries.

Signed-off-by: Christian Despres <[email protected]>
The _ignoringLocalSCC property is true at a JITServer client when the
compilation is to be stored in the JITServer's AOT cache and the client
is using the new deserializer implementation (hence ignoring its local
SCC). Since the server is responsible for generating relocation and
AOT cache records in these circumstances, this property is used to stop
certain SCC-related checks from being performed and stop relo records
from being generated when it is true.

Signed-off-by: Christian Despres <[email protected]>
A client running with _JITServerAOTCacheIgnoreLocalSCC will now request
AOT cache stores if the compilation is eligible for them. The server
will send the result to the client with a new
AOTCache_storedAOTMethod, and the client will deserialize it using the
new deserializer implementation.

Signed-off-by: Christian Despres <[email protected]>
This saves a little network traffic, and also avoids an assert firing
in ClientSessionData::getClassRecord().

Signed-off-by: Christian Despres <[email protected]>
@cjjdespres
Contributor Author

Force-pushed to rebase onto master.

@mpirvu
Contributor

mpirvu commented Feb 20, 2024

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit,alinux64jit jdk17

@mpirvu
Contributor

mpirvu commented Feb 20, 2024

zlinux failed the cmdLineTester_criu_jitserverPostRestore_1 with "can't bind server address: Address already in use".
This is a known issue.

@mpirvu mpirvu merged commit 283a3b6 into eclipse-openj9:master Feb 20, 2024
9 of 11 checks passed
// If we are using the JITServer AOT cache and ignoring the local SCC, we need to remember the name of clazz
// with or without chain. Otherwise (not using AOT cache or not ignoring the local SCC) there is no point in continuing
// without a chain.
if (!chain && (!useAOTCache || !_persistentMemory->getPersistentInfo()->getJITServerAOTCacheIgnoreLocalSCC()))
Contributor

This doesn't compile without defined(J9VM_OPT_JITSERVER), e.g. on macOS; see failed build:

[2024-02-20T21:51:46.296Z] /Users/jenkins/workspace/Build_JDK22_aarch64_mac_Personal/openj9/runtime/compiler/env/ClassLoaderTable.cpp:239:76: error: no member named 'getJITServerAOTCacheIgnoreLocalSCC' in 'TR::PersistentInfo'
[2024-02-20T21:51:46.296Z]    if (!chain && (!useAOTCache || !_persistentMemory->getPersistentInfo()->getJITServerAOTCacheIgnoreLocalSCC()))
[2024-02-20T21:51:46.296Z]                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ^
[2024-02-20T21:51:46.296Z] 1 error generated.
[2024-02-20T21:51:46.296Z] make[6]: *** [runtime/compiler/CMakeFiles/j9jit.dir/env/ClassLoaderTable.cpp.o] Error 1
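
One way such a build failure is commonly avoided is to guard the JITServer-only call with the same macro that guards the getter's declaration. The sketch below is not the fix that was actually applied (it is not shown in this conversation); `PersistentInfo` here is a stand-in struct and `shouldSkipWithoutChain` is a hypothetical helper.

```cpp
#include <cassert>

// Stand-in for TR::PersistentInfo; the getter only exists in JITServer builds.
struct PersistentInfo
   {
#if defined(J9VM_OPT_JITSERVER)
   bool getJITServerAOTCacheIgnoreLocalSCC() const { return _ignore; }
   bool _ignore = false;
#endif
   };

// Hypothetical helper: should we give up when the class has no chain?
inline bool shouldSkipWithoutChain(bool hasChain, bool useAOTCache, const PersistentInfo &info)
   {
   if (hasChain)
      return false;
#if defined(J9VM_OPT_JITSERVER)
   // Only continue without a chain when using the AOT cache and ignoring the local SCC.
   return !useAOTCache || !info.getJITServerAOTCacheIgnoreLocalSCC();
#else
   (void)useAOTCache;
   (void)info;
   return true; // no JITServer support: never continue without a chain
#endif
   }
```
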

Contributor

I'll put up a fix shortly.

Contributor

@AlexeyKhrabrov AlexeyKhrabrov left a comment

I have some concerns about changes in control logic, a few questions, and some suggestions for possible future refactoring.

// This record ID can only be missing from the cache if it was removed by a concurrent reset
// TODO: is this true any more? The deserializerWasReset above should guarantee that we haven't
// reset by this point.
wasReset = true;
return V();
Contributor

This should be an assertion then.

@@ -992,6 +997,11 @@ TR_ResolvedRelocatableJ9Method::TR_ResolvedRelocatableJ9Method(TR_OpaqueMethodBl
{
TR_J9VMBase *fej9 = (TR_J9VMBase *)fe;
TR::Compilation *comp = TR::comp();
#if defined(J9VM_OPT_JITSERVER)
if (fej9->_compInfoPT->getMethodBeingCompiled()->_useAOTCacheCompilation)
return;
Contributor

Why can we skip this when ignoring local SCC?

@@ -1856,7 +1877,7 @@ TR_ResolvedRelocatableJ9Method::createResolvedMethodFromJ9Method(TR::Compilation
isSystemClassLoader)
{
resolvedMethod = new (comp->trHeapMemory()) TR_ResolvedRelocatableJ9Method((TR_OpaqueMethodBlock *) j9method, _fe, comp->trMemory(), this, vTableSlot);
- if (comp->getOption(TR_UseSymbolValidationManager))
+ if (!ignoringLocalSCC && comp->getOption(TR_UseSymbolValidationManager))
Contributor

Why can we skip this when ignoring local SCC?

@@ -1039,11 +1039,13 @@ JITServerAOTCache::storeMethod(const AOTCacheClassChainRecord *definingClassChai
"AOT cache %s: method %s @ %s index %u class ID %zu AOT header ID %zu already exists",
_name.c_str(), signature, levelName, index, definingClassId, aotHeaderRecord->data().id()
);
- return false;
+ methodRecord = it->second;
+ return true;
Contributor

This changes the logic compared to the previous implementation: we return an existing serialized method instead of the freshly compiled one (which gets discarded). Is this always the right thing to do? For instance, is it possible that the client requests a fresh AOT compilation because it failed to load an existing server-cached version, and now will receive the same bad old version after a (discarded) fresh compilation? After looking at how the various store/load flags are handled, I think it is possible, but I'm not entirely sure.

Also, returning true here contradicts the doc comment in the .hpp file.
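
To make the concern concrete, here is a minimal sketch of "return the existing entry" store semantics. It is a simplified model, not the JITServerAOTCache code: `MethodCache`, `CachedMethod`, and `store` are hypothetical names, and locking and record bookkeeping are omitted.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a cached serialized method.
struct CachedMethod { std::string code; };

class MethodCache
   {
public:
   // Stores `fresh` under `key` unless an entry already exists.
   // `result` always points at the entry that ends up in the cache.
   // Returns true in both cases, as in the new code shown in the diff.
   bool store(const std::string &key, const CachedMethod &fresh, const CachedMethod *&result)
      {
      auto it = _map.find(key);
      if (it != _map.end())
         {
         result = &it->second; // existing entry wins; `fresh` is discarded
         return true;
         }
      result = &_map.emplace(key, fresh).first->second;
      return true;
      }

private:
   std::unordered_map<std::string, CachedMethod> _map;
   };
```

In this model a second store for the same key hands back the first version, which is the scenario the comment asks about: a client that recompiled because the cached version was unusable would get that same cached version again.
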

@@ -214,6 +214,25 @@ class ServerStream : public CommunicationStream
TR_VerboseLog::writeLineLocked(TR_Vlog_JITServer, "Could not finish compilation: %s", e.what());
}
}
/**
Contributor

Nitpick: could simply add a msgType parameter to finishCompilation() instead of adding a new method, which would also reduce code duplication at the call sites in outOfProcessCompilationEnd().
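
A rough sketch of that suggestion, using a fake stream rather than the real ServerStream. `FakeStream` and the string payload are hypothetical; `AOTCache_storedAOTMethod` is the message type named in this PR, while `compilationCode` as the default reply type is an assumption.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified message types; compilationCode as the default is an assumption.
enum class MessageType { compilationCode, AOTCache_storedAOTMethod };

// Hypothetical stand-in for the server-side stream.
struct FakeStream
   {
   std::pair<MessageType, std::string> lastSent;

   // One entry point; the message type defaults to the ordinary reply,
   // so existing call sites would not need to change.
   void finishCompilation(const std::string &payload,
                          MessageType type = MessageType::compilationCode)
      {
      lastSent = std::make_pair(type, payload);
      }
   };
```
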

@@ -1010,7 +1010,7 @@ JITServerAOTCache::storeMethod(const AOTCacheClassChainRecord *definingClassChai
TR_Hotness optLevel, const AOTCacheAOTHeaderRecord *aotHeaderRecord,
const Vector<std::pair<const AOTCacheRecord *, uintptr_t/*reloDataOffset*/>> &records,
const void *code, size_t codeSize, const void *data, size_t dataSize,
- const char *signature, uint64_t clientUID)
+ const char *signature, uint64_t clientUID, const CachedAOTMethod *&methodRecord)
Contributor

Nitpick: methodRecord is a confusing name, since there are actual method records.

{
aotCacheStore = entry->_useAOTCacheCompilation &&
compInfo->methodCanBeJITServerAOTCacheStored(compiler->signature(), compilee->convertToMethod()->methodType());
aotCacheLoad = persistentInfo->getJITServerUseAOTCache() &&
Contributor

I think this changes the semantics of the -Xaot:jitserverAOTCacheStoreExclude= option, which previously implied LoadExclude as well (and still does in local SCC mode).
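
The difference being pointed out can be modelled as below. This is a sketch of the two interpretations only, under the assumption that the new code computes the store and load eligibility independently; `Filters`, `Eligibility`, `coupled`, and `independent` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical per-method filter results for the two exclude options.
struct Filters { bool storeExcluded; bool loadExcluded; };
struct Eligibility { bool store; bool load; };

// Previous semantics per the comment: a store exclusion also excludes loads.
inline Eligibility coupled(const Filters &f)
   {
   bool store = !f.storeExcluded;
   return Eligibility{store, store && !f.loadExcluded};
   }

// Suspected new semantics: the two decisions are made independently.
inline Eligibility independent(const Filters &f)
   {
   return Eligibility{!f.storeExcluded, !f.loadExcluded};
   }
```

For a method matched only by the store-exclude filter, `coupled` disallows the load while `independent` allows it.
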

if (aotCacheLoad)
deserializer->incNumCacheMisses();

auto method = SerializedAOTMethod::get(methodStr);
Contributor

Some of the code duplicated in the next branch can be factored out.

@@ -3585,7 +3678,7 @@ remoteCompile(J9VMThread *vmThread, TR::Compilation *compiler, TR_ResolvedMethod
}
}

- if (!compiler->getOption(TR_DisableCHOpts) && !useAotCompilation && !compiler->isDeserializedAOTMethod())
+ if (!compiler->getOption(TR_DisableCHOpts) && !useAotCompilation && (compiler->isDeserializedAOTMethodStore() || !compiler->isDeserializedAOTMethod()))
Contributor

I'm having trouble understanding this condition. Shouldn't isDeserializedAOTMethodStore() imply that this is an AOT compilation?
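
For reference, the old and new conditions can be compared by brute force over plain booleans. The sketch treats the four inputs as independent flags, which may not hold in the real compiler (that is part of the question above); the function names are hypothetical.

```cpp
#include <cassert>

// Old condition from the diff, with the queries replaced by plain booleans.
inline bool oldCond(bool disableCHOpts, bool useAot, bool deserialized)
   {
   return !disableCHOpts && !useAot && !deserialized;
   }

// New condition from the diff.
inline bool newCond(bool disableCHOpts, bool useAot, bool deserializedStore, bool deserialized)
   {
   return !disableCHOpts && !useAot && (deserializedStore || !deserialized);
   }

// Counts the input combinations (out of 16) on which the two conditions disagree.
inline int countDifferences()
   {
   int count = 0;
   for (int bits = 0; bits < 16; ++bits)
      {
      bool d = bits & 1, a = bits & 2, s = bits & 4, m = bits & 8;
      if (oldCond(d, a, m) != newCond(d, a, s, m))
         ++count;
      }
   return count;
   }
```

The two conditions disagree on exactly one combination: CH opts enabled, `useAotCompilation` false, and both deserialized flags true. If a deserialized store always implies an AOT compilation, that combination cannot occur and the change would have no effect.
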

// TODO: is -Xaot:forceaot recognized when -Xshareclasses:none is specified?
canDoRelocatableCompile = true;
}
else if (!preferLocalComp(entry))
Contributor

Would be nice to factor this logic out (although to be fair, this whole method is a bit of a mess and could use some much heavier refactoring...).

Labels
comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
Status: Done

6 participants