
File transfer sometimes doesn't work #1572

Open
@DasBabyPixel

Description


Stacktrace

Actions to reproduce

I have a setup where a minigame server downloads the map to play on when the game starts.
This is done by calling TemplateStorage#openZipInputStreamAsync from the server main thread.
Sometimes the call never returns, leaving the minigame server stuck in a waiting state.
Once that happens, no other requests are answered either.
Because of that, once it breaks (usually once or twice a day), I have to restart the entire node to get the servers working again.
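Until the cause is found, the server main thread could at least avoid waiting forever on a single request. The following is a minimal sketch, not a fix. It assumes that the result of `TemplateStorage#openZipInputStreamAsync` can be adapted to a `CompletableFuture<InputStream>`; whether CloudNet's task type allows this directly depends on the version in use. `MapDownloadGuard` and `withTimeout` are hypothetical names. The timeout itself uses the JDK's `CompletableFuture#orTimeout` (Java 9+).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public final class MapDownloadGuard {

  // Hypothetical: 'request' stands in for TemplateStorage#openZipInputStreamAsync,
  // assumed to be adaptable to a CompletableFuture<InputStream>.
  static CompletableFuture<InputStream> withTimeout(
      Supplier<CompletableFuture<InputStream>> request, long timeout, TimeUnit unit) {
    // orTimeout completes the future exceptionally with a TimeoutException
    // if no result arrives within the given time.
    return request.get().orTimeout(timeout, unit);
  }

  public static void main(String[] args) {
    // A future that is never completed simulates the hanging zip request.
    Supplier<CompletableFuture<InputStream>> hanging = CompletableFuture::new;
    // An already completed future simulates a healthy zip request.
    Supplier<CompletableFuture<InputStream>> healthy =
        () -> CompletableFuture.completedFuture(new ByteArrayInputStream(new byte[] {1, 2, 3}));

    withTimeout(healthy, 1, TimeUnit.SECONDS)
        .thenAccept(in -> System.out.println("map received"));

    withTimeout(hanging, 200, TimeUnit.MILLISECONDS)
        .whenComplete((in, err) -> {
          if (err instanceof TimeoutException) {
            System.out.println("map download timed out; retry or abort the round");
          }
        })
        .exceptionally(err -> null)
        .join(); // join only so this demo waits; a real server would stay non-blocking
  }
}
```

This only keeps one stuck request from freezing the game server. It does nothing about the node-side state, and the transfer session on the node may remain open.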

To me, this looks like an issue in the node.
I believe the node somehow tries to send the packet to the wrapper, but the packet never arrives. The curious thing is that zip request packets from other servers also never make it back to their wrappers. There must be some state in the node that breaks.
It could also be some sort of deadlock, though I doubt it. Maybe a race condition corrupts that state? According to the log messages, the packets are sent to the wrapper, so could this be a Netty bug?
That also seems unlikely, though, because other packets such as ChannelMessages work fine.
I have no clue what exactly is happening here; these are just the possibilities I have thought of so far.
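One way to test the deadlock hypothesis on a stuck node is to ask the JVM directly. The following is a minimal sketch using the JDK's `ThreadMXBean#findDeadlockedThreads`. `DeadlockProbe` is a hypothetical helper, and it would have to run inside the node's JVM, for example from a module or debug command; that wiring is an assumption and is not shown. A thread dump taken with `jstack <pid>` gives similar information from outside the process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public final class DeadlockProbe {

  // Returns one line per deadlocked thread in this JVM, or an empty string if none are found.
  static String report() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // Detects cycles on object monitors and ownable synchronizers (e.g. ReentrantLock);
    // returns null when no such cycle exists.
    long[] ids = mx.findDeadlockedThreads();
    if (ids == null) {
      return "";
    }
    StringBuilder sb = new StringBuilder();
    for (ThreadInfo info : mx.getThreadInfo(ids)) {
      if (info == null) {
        continue; // the thread terminated between the two calls
      }
      sb.append(info.getThreadName())
          .append(" waiting on ").append(info.getLockName())
          .append(" held by ").append(info.getLockOwnerName())
          .append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String r = report();
    System.out.println(r.isEmpty() ? "no deadlocked threads" : r);
  }
}
```

An empty result does not rule out a hang. A thread parked forever on a future that nobody completes, or a packet that is never written, is not a lock cycle and will not be reported. A full thread dump is still the better tool for spotting such waits.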

I have done some debugging of my own to get more information about this issue. The issue is fairly recent (I haven't always updated to the latest snapshots), but I don't think it is much older than 6 months, because everything worked fine until I updated the node.
My logging customizations are here.

The exact build I am running is here.
My build has some further customizations, mostly to module loading. Apart from disabling zip compression, none of them should affect packet handling or file transfer.

CloudNet version

[19.01 15:11:08.538] INFO :
[19.01 15:11:08.538] INFO : CloudNet Blizzard 4.0.0-RC12-SNAPSHOT f18671a
[19.01 15:11:08.538] INFO : Discord: https://discord.cloudnetservice.eu/
[19.01 15:11:08.538] INFO :
[19.01 15:11:08.538] INFO : ClusterId: ae0bbf39--431d--857e2580ae82
[19.01 15:11:08.538] INFO : NodeId: Node-1
[19.01 15:11:08.538] INFO : Head-NodeId: Node-1
[19.01 15:11:08.538] INFO : CPU usage: (P/S) .18/11.45/100%
[19.01 15:11:08.538] INFO : Node services memory allocation (U/R/M): 5532/5532/16384 MB
[19.01 15:11:08.538] INFO : Threads: 55
[19.01 15:11:08.538] INFO : Heap usage: 198/256MB
[19.01 15:11:08.538] INFO : JVM: Eclipse Adoptium 23 (OpenJDK 64-Bit Server VM 23.0.1+11)
[19.01 15:11:08.538] INFO : Update Repo: CloudNetService/launchermeta, Update Branch: beta (development mode)
[19.01 15:11:08.538] INFO :

Other

This is what the logging with my build looks like when everything works:

minigame server

[19.01 01:59:56.923] INFO : [9530b30c-e852-40c0-8935-1b59732308e9] Custom session registered: eu.cloudnetservice.driver.network.chunk.defaults.DefaultFileChunkedPacketHandler@1130df1c
[19.01 01:59:56.923] DEBUG: SendPacketSync BasePacket(channel=1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyMutableDataBuf@81a9884, prioritized=false, creationStamp=2025-01-19T00:59:56.923751720Z, uniqueId=99ea61d9-ce73-4320-a94d-6b09e7c5085b)
[19.01 01:59:56.958] DEBUG: Received packet multithreadEventLoopGroup-1-1: BasePacket(channel=-1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyImmutableDataBuf@ecfba07, prioritized=false, creationStamp=2025-01-19T00:59:56.958636121Z, uniqueId=99ea61d9-ce73-4320-a94d-6b09e7c5085b)
[19.01 01:59:57.087] DEBUG: Received packet multithreadEventLoopGroup-1-1: BasePacket(channel=2, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyImmutableDataBuf@202ef912, prioritized=false, creationStamp=2025-01-19T00:59:57.087004118Z, uniqueId=null)
[19.01 01:59:57.087] DEBUG: HandlePacket Packet-Dispatcher-1 NetworkClientChannelHandler: BasePacket(channel=2, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyImmutableDataBuf@202ef912, prioritized=false, creationStamp=2025-01-19T00:59:57.087004118Z, uniqueId=null)
[19.01 01:59:57.087] INFO : Receive on packet-com channel
[19.01 01:59:57.109] INFO : [9530b30c-e852-40c0-8935-1b59732308e9] Chunk received: ChunkSessionInformation[chunkSize=52428800, sessionUniqueId=9530b30c-e852-40c0-8935-1b59732308e9, transferChannel=query:dummy, transferInformation=eu.cloudnetservice.driver.network.netty.buffer.NettyImmutableDataBuf@7e21850] - Complete: true - Handler: eu.cloudnetservice.driver.network.chunk.defaults.DefaultFileChunkedPacketHandler@1130df1c
[19.01 01:59:57.111] INFO : [9530b30c-e852-40c0-8935-1b59732308e9] Session complete
[19.01 01:59:57.112] INFO : Receive on packet-com channel

node

[19.01 01:59:56.923] DEBUG: Received packet multithreadEventLoopGroup-4-4: BasePacket(channel=1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyImmutableDataBuf@3efdf11c, prioritized=false, creationStamp=2025-01-19T00:59:56.923950911Z, uniqueId=99ea61d9-ce73-4320-a94d-6b09e7c5085b)
[19.01 01:59:56.924] DEBUG: HandlePacket Packet-Dispatcher-3 DefaultNetworkServerChannelHandler: BasePacket(channel=1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyImmutableDataBuf@3efdf11c, prioritized=false, creationStamp=2025-01-19T00:59:56.923950911Z, uniqueId=99ea61d9-ce73-4320-a94d-6b09e7c5085b)
[19.01 01:59:56.924] INFO : [9530b30c-e852-40c0-8935-1b59732308e9] File request received: remote_templates_zip_template
[19.01 01:59:56.924] DEBUG: Calling event eu.cloudnetservice.driver.network.chunk.event.FileQueryRequestEvent on listener eu.cloudnetservice.node.network.chunk.FileDeployCallbackListener
[19.01 01:59:56.958] INFO : [9530b30c-e852-40c0-8935-1b59732308e9] Responding to file request 10
[19.01 01:59:56.958] DEBUG: SendPacket BasePacket(channel=-1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyMutableDataBuf@203cea6a, prioritized=false, creationStamp=2025-01-19T00:59:56.958483776Z, uniqueId=99ea61d9-ce73-4320-a94d-6b09e7c5085b)
[19.01 01:59:57.078] DEBUG: SendPacketSync BasePacket(channel=2, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyMutableDataBuf@3ecf4574, prioritized=false, creationStamp=2025-01-19T00:59:57.078346121Z, uniqueId=null)
[19.01 01:59:57.080] INFO : [9530b30c-e852-40c0-8935-1b59732308e9] Sending last packet 0

This is what the logs look like when it breaks:

minigame server

...
[19.01 14:53:05.534] INFO : [c312112f-c124-43b4-82d1-a96a1c167ce6] Custom session registered: eu.cloudnetservice.driver.network.chunk.defaults.DefaultFileChunkedPacketHandler@430ab449
[19.01 14:53:05.534] DEBUG: SendPacketSync BasePacket(channel=1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyMutableDataBuf@612537eb, prioritized=false, creationStamp=2025-01-19T13:53:05.534742362Z, uniqueId=3b88d869-9375-4def-b4ea-ec068b16ac53)
[19.01 14:53:05.555] DEBUG: Received packet multithreadEventLoopGroup-1-1: BasePacket(channel=-1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyImmutableDataBuf@7e141535, prioritized=false, creationStamp=2025-01-19T13:53:05.555921815Z, uniqueId=3b88d869-9375-4def-b4ea-ec068b16ac53)
[19.01 14:53:06.286] DEBUG: execute query: SELECT `id`, `msg` FROM `luckperms_messenger` WHERE `id` > ? AND (NOW() - `time` < 30)
...

node

[19.01 14:53:05.534] DEBUG: Received packet multithreadEventLoopGroup-4-2: BasePacket(channel=1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyImmutableDataBuf@17930c33, prioritized=false, creationStamp=2025-01-19T13:53:05.534921786Z, uniqueId=3b88d869-9375-4def-b4ea-ec068b16ac53)
[19.01 14:53:05.534] DEBUG: HandlePacket Packet-Dispatcher-2 DefaultNetworkServerChannelHandler: BasePacket(channel=1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyImmutableDataBuf@17930c33, prioritized=false, creationStamp=2025-01-19T13:53:05.534921786Z, uniqueId=3b88d869-9375-4def-b4ea-ec068b16ac53)
[19.01 14:53:05.535] INFO : [c312112f-c124-43b4-82d1-a96a1c167ce6] File request received: remote_templates_zip_template
[19.01 14:53:05.535] DEBUG: Calling event eu.cloudnetservice.driver.network.chunk.event.FileQueryRequestEvent on listener eu.cloudnetservice.node.network.chunk.FileDeployCallbackListener
[19.01 14:53:05.553] INFO : [c312112f-c124-43b4-82d1-a96a1c167ce6] Responding to file request 8
[19.01 14:53:05.555] DEBUG: SendPacket BasePacket(channel=-1, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyMutableDataBuf@58a0c028, prioritized=false, creationStamp=2025-01-19T13:53:05.555636121Z, uniqueId=3b88d869-9375-4def-b4ea-ec068b16ac53)
[19.01 14:53:05.666] DEBUG: SendPacketSync BasePacket(channel=2, dataBuf=eu.cloudnetservice.driver.network.netty.buffer.NettyMutableDataBuf@7d488351, prioritized=false, creationStamp=2025-01-19T13:53:05.666178521Z, uniqueId=null)
[19.01 14:53:05.939] DEBUG: Calling event eu.cloudnetservice.node.event.instance.CloudNetTickServiceStartEvent on listener eu.darkcube.minigame.woolbattle.module.ServiceListener
[19.01 14:53:06.225] INFO : [c312112f-c124-43b4-82d1-a96a1c167ce6] Sending last packet 0

I filtered the node logs and removed the listener debug messages; otherwise the logs would be very cluttered.
The minigame server logs are from the .wrapper/logs directory.
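With about 15 requests per day, finding the broken transfer by eye gets tedious. The following is a minimal sketch that scans a wrapper log for chunk sessions that were registered but never completed. It relies on the `Custom session registered` and `Session complete` lines, which come from the custom logging build described above, not from stock CloudNet. `StuckSessionFinder` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class StuckSessionFinder {

  // Matches the custom log lines, e.g.
  // "[19.01 14:53:05.534] INFO : [<session uuid>] Custom session registered: ..."
  // "[19.01 01:59:57.111] INFO : [<session uuid>] Session complete"
  private static final Pattern SESSION_LINE = Pattern.compile(
      "\\[([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\\] "
          + "(Custom session registered|Session complete)");

  // Returns session ids that were registered but have no matching completion line, in log order.
  static Set<String> unfinishedSessions(List<String> lines) {
    Set<String> open = new LinkedHashSet<>();
    for (String line : lines) {
      Matcher m = SESSION_LINE.matcher(line);
      if (!m.find()) {
        continue;
      }
      if (m.group(2).startsWith("Custom")) {
        open.add(m.group(1));
      } else {
        open.remove(m.group(1));
      }
    }
    return open;
  }

  public static void main(String[] args) throws IOException {
    List<String> lines = Files.readAllLines(Path.of(args[0]));
    unfinishedSessions(lines).forEach(id -> System.out.println("never completed: " + id));
  }
}
```

Applied to the broken minigame excerpt above, it would report session `c312112f-c124-43b4-82d1-a96a1c167ce6`, which has a `Custom session registered` line but no `Session complete` line.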

I have also created heap dumps of the affected servers and saved the logs, in case any questions come up.

Issue uniqueness

  • Yes, this issue is unique. There are no similar issues.

EDIT1

Sending thousands of zip requests hasn't broken it for me yet, so this is difficult to reproduce.
The one or two breakages per day occur out of roughly 15 requests spread over the day.

EDIT2

I uploaded the relevant log files:
working minigame
working node
broken minigame
broken node

Metadata

Labels

s: needs triage (Issue waiting for triage), t: bug (Something isn't working as intended)