Open
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed)
Description
Code of Conduct
- I agree to follow this project's Code of Conduct
Search before asking
- I have searched in the issues and found no similar issues.
Describe the bug
Affects Version(s)
master
Uniffle Server Log Output
jstack:
"Grpc-1788" #2073 daemon prio=5 os_prio=0 cpu=1723.11ms elapsed=88729.16s tid=0x00007f3d3c0f1000 nid=0x968 waiting for monitor entry [0x00007f3cf97fe000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.uniffle.server.ShuffleTaskManager.commitShuffle(ShuffleTaskManager.java:338)
- waiting to lock <0x00007f4fbf708e00> (a java.lang.Object)
at org.apache.uniffle.server.ShuffleServerGrpcService.finishShuffle(ShuffleServerGrpcService.java:468)
at org.apache.uniffle.proto.ShuffleServerGrpc$MethodHandlers.invoke(ShuffleServerGrpc.java:1060)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:356)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:861)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
"Grpc-1359" #1629 daemon prio=5 os_prio=0 cpu=5536.44ms elapsed=88733.96s tid=0x00007f4380185800 nid=0x7ac waiting on condition [0x00007f41156fe000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.uniffle.server.ShuffleTaskManager.commitShuffle(ShuffleTaskManager.java:360)
- locked <0x00007f4fbf708e00> (a java.lang.Object)
at org.apache.uniffle.server.ShuffleServerGrpcService.finishShuffle(ShuffleServerGrpcService.java:468)
at org.apache.uniffle.proto.ShuffleServerGrpc$MethodHandlers.invoke(ShuffleServerGrpc.java:1060)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:356)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:861)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
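The two stacks above show one thread (Grpc-1359) sleeping inside commitShuffle while holding monitor <0x00007f4fbf708e00>, and another (Grpc-1788) blocked waiting for that same monitor. The following is a minimal sketch of that pattern only, not the actual ShuffleTaskManager code. The class name, the lock field, and the poll counts are hypothetical, and it assumes the holder polls with Thread.sleep inside a synchronized block, which is what the dump suggests.

```java
import java.util.concurrent.CountDownLatch;

public class CommitLockSketch {
    // Hypothetical stand-in for the java.lang.Object monitor seen in the thread dump.
    private static final Object COMMIT_LOCK = new Object();

    // Holds the lock for the whole polling loop, like the TIMED_WAITING (sleeping) thread.
    static void commitWhileHoldingLock(CountDownLatch lockHeld, long pollMillis, int polls)
            throws InterruptedException {
        synchronized (COMMIT_LOCK) {
            lockHeld.countDown();          // signal that the monitor is now owned
            for (int i = 0; i < polls; i++) {
                Thread.sleep(pollMillis);  // sleeping does not release the monitor
            }
        }
    }

    // Starts a holder and a waiter, and returns the waiter's state while the holder sleeps.
    static Thread.State observeWaiterState() throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            try {
                commitWhileHoldingLock(lockHeld, 100, 20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (COMMIT_LOCK) {
                // A second commit for the same lock would run here once the holder exits.
            }
        }, "waiter");

        holder.start();
        lockHeld.await();                  // make sure the holder owns the monitor first
        waiter.start();

        // Poll until the waiter is parked on the monitor, or give up after 1.5 s.
        long deadline = System.currentTimeMillis() + 1500;
        while (waiter.getState() != Thread.State.BLOCKED
                && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        Thread.State state = waiter.getState();

        holder.interrupt();                // end the sleep loop early so both threads finish
        holder.join();
        waiter.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state: " + observeWaiterState());
    }
}
```

In this sketch the waiter cannot make progress until the holder leaves the synchronized block, so any gRPC handler that needs the same monitor stays BLOCKED for as long as the polling loop runs.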
exception log:
[2024-07-03 08:54:32.973] [HadoopFlushEventThreadPool-1] [WARN] SingleStorageManager.write - Exception happened when write data for ShuffleDataFlushEvent: eventId=252896, appId=application_1716779728283_6825960_1719966578466, shuffleId=0, startPartition=315, endPartition=315, retryTimes=0, underStorage=HadoopStorage, isPended=false, ownedByHugePartition=false, try again
org.apache.uniffle.common.exception.RssException: java.io.IOException: All datanodes [DatanodeInfoWithStorage[127.0.0.1:9003,DS-3ad04d12-7d78-405f-ba33-d2bb706f073d,DISK]] are bad. Aborting...
at org.apache.uniffle.storage.handler.impl.HadoopShuffleWriteHandler.write(HadoopShuffleWriteHandler.java:157)
at org.apache.uniffle.storage.handler.impl.PooledHadoopShuffleWriteHandler.write(PooledHadoopShuffleWriteHandler.java:122)
at org.apache.uniffle.server.storage.SingleStorageManager.write(SingleStorageManager.java:59)
at org.apache.uniffle.server.storage.HybridStorageManager.write(HybridStorageManager.java:130)
at org.apache.uniffle.server.ShuffleFlushManager.processFlushEvent(ShuffleFlushManager.java:165)
at org.apache.uniffle.server.DefaultFlushEventHandler.handleEventAndUpdateMetrics(DefaultFlushEventHandler.java:97)
at org.apache.uniffle.server.DefaultFlushEventHandler.lambda$dispatchEvent$0(DefaultFlushEventHandler.java:219)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: All datanodes [DatanodeInfoWithStorage[127.0.0.1:9003,DS-3ad04d12-7d78-405f-ba33-d2bb706f073d,DISK]] are bad. Aborting...
at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1567)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1501)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1487)
at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1262)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:673)
Uniffle Engine Log Output
No response
Uniffle Server Configurations
No response
Uniffle Engine Configurations
No response
Additional context
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!