Open
Description
Describe the bug
- This bug was encountered while using GEDS-HDFS as a tier-2 storage for Pravega.
- While configured to spill to MinIO (S3), GEDS spills earlier than expected in the event of a MinIO outage. With the working directory set to a drive with 20 GB of storage space, the expected behaviour is that during a MinIO outage GEDS fills up to ~70% of its capacity (~14 GB) before throttling and errors are encountered.
- In reality, only ~2.4GB is written to GEDS before throttling occurs.
- In the logs, cURL errors 7 (could not connect) and 28 (timeout reached) are shown repeatedly. In particular, the first instance of error 28 coincides with the start of throttling.
- I believe GEDS may be able to last significantly longer during a MinIO outage, and that this is being hindered by some sort of timeout.
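To make the gap concrete, the figures above can be checked with a few lines of arithmetic. This is only a sketch: the 70% threshold is the expectation stated in this report, not a value read from the GEDS configuration, and the constant and function names are made up for illustration.

```python
# Figures from this report. The 70% fill threshold is the reporter's
# expectation, not a value confirmed from the GEDS source or config.
CAPACITY_GB = 20.0
EXPECTED_FILL_FRACTION = 0.70
OBSERVED_WRITTEN_GB = 2.4


def expected_fill_gb(capacity_gb: float, fraction: float) -> float:
    """Amount of data expected on local storage before throttling starts."""
    return capacity_gb * fraction


def observed_fraction(written_gb: float, capacity_gb: float) -> float:
    """Share of the local capacity actually used when throttling began."""
    return written_gb / capacity_gb


expected = expected_fill_gb(CAPACITY_GB, EXPECTED_FILL_FRACTION)
used = observed_fraction(OBSERVED_WRITTEN_GB, CAPACITY_GB)
print(f"expected before throttling: ~{expected:.1f} GB")
print(f"observed before throttling: {OBSERVED_WRITTEN_GB:.1f} GB ({used:.0%} of capacity)")
```

In other words, throttling starts at roughly 12% of the drive's capacity rather than the expected ~70%.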
To Reproduce
- Follow the instructions 1.) and 2.) at https://github.com/cloudskin-eu/pravega-geds to achieve the Pravega-GEDS deployment.
- Run `/setup-scripts/pravega-geds-install.sh` to install the GEDS-integrated Pravega deployment on Kubernetes.
- Navigate to `/experiment` and run `run-experiment.sh`.
- Logs for the Pravega segment-store pod can be viewed through `kubectl logs pravega-pravega-segmentstore-0`. The error(s) should be visible in the logs.
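When going through the segment-store logs, it can help to count the cURL errors and note where each code first appears, since the first error 28 is what lines up with the start of throttling. The following is a minimal sketch, assuming the log lines carry the `curlCode: <n>` text seen in the traces in this report; the function name `summarize_curl_errors` and the sample lines are illustrative only.

```python
import re
from collections import Counter

# Matches the message format seen in this report's traces, e.g.
# "... snapshot_info: curlCode: 28, Timeout was reached"
CURL_CODE = re.compile(r"curlCode:\s*(\d+)")


def summarize_curl_errors(lines):
    """Return (count per curl code, first line number where each code appears)."""
    counts = Counter()
    first_seen = {}
    for lineno, line in enumerate(lines, start=1):
        match = CURL_CODE.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        counts[code] += 1
        first_seen.setdefault(code, lineno)  # keep only the earliest line
    return counts, first_seen


# Sample lines copied from the traces in this report.
sample = [
    "Caused by: java.io.IOException: Unable to file status: "
    "_system/containers/_sysjournal.container4.snapshot_info: "
    "curlCode: 7, Couldn't connect to server",
    "at com.ibm.geds.GEDS.nativeStatus(Native Method)",
    "Caused by: java.io.IOException: Unable to file status: "
    "_system/containers/_sysjournal.container7.snapshot_info: "
    "curlCode: 28, Timeout was reached",
]

counts, first_seen = summarize_curl_errors(sample)
for code in sorted(counts):
    print(f"curlCode {code}: {counts[code]} occurrence(s), first at line {first_seen[code]}")
```

To apply the same function to a real run, the output of the `kubectl logs` command above can be saved to a file and its lines passed in place of `sample`.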
Additional information
Configuration Used:
GEDS is configured using environment variables:
options:
  pravegaservice.storage.layout: "CHUNKED_STORAGE"
  pravegaservice.storage.impl.name: "HDFS"
  hdfs.connect.uri: "hdfs://tier-2-geds"
  hdfs.fs.impl: "com.ibm.geds.hdfs.GEDSHadoopFileSystem"
env:
  GEDS_METADATASERVER: "geds-metadataserver:4381"
  GEDS_LOCAL_STORAGE_PATH: "/tmp/pravega/cache"
  AWS_ACCESS_KEY_ID: "miniostorage"
  AWS_SECRET_ACCESS_KEY: "miniostorage"
  AWS_ENDPOINT_URL: "http://minio.pravega.svc.cluster.local:80"
  GEDS_CONFIGURE_S3_USING_ENV: "1"
Curl Code 7
java.util.concurrent.CompletionException: io.pravega.segmentstore.storage.chunklayer.ChunkStorageException: checkExists
at io.pravega.segmentstore.storage.chunklayer.AsyncBaseChunkStorage.lambda$execute$13(AsyncBaseChunkStorage.java:751)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at io.pravega.common.concurrent.ThreadPoolScheduledExecutorService$ScheduledRunnable.run(ThreadPoolScheduledExecutorService.java:209)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.pravega.segmentstore.storage.chunklayer.ChunkStorageException: checkExists
at io.pravega.storage.hdfs.HDFSChunkStorage.convertException(HDFSChunkStorage.java:367)
at io.pravega.storage.hdfs.HDFSChunkStorage.checkExists(HDFSChunkStorage.java:169)
at io.pravega.segmentstore.storage.chunklayer.BaseChunkStorage.lambda$checkExistsAsync$3(BaseChunkStorage.java:89)
at io.pravega.segmentstore.storage.chunklayer.AsyncBaseChunkStorage.lambda$execute$13(AsyncBaseChunkStorage.java:747)
... 6 common frames omitted
Caused by: java.io.IOException: Unable to file status: _system/containers/_sysjournal.container4.snapshot_info: curlCode: 7, Couldn't connect to server
at com.ibm.geds.GEDS.nativeStatus(Native Method)
at com.ibm.geds.GEDS.status(GEDS.java:260)
at com.ibm.geds.hdfs.GEDSHadoopFileSystem.getFileStatus(GEDSHadoopFileSystem.java:154)
at io.pravega.storage.hdfs.HDFSChunkStorage.checkExists(HDFSChunkStorage.java:164)
... 8 common frames omitted
Curl Code 28
at io.pravega.segmentstore.storage.chunklayer.AsyncBaseChunkStorage.lambda$execute$13(AsyncBaseChunkStorage.java:751)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at io.pravega.common.concurrent.ThreadPoolScheduledExecutorService$ScheduledRunnable.run(ThreadPoolScheduledExecutorService.java:209)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.pravega.segmentstore.storage.chunklayer.ChunkStorageException: checkExists
at io.pravega.storage.hdfs.HDFSChunkStorage.convertException(HDFSChunkStorage.java:367)
at io.pravega.storage.hdfs.HDFSChunkStorage.checkExists(HDFSChunkStorage.java:169)
at io.pravega.segmentstore.storage.chunklayer.BaseChunkStorage.lambda$checkExistsAsync$3(BaseChunkStorage.java:89)
at io.pravega.segmentstore.storage.chunklayer.AsyncBaseChunkStorage.lambda$execute$13(AsyncBaseChunkStorage.java:747)
... 6 common frames omitted
Caused by: java.io.IOException: Unable to file status: _system/containers/_sysjournal.container7.snapshot_info: curlCode: 28, Timeout was reached
at com.ibm.geds.GEDS.nativeStatus(Native Method)
at com.ibm.geds.GEDS.status(GEDS.java:260)
at com.ibm.geds.hdfs.GEDSHadoopFileSystem.getFileStatus(GEDSHadoopFileSystem.java:154)
at io.pravega.storage.hdfs.HDFSChunkStorage.checkExists(HDFSChunkStorage.java:164)
... 8 common frames omitted