Description
It looks like there is a cuDNN library version mismatch in the Docker setup:
```
2022-12-16 16:11:58.828108: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:421] Loaded runtime CuDNN library: 8.4.0 but source was compiled with: 8.6.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2022-12-16 16:11:58.830602: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:421] Loaded runtime CuDNN library: 8.4.0 but source was compiled with: 8.6.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
```
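As a sanity check (not part of the original report), the cuDNN version actually loaded inside the container can be queried from Python via ctypes. `cudnnGetVersion()` is part of the public cuDNN C API; the library name `libcudnn.so.8` and its presence on the loader path are assumptions about this particular image.

```python
import ctypes

# Minimal sketch: ask the runtime cuDNN library for its own version.
# Assumes libcudnn.so.8 is resolvable by the dynamic loader in this container.
libcudnn = ctypes.CDLL("libcudnn.so.8")
libcudnn.cudnnGetVersion.restype = ctypes.c_size_t

# For cuDNN 8.x the version is encoded as major*1000 + minor*100 + patch.
version = libcudnn.cudnnGetVersion()
major, minor, patch = version // 1000, (version % 1000) // 100, version % 100
print(f"runtime cuDNN: {major}.{minor}.{patch}")  # reports 8.4.0 here, while jaxlib was built against 8.6.0
```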
This results in a "DNN library is not found" error when XLA tries to compile the convolution:
```
XlaRuntimeError Traceback (most recent call last)
Cell In[14], line 25
23 encoded_images = encoded_images.sequences[..., 1:]
24 # decode images
---> 25 decoded_images = p_decode(encoded_images, vqgan_params)
26 decoded_images = decoded_images.clip(0.0, 1.0).reshape((-1, 256, 256, 3))
27 for decoded_img in decoded_images:
[... skipping hidden 11 frame]
File /usr/local/lib/python3.8/dist-packages/jax/_src/dispatch.py:1014, in backend_compile(backend, built_c, options, host_callbacks)
   1009 return backend.compile(built_c, compile_options=options,
   1010                        host_callbacks=host_callbacks)
   1011 # Some backends don't have host_callbacks option yet
   1012 # TODO(sharadmv): remove this fallback when all backends allow compile
   1013 # to take in host_callbacks
-> 1014 return backend.compile(built_c, compile_options=options)
XlaRuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.219 = (f32[2,256,16,16]{3,2,1,0}, u8[0]{0}) custom-call(f32[2,256,16,16]{3,2,1,0} %bitcast.256, f32[256,256,1,1]{3,2,1,0} %bitcast.263, f32[256]{0} %get-tuple-element.341), window={size=1x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="pmap(p_decode)/jit(main)/VQModule.decode_code/VQModule.decode/post_quant_conv/conv_general_dilated[window_strides=(1, 1) padding=((0, 0), (0, 0)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 lhs_shape=(2, 16, 16, 256) rhs_shape=(1, 1, 256, 256) precision=None preferred_element_type=None]" source_file="/usr/local/lib/python3.8/dist-packages/flax/linen/linear.py" source_line=438}, backend_config="{"conv_result_scale":1,"activation_mode":"0","side_input_scale":0}"
Original error: UNIMPLEMENTED: DNN library is not found.
To ignore this failure and try to use a fallback algorithm (which may have suboptimal performance), use XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false. Please also file a bug for the root cause of failing autotuning.
```
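For completeness, the flag suggested in the error message can be set before JAX is imported. This is only a sketch of the suggested fallback: it may pick a slower convolution algorithm and does not fix the underlying 8.4.0 vs 8.6.0 cuDNN mismatch in the image; it also assumes XLA_FLAGS is not already set elsewhere.

```python
import os

# Fallback suggested by the XLA error above: disable strict conv algorithm
# picking so a default algorithm is used instead of failing autotuning.
# Must be set before jax/jaxlib is imported.
os.environ["XLA_FLAGS"] = "--xla_gpu_strict_conv_algorithm_picker=false"

import jax  # imported after setting XLA_FLAGS on purpose

print(jax.devices())
```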