Testing the HMS migration script with the spark-submit command fails with:
AttributeError: 'str' object has no attribute '_jdf'
which is triggered by the call:
id_type = df.get_schema_type(id_col)
If I change the call to:
id_type = get_schema_type(df, id_col)
I get past that error, but it exposes other DataFrame-related errors in other functions.
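To show the calling mistake outside Spark, here is a minimal, Spark-free sketch. ToyDataFrame and its schema dict are invented stand-ins for illustration, not the script's real classes; only the calling pattern mirrors hive_metastore_migration.py:

```python
class ToyDataFrame:
    """Stand-in for a pyspark.sql.DataFrame with a fixed column schema."""
    def __init__(self, schema):
        self.schema = schema  # e.g. {'DB_ID': 'bigint'}

def get_schema_type(df, column_name):
    """Module-level helper: look up a column's type on the DataFrame passed in."""
    return df.schema[column_name]

df = ToyDataFrame({'DB_ID': 'bigint', 'PARAM_KEY': 'string'})

# Correct: pass the DataFrame explicitly to the module-level function.
print(get_schema_type(df, 'DB_ID'))  # bigint

# Buggy pattern from the script: treats the helper as a DataFrame method,
# which fails because no such attribute exists on the object.
try:
    df.get_schema_type('DB_ID')
except AttributeError as e:
    print(e)
```

Rewriting the call site so the DataFrame is the first argument, as above, is what gets past the AttributeError.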
This was tested on:
  emr-5.31.0
  Hadoop 2.10.0
  Hive 2.3.7
  Spark 2.4.6
Full stack trace:
Traceback (most recent call last):
  File "/home/hadoop/hive_metastore_migration.py", line 1525, in <module>
    main()
  File "/home/hadoop/hive_metastore_migration.py", line 1519, in main
    etl_from_metastore(sc, sql_context, db_prefix, table_prefix, hive_metastore, options)
  File "/home/hadoop/hive_metastore_migration.py", line 1414, in etl_from_metastore
    .transform(hive_metastore)
  File "/home/hadoop/hive_metastore_migration.py", line 753, in transform
    ms_database_params=hive_metastore.ms_database_params)
  File "/home/hadoop/hive_metastore_migration.py", line 734, in transform_databases
    dbs_with_params = self.join_with_params(df=ms_dbs, df_params=ms_database_params, id_col='DB_ID')
  File "/home/hadoop/hive_metastore_migration.py", line 336, in join_with_params
    df_params_map = self.transform_params(params_df=df_params, id_col=id_col)
  File "/home/hadoop/hive_metastore_migration.py", line 314, in transform_params
    return self.kv_pair_to_map(params_df, id_col, key, value, 'parameters')
  File "/home/hadoop/hive_metastore_migration.py", line 326, in kv_pair_to_map
    id_type = df.get_schema_type(id_col)
  File "/home/hadoop/hive_metastore_migration.py", line 199, in get_schema_type
    return df.select(column_name).schema.fields[0].dataType
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1327, in select
AttributeError: 'str' object has no attribute '_jdf'
I have also tried EMR v6.5 with Spark v3.1.2 and hit the same error, so I thought it might be a Spark version issue.
Which Spark and EMR versions has this script been run successfully on?
I launch spark-submit per the README, with the --jdbc* options changed as needed.
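For reference, my launch command looks roughly like the sketch below. The connector-jar path, host, credentials, and output path are placeholders, and the flag names (--mode, --jdbc-url, --jdbc-username, --jdbc-password, --output-path) are quoted from memory of the repo README, so verify them against your copy before running:

```shell
spark-submit --driver-class-path /usr/share/java/mysql-connector-java.jar \
  hive_metastore_migration.py \
  --mode from-metastore \
  --jdbc-url "jdbc:mysql://metastore-host:3306/hive" \
  --jdbc-username hive \
  --jdbc-password "secret" \
  --output-path "s3://my-bucket/hms-export/"
```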