Testing the HMS migration script with the spark-submit command fails with:
AttributeError: 'str' object has no attribute '_jdf'
which is triggered by the call:
id_type = df.get_schema_type(id_col)
If I change the call to:
id_type = get_schema_type(df, id_col)
I get past that error, but it exposes other DataFrame-related errors in other functions.
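To show the calling mistake outside Spark, here is a minimal, Spark-free sketch. ToyDataFrame and its schema dict are invented stand-ins for illustration, not the script's real classes; only the calling pattern mirrors hive_metastore_migration.py:

```python
class ToyDataFrame:
    """Stand-in for a pyspark.sql.DataFrame with a fixed column schema."""
    def __init__(self, schema):
        self.schema = schema  # e.g. {'DB_ID': 'bigint'}

def get_schema_type(df, column_name):
    """Module-level helper: look up a column's type on the DataFrame passed in."""
    return df.schema[column_name]

df = ToyDataFrame({'DB_ID': 'bigint', 'PARAM_KEY': 'string'})

# Correct: pass the DataFrame explicitly to the module-level function.
print(get_schema_type(df, 'DB_ID'))  # bigint

# Buggy pattern from the script: treats the helper as a DataFrame method,
# which fails because no such attribute exists on the object.
try:
    df.get_schema_type('DB_ID')
except AttributeError as e:
    print(e)
```

Rewriting the call site so the DataFrame is the first argument, as above, is what gets past the AttributeError.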
This was tested on:
  emr-5.31.0
  Hadoop 2.10.0
  Hive 2.3.7
  Spark 2.4.6
Full stack trace:
Traceback (most recent call last):
  File "/home/hadoop/hive_metastore_migration.py", line 1525, in <module>
    main()
  File "/home/hadoop/hive_metastore_migration.py", line 1519, in main
    etl_from_metastore(sc, sql_context, db_prefix, table_prefix, hive_metastore, options)
  File "/home/hadoop/hive_metastore_migration.py", line 1414, in etl_from_metastore
    .transform(hive_metastore)
  File "/home/hadoop/hive_metastore_migration.py", line 753, in transform
    ms_database_params=hive_metastore.ms_database_params)
  File "/home/hadoop/hive_metastore_migration.py", line 734, in transform_databases
    dbs_with_params = self.join_with_params(df=ms_dbs, df_params=ms_database_params, id_col='DB_ID')
  File "/home/hadoop/hive_metastore_migration.py", line 336, in join_with_params
    df_params_map = self.transform_params(params_df=df_params, id_col=id_col)
  File "/home/hadoop/hive_metastore_migration.py", line 314, in transform_params
    return self.kv_pair_to_map(params_df, id_col, key, value, 'parameters')
  File "/home/hadoop/hive_metastore_migration.py", line 326, in kv_pair_to_map
    id_type = df.get_schema_type(id_col)
  File "/home/hadoop/hive_metastore_migration.py", line 199, in get_schema_type
    return df.select(column_name).schema.fields[0].dataType
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1327, in select
AttributeError: 'str' object has no attribute '_jdf'
I have also tried EMR v6.5 with Spark v3.1.2 and hit the same error, so I thought it might be a Spark version issue.
Which Spark and EMR versions has this script been run successfully on?
I launch spark-submit per the README, with the --jdbc* options changed as needed.
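For reference, my launch command looks roughly like the sketch below. The connector-jar path, host, credentials, and output path are placeholders, and the flag names (--mode, --jdbc-url, --jdbc-username, --jdbc-password, --output-path) are quoted from memory of the repo README, so verify them against your copy before running:

```shell
spark-submit --driver-class-path /usr/share/java/mysql-connector-java.jar \
  hive_metastore_migration.py \
  --mode from-metastore \
  --jdbc-url "jdbc:mysql://metastore-host:3306/hive" \
  --jdbc-username hive \
  --jdbc-password "secret" \
  --output-path "s3://my-bucket/hms-export/"
```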