I am trying to migrate my Hive Metastore (on RDS) to my Glue Data Catalog.
I configured the job to run as a Spark job and tried every matching combination of:
Spark 2.4/3.1
Python 2/3
Glue version 3.0/2.0/1.0/0.9
I followed the README to migrate directly from the Hive Metastore to the AWS Glue Data Catalog, but I get "'str' object has no attribute '_jdf'" when I run the Glue ETL job. The full error message is below:
2022-01-27 16:53:53,940 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:
Traceback (most recent call last):
File "/tmp/import_into_datacatalog.py", line 130, in main()
File "/tmp/import_into_datacatalog.py", line 126, in main region=options.get('region') or 'us-east-1'
File "/tmp/import_into_datacatalog.py", line 51, in metastore_full_migration sc, sql_context, db_prefix, table_prefix).transform(hive_metastore)
File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 753, in transform ms_database_params=hive_metastore.ms_database_params)
File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 734, in transform_databases dbs_with_params = self.join_with_params(df=ms_dbs, df_params=ms_database_params, id_col='DB_ID')
File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 336, in join_with_params df_params_map = self.transform_params(params_df=df_params, id_col=id_col)
File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 314, in transform_params return self.kv_pair_to_map(params_df, id_col, key, value, 'parameters')
File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 326, in kv_pair_to_map id_type = df.get_schema_type(id_col)
File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 199, in get_schema_type return df.select(column_name).schema.fields[0].dataType
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1671, in select
jdf = self._jdf.select(self._jcols(*cols))
AttributeError: 'str' object has no attribute '_jdf'
I don't know how to resolve this error. Could you give me some help or suggestions?