
Issue migrating directly from Hive Metastore to Glue Data Catalog #112

Open
vinceRicchiuti opened this issue Jan 27, 2022 · 0 comments

vinceRicchiuti commented Jan 27, 2022

I am trying to migrate my Hive Metastore (hosted on RDS) to my AWS Glue Data Catalog.

I configured the job to run as a Spark job and tried every combination of the following:

  • Spark 2.4/3.1
  • Python 2/3
  • Glue version 3.0/2.0/1.0/0.9

I followed the README section on migrating directly from Hive Metastore to the AWS Glue Data Catalog, but when I run the Glue ETL job it fails with "'str' object has no attribute '_jdf'". The full error message is below:

2022-01-27 16:53:53,940 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
  File "/tmp/import_into_datacatalog.py", line 130, in <module>
    main()
  File "/tmp/import_into_datacatalog.py", line 126, in main
    region=options.get('region') or 'us-east-1'
  File "/tmp/import_into_datacatalog.py", line 51, in metastore_full_migration
    sc, sql_context, db_prefix, table_prefix).transform(hive_metastore)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 753, in transform
    ms_database_params=hive_metastore.ms_database_params)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 734, in transform_databases
    dbs_with_params = self.join_with_params(df=ms_dbs, df_params=ms_database_params, id_col='DB_ID')
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 336, in join_with_params
    df_params_map = self.transform_params(params_df=df_params, id_col=id_col)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 314, in transform_params
    return self.kv_pair_to_map(params_df, id_col, key, value, 'parameters')
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 326, in kv_pair_to_map
    id_type = df.get_schema_type(id_col)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 199, in get_schema_type
    return df.select(column_name).schema.fields[0].dataType
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1671, in select
    jdf = self._jdf.select(self._jcols(*cols))
AttributeError: 'str' object has no attribute '_jdf'
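
Reading the last two frames, the failure happens inside PySpark's own `select`, and the message says that `self` there was a `str` rather than a DataFrame. To check my understanding of the message, here is a minimal sketch with no Spark involved. `FakeDataFrame` and `error_message` are hypothetical stand-ins I made up for illustration, not part of the migration script or of PySpark, and this does not claim to be the actual cause in the Glue job. It only shows that this exact wording appears when a string ends up in the receiver position of `select`, whereas calling `select` on a plain string gives a different message.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame; no Spark required."""

    def __init__(self, columns):
        # Stands in for the JVM-side handle that the real class keeps in _jdf.
        self._jdf = list(columns)

    def select(self, *cols):
        # Like the frame in the traceback, the first thing touched is self._jdf.
        return [c for c in self._jdf if c in cols]


def error_message(call):
    """Run call() and return the AttributeError text, or 'no error'."""
    try:
        call()
    except AttributeError as exc:
        return str(exc)
    return "no error"


frame = FakeDataFrame(["DB_ID", "PARAM_KEY"])

# Normal use: select is bound to an instance, so self has _jdf.
print(error_message(lambda: frame.select("DB_ID")))
# no error

# select reached through the class: the string lands in the self slot.
print(error_message(lambda: FakeDataFrame.select("DB_ID")))
# 'str' object has no attribute '_jdf'

# A plain string used where a DataFrame is expected fails earlier, differently.
print(error_message(lambda: "DB_ID".select("DB_ID")))
# 'str' object has no attribute 'select'
```

So, if this sketch is representative, the object reaching `select` in my job is not simply a string passed in place of the DataFrame; the string seems to arrive as the method's receiver.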

I do not know how to resolve this error. Could you give me some help or suggestions?
