Describe the bug
I am trying to use `stage_external_sources` to materialize an S3 location as an external table, using Spark 3.5.0 on emr-7.0.0.
The first query (`show table extended in external_tables like '*'`) runs fine.
The second query (`create schema if not exists external_tables`) also runs, but returns no output, and `dbt run-operation` then errors out:
16:38:29 Running with dbt=1.8.9
16:38:29 Registered adapter: spark=1.8.0
16:38:29 Found 76 models, 23 data tests, 52 sources, 2 exposures, 247 metrics, 700 macros, 16 semantic models
16:38:29 1 of 1 START external source external_tables.tommy
16:38:32 1 of 1 (1) create schema if not exists external_tables
16:38:34 Encountered an error while running operation: Compilation Error
'NoneType' object is not iterable
> in macro stage_external_sources (macros/common/stage_external_sources.sql)
> called by <Unknown>
Steps to reproduce
- Add sources.yml
version: 2

sources:
  - name: external_tables
    description: Contains data in s3
    tables:
      - name: tommy
        external:
          location: s3://xxxx-xxx/xxx
          using: parquet
        columns:
          - name: weekend_day
            data_type: date
          - name: region_id
            data_type: smallint
- Run
dbt run-operation stage_external_sources --args "select: external_tables.tommy" --profile spark
Expected results
Expected the operation to complete successfully.
Actual results
'NoneType' object is not iterable
Screenshots and log output
22:08:29.249944 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1075f6090>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x107633a10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1076200d0>]}
============================== 22:08:29.252813 | 0c457ee0-660f-4a43-85a0-7947603a8375 ==============================
22:08:29.252813 [info ] [MainThread]: Running with dbt=1.8.9
22:08:29.253090 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'False', 'version_check': 'True', 'log_path': 'logs', 'fail_fast': 'False', 'profiles_dir': 'HerculesAmex', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'empty': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'static_parser': 'True', 'log_format': 'default', 'introspect': 'True', 'target_path': 'None', 'invocation_command': 'dbt run-operation stage_external_sources --args select: external_tables.tommy --profile spark', 'send_anonymous_usage_stats': 'True'}
22:08:29.355830 [debug] [MainThread]: Spark adapter: Setting pyhive.hive logging to ERROR
22:08:29.356097 [debug] [MainThread]: Spark adapter: Setting thrift.transport logging to ERROR
22:08:29.356243 [debug] [MainThread]: Spark adapter: Setting thrift.protocol logging to ERROR
22:08:29.418957 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '0c457ee0-660f-4a43-85a0-7947603a8375', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1075e1ed0>]}
22:08:29.440477 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '0c457ee0-660f-4a43-85a0-7947603a8375', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1207ae510>]}
22:08:29.440878 [info ] [MainThread]: Registered adapter: spark=1.8.0
22:08:29.504317 [debug] [MainThread]: checksum: 85b61b60f3abb79f042182ca285d45490368bec2b59770b399e796a03cafaa67, vars: {}, profile: spark, target: , version: 1.8.9
22:08:29.653291 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
22:08:29.653554 [debug] [MainThread]: Partial parsing enabled, no changes found, skipping parsing
22:08:29.707335 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '0c457ee0-660f-4a43-85a0-7947603a8375', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x121961d10>]}
22:08:29.838657 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '0c457ee0-660f-4a43-85a0-7947603a8375', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1075e9b10>]}
22:08:29.838955 [info ] [MainThread]: Found 76 models, 23 data tests, 52 sources, 2 exposures, 247 metrics, 700 macros, 16 semantic models
22:08:29.839114 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '0c457ee0-660f-4a43-85a0-7947603a8375', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x121357350>]}
22:08:29.839354 [debug] [MainThread]: Acquiring new spark connection 'macro_stage_external_sources'
22:08:29.839488 [debug] [MainThread]: Spark adapter: NotImplemented: add_begin_query
22:08:29.839583 [debug] [MainThread]: Spark adapter: NotImplemented: commit
22:08:29.846846 [info ] [MainThread]: 1 of 1 START external source external_tables.tommy
22:08:29.849240 [debug] [MainThread]: On "macro_stage_external_sources": cache miss for schema ".external_tables", this is inefficient
22:08:29.853775 [debug] [MainThread]: Using spark connection "macro_stage_external_sources"
22:08:29.853990 [debug] [MainThread]: On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.8.9", "profile_name": "spark", "target_name": "dev", "connection_name": "macro_stage_external_sources"} */
show table extended in external_tables like '*'
22:08:29.854107 [debug] [MainThread]: Opening a new connection, currently in state init
22:08:31.888206 [debug] [MainThread]: Spark adapter: Poll status: 2, query complete
22:08:31.889978 [debug] [MainThread]: SQL status: OK in 2.036 seconds
22:08:32.378575 [debug] [MainThread]: While listing relations in database=, schema=external_tables, found:
22:08:32.402418 [info ] [MainThread]: 1 of 1 (1) create schema if not exists external_tables
22:08:32.404740 [debug] [MainThread]: Using spark connection "macro_stage_external_sources"
22:08:32.405017 [debug] [MainThread]: On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.8.9", "profile_name": "spark", "target_name": "dev", "connection_name": "macro_stage_external_sources"} */
create schema if not exists external_tables
22:08:32.868860 [debug] [MainThread]: Spark adapter: Poll status: 2, query complete
22:08:32.869427 [debug] [MainThread]: SQL status: OK in 0.464 seconds
22:08:33.350666 [debug] [MainThread]: Spark adapter: Error while running:
macro stage_external_sources
22:08:33.351951 [debug] [MainThread]: Spark adapter: Compilation Error
'NoneType' object is not iterable
> in macro stage_external_sources (macros/common/stage_external_sources.sql)
> called by <Unknown>
22:08:33.352762 [debug] [MainThread]: On macro_stage_external_sources: ROLLBACK
22:08:33.353346 [debug] [MainThread]: Spark adapter: NotImplemented: rollback
22:08:33.353929 [debug] [MainThread]: On macro_stage_external_sources: Close
22:08:34.027395 [error] [MainThread]: Encountered an error while running operation: Compilation Error
'NoneType' object is not iterable
> in macro stage_external_sources (macros/common/stage_external_sources.sql)
> called by <Unknown>
22:08:34.032499 [debug] [MainThread]: Traceback (most recent call last):
File ".venv/amex/lib/python3.11/site-packages/dbt_common/clients/jinja.py", line 346, in exception_handler
yield
File ".venv/amex/lib/python3.11/site-packages/dbt_common/clients/jinja.py", line 323, in call_macro
return macro(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/jinja2/runtime.py", line 770, in __call__
return self._invoke(arguments, autoescape)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/jinja2/runtime.py", line 784, in _invoke
rv = self._func(*arguments)
^^^^^^^^^^^^^^^^^^^^^^
File "<template>", line 52, in macro
File ".venv/amex/lib/python3.11/site-packages/jinja2/sandbox.py", line 401, in call
return __context.call(__obj, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/jinja2/runtime.py", line 303, in call
return __obj(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/dbt/adapters/base/impl.py", line 399, in execute
return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/dbt/adapters/sql/connections.py", line 221, in execute
table = self.get_result_from_cursor(cursor, limit)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/dbt/adapters/sql/connections.py", line 203, in get_result_from_cursor
rows = cursor.fetchall()
^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/dbt/adapters/spark/connections.py", line 251, in fetchall
return self._cursor.fetchall()
^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/pyhive/common.py", line 142, in fetchall
return list(iter(self.fetchone, None))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/pyhive/common.py", line 111, in fetchone
self._fetch_while(lambda: not self._data and self._state != self._STATE_FINISHED)
File ".venv/amex/lib/python3.11/site-packages/pyhive/common.py", line 51, in _fetch_while
self._fetch_more()
File ".venv/amex/lib/python3.11/site-packages/pyhive/hive.py", line 507, in _fetch_more
zip(response.results.columns, schema)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not iterable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".venv/amex/lib/python3.11/site-packages/dbt/task/run_operation.py", line 62, in run
self._run_unsafe(package_name, macro_name)
File ".venv/amex/lib/python3.11/site-packages/dbt/task/run_operation.py", line 47, in _run_unsafe
res = adapter.execute_macro(
^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/dbt/adapters/base/impl.py", line 1193, in execute_macro
result = macro_function(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/dbt_common/clients/jinja.py", line 355, in __call__
return self.call_macro(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/dbt_common/clients/jinja.py", line 323, in call_macro
return macro(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/jinja2/runtime.py", line 770, in __call__
return self._invoke(arguments, autoescape)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/jinja2/runtime.py", line 784, in _invoke
rv = self._func(*arguments)
^^^^^^^^^^^^^^^^^^^^^^
File "<template>", line 220, in macro
File ".venv/amex/lib/python3.11/site-packages/jinja2/sandbox.py", line 401, in call
return __context.call(__obj, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/jinja2/runtime.py", line 303, in call
return __obj(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/dbt/clients/jinja.py", line 84, in __call__
return self.call_macro(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".venv/amex/lib/python3.11/site-packages/dbt_common/clients/jinja.py", line 321, in call_macro
with self.exception_handler():
File ".local/share/mise/installs/python/3.11.11/lib/python3.11/contextlib.py", line 158, in __exit__
self.gen.throw(typ, value, traceback)
File ".venv/amex/lib/python3.11/site-packages/dbt_common/clients/jinja.py", line 348, in exception_handler
raise CaughtMacroErrorWithNodeError(exc=e, node=self.macro)
dbt_common.exceptions.macros.CaughtMacroErrorWithNodeError: Compilation Error
'NoneType' object is not iterable
> in macro stage_external_sources (macros/common/stage_external_sources.sql)
> called by <Unknown>
22:08:34.037470 [debug] [MainThread]: Resource report: {"command_name": "run-operation", "command_success": false, "command_wall_clock_time": 4.824697, "process_in_blocks": "0", "process_kernel_time": 0.261603, "process_mem_max_rss": "134938624", "process_out_blocks": "0", "process_user_time": 0.939263}
22:08:34.037897 [debug] [MainThread]: Command `dbt run-operation` failed at 22:08:34.037835 after 4.83 seconds
22:08:34.038124 [debug] [MainThread]: Connection 'macro_stage_external_sources' was properly closed.
22:08:34.038358 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x100661b90>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x121e12450>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x121e05a50>]}
22:08:34.038625 [debug] [MainThread]: Flushing usage events
22:08:35.236076 [debug] [MainThread]: An error was encountered while trying to flush usage events
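For what it's worth, the `TypeError` in the traceback comes from pyhive's `_fetch_more`, which zips `response.results.columns` against the expected schema. For a DDL statement like `create schema if not exists ...`, the Thrift server appears to return no row set, so `columns` is `None`. A minimal sketch of that failure mode (the variable names here are illustrative stand-ins, not pyhive's actual objects):

```python
# Minimal illustration of the TypeError from the traceback above.
# pyhive's _fetch_more effectively runs zip(response.results.columns, schema);
# when a DDL statement such as `create schema` returns no result set,
# `columns` is None and zip() raises the error that dbt surfaces.
columns = None                      # stand-in for response.results.columns
schema = [("database", "string")]   # illustrative schema entry

try:
    rows = list(zip(columns, schema))
except TypeError as exc:
    print(exc)  # 'NoneType' object is not iterable
```

If that is indeed the cause, the fix would presumably be to avoid fetching rows for statements that produce no result set, rather than anything in the sources.yml.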
System information
The contents of your packages.yml file:
packages:
  - package: dbt-labs/redshift
    version: 0.9.0
  - package: dbt-labs/codegen
    version: 0.13.1
  - package: dbt-labs/dbt_utils
    version: 1.3.0
  - package: starburstdata/trino_utils
    version: 0.6.0
  - package: dbt-labs/dbt_external_tables
    version: 0.11.1
Which database are you using dbt with?
- redshift
- snowflake
- other (specify: Spark in EMR)
The output of dbt --version:
1.8.9
The operating system you're using:
macOS
The output of python --version:
Python 3.11.11
Additional context
None