Fix CloudwatchTaskHandler display error #54054
Draft
jason810496 wants to merge 11 commits into apache:main from jason810496:fix/logging/cloudwatch-handler-error
+104
−45
ff72688 Take over ash's fix (jason810496)
4c52c03 Fix datetime serialization error (jason810496)
98833e5 Fix 'generator is not subscriptable' error (jason810496)
9ee0492 Fix test_cloudwatch_task_handler (jason810496)
49c28ba Add .stream method to RemoteIO protocal (jason810496)
766ba9d Fix Airflow Version in LogMessages comment (jason810496)
844750b Fix nits in code review (jason810496)
ed48ef1 Add CloudWatchLogEvent type (jason810496)
f19ebf1 Correct return type of .stream method (jason810496)
21af62b Revert change in file_task_handler (jason810496)
847a238 Fix test_log_message (jason810496)
@@ -22,6 +22,7 @@
 import json
 import logging
 import os
+from collections.abc import Generator
 from datetime import date, datetime, timedelta, timezone
 from functools import cached_property
 from pathlib import Path
@@ -40,8 +41,15 @@
     import structlog.typing

     from airflow.models.taskinstance import TaskInstance
+    from airflow.providers.amazon.aws.hooks.logs import CloudWatchLogEvent
     from airflow.sdk.types import RuntimeTaskInstanceProtocol as RuntimeTI
-    from airflow.utils.log.file_task_handler import LogMessages, LogSourceInfo
+    from airflow.utils.log.file_task_handler import (
+        LegacyLogResponse,
+        LogMessages,
+        LogResponse,
+        LogSourceInfo,
+        RawLogStream,
+    )


 def json_serialize_legacy(value: Any) -> str | None:
@@ -163,20 +171,35 @@ def upload(self, path: os.PathLike | str, ti: RuntimeTI):
         self.close()
         return

-    def read(self, relative_path, ti: RuntimeTI) -> tuple[LogSourceInfo, LogMessages | None]:
-        logs: LogMessages | None = []
+    def read(self, relative_path: str, ti: RuntimeTI) -> LegacyLogResponse:
+        messages, logs = self.stream(relative_path, ti)
+        str_logs: list[str] = []
+
+        for group in logs:
+            for msg in group:
+                str_logs.append(f"{msg}\n")
+
+        return messages, str_logs
+
+    def stream(self, relative_path: str, ti: RuntimeTI) -> LogResponse:
+        logs: list[RawLogStream] = []
         messages = [
             f"Reading remote log from Cloudwatch log_group: {self.log_group} log_stream: {relative_path}"
         ]
         try:
-            logs = [self.get_cloudwatch_logs(relative_path, ti)]
+            gen: RawLogStream = (
+                self._parse_cloudwatch_log_event(event)
+                for event in self.get_cloudwatch_logs(relative_path, ti)
+            )
+            logs = [gen]
         except Exception as e:
-            logs = None
             messages.append(str(e))

         return messages, logs

-    def get_cloudwatch_logs(self, stream_name: str, task_instance: RuntimeTI):
+    def get_cloudwatch_logs(
+        self, stream_name: str, task_instance: RuntimeTI
+    ) -> Generator[CloudWatchLogEvent, None, None]:
         """
         Return all logs from the given log stream.

@@ -192,29 +215,22 @@ def get_cloudwatch_logs(self, stream_name: str, task_instance: RuntimeTI):
             if (end_date := getattr(task_instance, "end_date", None)) is None
             else datetime_to_epoch_utc_ms(end_date + timedelta(seconds=30))
         )
-        events = self.hook.get_log_events(
+        return self.hook.get_log_events(
             log_group=self.log_group,
             log_stream_name=stream_name,
             end_time=end_time,
         )
-        return "\n".join(self._event_to_str(event) for event in events)

-    def _event_to_dict(self, event: dict) -> dict:
+    def _parse_cloudwatch_log_event(self, event: CloudWatchLogEvent) -> str:
         event_dt = datetime.fromtimestamp(event["timestamp"] / 1000.0, tz=timezone.utc).isoformat()
-        message = event["message"]
+        event_msg = event["message"]
         try:
-            message = json.loads(message)
+            message = json.loads(event_msg)
             message["timestamp"] = event_dt
-            return message
         except Exception:
-            return {"timestamp": event_dt, "event": message}
+            message = {"timestamp": event_dt, "event": event_msg}

-    def _event_to_str(self, event: dict) -> str:
-        event_dt = datetime.fromtimestamp(event["timestamp"] / 1000.0, tz=timezone.utc)
-        # Format a datetime object to a string in Zulu time without milliseconds.
-        formatted_event_dt = event_dt.strftime("%Y-%m-%dT%H:%M:%SZ")
-        message = event["message"]
-        return f"[{formatted_event_dt}] {message}"
+        return json.dumps(message)


 class CloudwatchTaskHandler(FileTaskHandler, LoggingMixin):
@@ -291,4 +307,22 @@ def _read_remote_logs(
     ) -> tuple[LogSourceInfo, LogMessages]:
         stream_name = self._render_filename(task_instance, try_number)
-        messages, logs = self.io.read(stream_name, task_instance)
-        return messages, logs or []
+
+        messages = [
+            f"Reading remote log from Cloudwatch log_group: {self.io.log_group} log_stream: {stream_name}"
+        ]
+        try:
+            events = self.io.get_cloudwatch_logs(stream_name, task_instance)
+            logs = ["\n".join(self._event_to_str(event) for event in events)]
+        except Exception as e:

Member
Not sure whether it's possible to list all possible exceptions instead of using

+            logs = []
+            messages.append(str(e))
+
+        return messages, logs
+
+    def _event_to_str(self, event: CloudWatchLogEvent) -> str:
+        event_dt = datetime.fromtimestamp(event["timestamp"] / 1000.0, tz=timezone.utc)
+        # Format a datetime object to a string in Zulu time without milliseconds.
+        formatted_event_dt = event_dt.strftime("%Y-%m-%dT%H:%M:%SZ")
+        message = event["message"]
+        return f"[{formatted_event_dt}] {message}"
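
To make the new `stream`/`read` split and the event parsing in the diff above easier to follow, here is a minimal standalone sketch. It mirrors the logic shown in the diff but is not the provider's code: the `CloudWatchLogEvent` TypedDict below is an assumed two-key shape, and the free functions `parse_event`, `stream_events`, and `read_events` are hypothetical names used only for illustration.

```python
import json
from collections.abc import Iterable, Iterator
from datetime import datetime, timezone
from typing import TypedDict


class CloudWatchLogEvent(TypedDict):
    """Assumed event shape: only the two keys the diff actually reads."""

    timestamp: int  # epoch milliseconds
    message: str


def parse_event(event: CloudWatchLogEvent) -> str:
    """Turn one event into a JSON string, as _parse_cloudwatch_log_event does."""
    event_dt = datetime.fromtimestamp(event["timestamp"] / 1000.0, tz=timezone.utc).isoformat()
    event_msg = event["message"]
    try:
        # Structured log line: keep its fields and overwrite the timestamp.
        message = json.loads(event_msg)
        message["timestamp"] = event_dt
    except Exception:
        # Plain text (or JSON that is not an object): wrap it under "event".
        message = {"timestamp": event_dt, "event": event_msg}
    return json.dumps(message)


def stream_events(events: Iterable[CloudWatchLogEvent]) -> tuple[list[str], list[Iterator[str]]]:
    """Return (messages, [lazy generator of parsed lines]), like stream()."""
    messages = ["Reading remote log (sketch)"]
    gen = (parse_event(event) for event in events)
    return messages, [gen]


def read_events(events: Iterable[CloudWatchLogEvent]) -> tuple[list[str], list[str]]:
    """Drain every generator into newline-terminated strings, like read()."""
    messages, logs = stream_events(events)
    str_logs = [f"{msg}\n" for group in logs for msg in group]
    return messages, str_logs


if __name__ == "__main__":
    sample: list[CloudWatchLogEvent] = [
        {"timestamp": 0, "message": '{"event": "hello"}'},
        {"timestamp": 0, "message": "plain text"},
    ]
    for line in read_events(sample)[1]:
        print(line, end="")
```

Because the generator is lazy, parsing only happens when the caller iterates, which is why `read_events` has to walk each group rather than index into it.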
Are the changes to this file backward compatible with Airflow 2.10? PRs that change both core and providers may hide compatibility issues.
Yes, and I agree.
I will test the CloudWatchHandler with Airflow 2.10 as well later on.