FLINK-5725: Add extra Flink details to paasta status #4063

Merged
Changes from 6 commits
48 commits
8ec0c76
FLINK-5725: Add extra Flink details to paasta status
nleigh May 13, 2025
fab0eb5
FLINK-5725: Add extra Flink details to paasta status: git repo's
nleigh May 13, 2025
28381c5
FLINK-5725: Add extra Flink details to paasta status: log commands
nleigh May 13, 2025
1fb40c6
FLINK-5725: Add extra Flink details to paasta status: flink monitoring
nleigh May 13, 2025
56cdcc5
FLINK-5725: Add SUPPERREGION_TO_ECOSYSTEM_MAPPINGS
nleigh May 14, 2025
02c4363
FLINK-5725: Update souregraph to github link
nleigh May 14, 2025
7806ecf
FLINK-5725: Add owner information to flink paasta status verbose
nleigh May 14, 2025
ba32de6
FLINK-5725: Refactors
nleigh May 14, 2025
2a65cc6
FLINK-5725: Add flink pool information to flink paasta status verbose
nleigh May 14, 2025
fa93667
FLINK-5725: Add runbook information to flink paasta status verbose
nleigh May 14, 2025
16a304f
FLINK-5725: Fix missing return statement error
nleigh May 15, 2025
06319d8
FLINK-5725: Add Flink cost link to paasta status -v
nleigh May 16, 2025
7357296
Merge branch 'master' into u/nathanleigh/FLINK-5725/AddMoreDetailsToF…
nleigh May 23, 2025
024fdcc
FLINK-5725: Update yelp region -> ecosystem mapping logic
nleigh May 30, 2025
1718cb1
Merge branch 'master' into u/nathanleigh/FLINK-5725/AddMoreDetailsToF…
nleigh May 30, 2025
ce4de9f
FLINK-5725: Downgrade environment-tools
nleigh May 30, 2025
141a8ae
Merge remote-tracking branch 'origin/u/nathanleigh/FLINK-5725/AddMore…
nleigh May 30, 2025
d78be12
FLINK-5725: Mock convert_location_type return
nleigh May 30, 2025
2720594
FLINK-5725: Use 'fake-cluster' name in tests
nleigh Jun 4, 2025
d42bc6e
Update requirements-minimal.txt
nleigh Jun 4, 2025
c48c05b
FLINK-5725: Use existing helper functions
nleigh Jun 5, 2025
379592a
FLINK-5725: Use existing helper functions 2
nleigh Jun 5, 2025
8f16b33
Update paasta_tools/cli/cmds/status.py
nleigh Jun 5, 2025
4cf48ba
Merge branch 'master' into u/nathanleigh/FLINK-5725/AddMoreDetailsToF…
nleigh Jun 5, 2025
1e89fd5
Update paasta_tools/cli/cmds/status.py
nleigh Jun 5, 2025
68b42b6
FLINK-5725: Refactors and remove try/exception
nleigh Jun 5, 2025
da0226f
FLINK-5725: Move ecosytem function to utils
nleigh Jun 5, 2025
d7ac273
FLINK-5725: Fix tox issues
nleigh Jun 5, 2025
994e4d6
Update paasta_tools/flink_tools.py
nleigh Jun 5, 2025
33b03fa
Update paasta_tools/utils.py
nleigh Jun 5, 2025
022b9c4
FLINK-5725: Rename fake-cluster to fake_cluster
nleigh Jun 5, 2025
ffab533
Merge remote-tracking branch 'origin/u/nathanleigh/FLINK-5725/AddMore…
nleigh Jun 5, 2025
42fc583
Update paasta_tools/utils.py
nleigh Jun 6, 2025
571a4cd
Update paasta_tools/utils.py
nleigh Jun 6, 2025
377584d
Update paasta_tools/utils.py
nleigh Jun 6, 2025
22d2921
Update paasta_tools/cli/cmds/status.py
nleigh Jun 6, 2025
dbc5e66
Update paasta_tools/utils.py
nleigh Jun 6, 2025
e0388bb
FLINK-5725: Update test mock
nleigh Jun 6, 2025
4a93fdf
FLINK-5725: Fix tox issues
nleigh Jun 6, 2025
7138c66
Merge branch 'master' into u/nathanleigh/FLINK-5725/AddMoreDetailsToF…
nleigh Jun 10, 2025
399c0ad
Update paasta_tools/cli/cmds/status.py
nleigh Jun 10, 2025
b71546c
Update tests/test_utils.py
nleigh Jun 10, 2025
d5e6765
Update tests/test_utils.py
nleigh Jun 10, 2025
5bf589d
Update paasta_tools/cli/cmds/status.py
nleigh Jun 10, 2025
d946ba3
FLINK-5725: Fix indentation + refactor
nleigh Jun 10, 2025
c1792da
FLINK-5725: Remove if statement check
nleigh Jun 10, 2025
4ba30b3
FLINK-5725: Fix tests by populating flink_instance_config
nleigh Jun 10, 2025
55b4371
Update tests/test_utils.py
nleigh Jun 11, 2025
54 changes: 52 additions & 2 deletions paasta_tools/cli/cmds/status.py
@@ -93,6 +93,7 @@
from paasta_tools.utils import load_system_paasta_config
from paasta_tools.utils import PaastaColors
from paasta_tools.utils import remove_ansi_escape_sequences
from paasta_tools.utils import SUPPERREGION_TO_ECOSYSTEM_MAPPINGS
from paasta_tools.utils import SystemPaastaConfig

FLINK_STATUS_MAX_THREAD_POOL_WORKERS = 50
@@ -766,6 +767,7 @@ def append_pod_status(pod_status, output: List[str]):
def _print_flink_status_from_job_manager(
service: str,
instance: str,
cluster: str,
output: List[str],
flink: Mapping[str, Any],
client: PaastaOApiClient,
@@ -788,6 +790,13 @@ def _print_flink_status_from_job_manager(

output.append(f" Config SHA: {config_sha}")

# Print Flink repo links
if verbose:
output.append(f" Repo(git): https://github.yelpcorp.com/services/{service}")
output.append(
f" Repo(sourcegraph): https://sourcegraph.yelpcorp.com/services/{service}"
)

if status["state"] == "running":
try:
flink_config = get_flink_config_from_paasta_api_client(
@@ -809,6 +818,47 @@ def _print_flink_status_from_job_manager(
dashboard_url = metadata["annotations"].get("flink.yelp.com/dashboard_url")
output.append(f" URL: {dashboard_url}/")

# Get ecosystem from mapping or default to "prod" if not found
ecosystem = SUPPERREGION_TO_ECOSYSTEM_MAPPINGS.get(cluster, "prod")

# Print Flink config link resources
if verbose:
output.append(
f" Yelpsoa configs: https://github.yelpcorp.com/sysgit/yelpsoa-configs/tree/master/{service}"
)
output.append(
f" Srv configs: https://github.yelpcorp.com/sysgit/srv-configs/tree/master/ecosystem/{ecosystem}/{service}"
)

# Print Flink Log Commands
if verbose:
Member

we have a bunch of if verbose: blocks in a row - should these be a single block?

Contributor Author

Possibly; it was mostly for formatting and separating the sections.
Tbf it's getting to the point where we could maybe refactor the whole function; I will consider it in a new PR.

Member

we can separate with comments inside a single if verbose: block :)

that said ++ to more refactoring in another PR
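
A minimal sketch of the single-block layout discussed in this thread. The helper name `append_verbose_flink_details` is hypothetical (it is not part of this PR), and the spacing inside the output strings is illustrative only:

```python
from typing import List


def append_verbose_flink_details(
    output: List[str], service: str, instance: str, cluster: str, verbose: int
) -> None:
    """Hypothetical helper: gather all verbose-only lines under one check."""
    if not verbose:
        return

    # Repo links
    output.append(f"    Repo(git): https://github.yelpcorp.com/services/{service}")

    # Log commands: every component shares the same base command
    output.append("    Flink Log Commands:")
    base = f"paasta logs -a 1h -c {cluster} -s {service} -i {instance}"
    output.append(f"      Service: {base}")
    for component in ("TASKMANAGER", "JOBMANAGER", "SUPERVISOR"):
        output.append(f"      {component.title()}: {base}.{component}")
```

Comments inside the one `if` keep the visual separation while avoiding repeated `verbose` checks.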

output.append(f" Flink Log Commands:")
output.append(
f" Service: paasta logs -a 1h -c {cluster} -s {service} -i {instance}"
)
output.append(
f" Taskmanager: paasta logs -a 1h -c {cluster} -s {service} -i {instance}.TASKMANAGER"
)
output.append(
f" Jobmanager: paasta logs -a 1h -c {cluster} -s {service} -i {instance}.JOBMANAGER"
)
output.append(
f" Supervisor: paasta logs -a 1h -c {cluster} -s {service} -i {instance}.SUPERVISOR"
)

# Print Flink Metrics Links
if verbose:
output.append(f" Flink Monitoring:")
output.append(
f" Job Metrics: https://grafana.yelpcorp.com/d/flink-metrics/flink-job-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-{ecosystem}&var-service={service}&var-instance={instance}&var-job=All&from=now-24h&to=now"
)
output.append(
f" Container Metrics: https://grafana.yelpcorp.com/d/flink-container-metrics/flink-container-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-{ecosystem}&var-service={service}&var-instance={instance}&from=now-24h&to=now"
)
output.append(
f" JVM Metrics: https://grafana.yelpcorp.com/d/flink-jvm-metrics/flink-jvm-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-{ecosystem}&var-service={service}&var-instance={instance}&from=now-24h&to=now"
)

color = PaastaColors.green if status["state"] == "running" else PaastaColors.yellow
output.append(f" State: {color(status['state'].title())}")

@@ -988,7 +1038,7 @@ def print_flink_status(
return 1

return _print_flink_status_from_job_manager(
- service, instance, output, flink, client, verbose
+ service, instance, cluster, output, flink, client, verbose
)


@@ -1015,7 +1065,7 @@ def print_flinkeks_status(
return 1

return _print_flink_status_from_job_manager(
- service, instance, output, flink, client, verbose
+ service, instance, cluster, output, flink, client, verbose
)
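
The three Grafana links added in `_print_flink_status_from_job_manager` above differ only in dashboard path and the optional `var-job` parameter. As a sketch of one way to avoid repeating the query string, the hypothetical helper `grafana_url` below (not part of this PR) builds it with `urllib.parse.urlencode`; the parameter names and values are copied from the diff:

```python
from typing import Optional
from urllib.parse import urlencode

GRAFANA_BASE = "https://grafana.yelpcorp.com/d"


def grafana_url(
    dashboard_path: str,
    ecosystem: str,
    service: str,
    instance: str,
    job: Optional[str] = None,
) -> str:
    """Hypothetical helper: build a Flink Grafana link from shared parameters."""
    params = {
        "orgId": 1,
        "var-datasource": "Prometheus-flink",
        "var-region": f"uswest2-{ecosystem}",
        "var-service": service,
        "var-instance": instance,
    }
    # Only the job metrics dashboard takes a var-job parameter.
    if job is not None:
        params["var-job"] = job
    params["from"] = "now-24h"
    params["to"] = "now"
    return f"{GRAFANA_BASE}/{dashboard_path}?{urlencode(params)}"
```

`urlencode` also escapes any service or instance name that contains characters unsafe in a query string, which the plain f-strings do not.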


10 changes: 10 additions & 0 deletions paasta_tools/utils.py
@@ -181,6 +181,16 @@
"SETFCAP",
]

# https://github.yelpcorp.com/sysgit/srv-configs/tree/master/superregion
SUPPERREGION_TO_ECOSYSTEM_MAPPINGS = {
Copilot AI May 22, 2025

The constant name 'SUPPERREGION_TO_ECOSYSTEM_MAPPINGS' appears to have a misspelling; consider renaming it to 'SUPERREGION_TO_ECOSYSTEM_MAPPINGS' for clarity.

Suggested change
- SUPPERREGION_TO_ECOSYSTEM_MAPPINGS = {
+ SUPERREGION_TO_ECOSYSTEM_MAPPINGS = {

"norcal-devc": "devc",
"norcal-stagef": "stagef",
"norcal-stageg": "stageg",
"nova-prod": "prod",
"pnw-devc": "devc",
"pnw-prod": "prod",
}
Member

this should probably be accessed through SystemPaastaConfig - i.e., live in puppet

Member

also, this approach will still fail for infrastage - not all clusters are named after a superregion

Contributor Author

@nemacysts we only have 1 service/instance in infrastage that we use for testing
https://sourcegraph.yelpcorp.com/search?q=repo:%5Esysgit/yelpsoa-configs%24+flinkeks-infrastage&patternType=keyword&sm=0

Infrastage would require a bit of extra logic (the srv and soa links don't follow the pattern), so I decided to remove it

I am not sure what I would be accessing via SystemPaastaConfig
https://sourcegraph.yelpcorp.com/search?q=repo:%5EYelp/paasta%24+SystemPaastaConfigDict%28&patternType=keyword&sm=0
SystemPaastaConfigDict(
https://sourcegraph.yelpcorp.com/Yelp/paasta/-/blob/paasta_tools/utils.py?L1948-1950
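
To illustrate the concern raised in this thread: with `.get(cluster, "prod")`, any cluster missing from the mapping silently produces a prod link. The sketch below takes the mapping as an argument, so it could come from system-level config, and returns `None` for unknown clusters so the caller can skip the link. `EXAMPLE_MAPPING`, `get_ecosystem_for_cluster`, and `srv_configs_link` are hypothetical names, not existing paasta APIs:

```python
from typing import Mapping, Optional

# Hypothetical stand-in for a mapping loaded from system-level config.
EXAMPLE_MAPPING = {"pnw-devc": "devc", "pnw-prod": "prod"}


def get_ecosystem_for_cluster(
    cluster: str, mapping: Mapping[str, str]
) -> Optional[str]:
    """Return the ecosystem for a cluster, or None when it is not mapped."""
    return mapping.get(cluster)


def srv_configs_link(
    service: str, cluster: str, mapping: Mapping[str, str]
) -> Optional[str]:
    """Build the srv-configs link, or None when the ecosystem is unknown."""
    ecosystem = get_ecosystem_for_cluster(cluster, mapping)
    if ecosystem is None:
        return None
    return (
        "https://github.yelpcorp.com/sysgit/srv-configs/tree/master/"
        f"ecosystem/{ecosystem}/{service}"
    )
```

Passing the mapping in also makes it straightforward to substitute a fake mapping in tests.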



class RollbackTypes(Enum):
AUTOMATIC_SLO_ROLLBACK = "automatic_slo_rollback"
45 changes: 42 additions & 3 deletions tests/cli/test_cmds_status.py
@@ -2702,7 +2702,7 @@ def test_output_stopping_jobmanager(
output = []
mock_flink_status["status"]["state"] = "Stoppingjobmanager"
print_flink_status(
- cluster="fake_cluster",
+ cluster="pnw-devc",
Member

once the mapping is moved to SPC, we can probably mock the getter and have a fake ecosystem for fake_cluster

Member

@nleigh we still wanna do this: using a non-existent cluster ensures that if folks make a mistake wrt their mocks, we never actually hit a real paasta api

i guess for now this is still technically safe since we don't have any service called fake_service - but it would definitely make things safer (both for the existing tests and for folks copying this and potentially adding real service names/instances in new tests)

Contributor Author

I see thanks
Updated in 2720594
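
A minimal sketch of the testing approach suggested in this thread: keep `fake_cluster` in the test and patch the mapping so it resolves to a fake ecosystem. `CLUSTER_TO_ECOSYSTEM` and `ecosystem_for` are hypothetical stand-ins defined here so the example is self-contained; `mock.patch.dict` is the standard-library tool doing the work:

```python
from unittest import mock

# Hypothetical module-level mapping, standing in for the one under test.
CLUSTER_TO_ECOSYSTEM = {"pnw-devc": "devc"}


def ecosystem_for(cluster: str) -> str:
    """Look up the ecosystem, defaulting to "prod" as the diff above does."""
    return CLUSTER_TO_ECOSYSTEM.get(cluster, "prod")


def test_fake_cluster_uses_fake_ecosystem():
    # Add a fake entry only for the duration of the test.
    with mock.patch.dict(CLUSTER_TO_ECOSYSTEM, {"fake_cluster": "fake_ecosystem"}):
        assert ecosystem_for("fake_cluster") == "fake_ecosystem"
    # The patch is undone on exit, so the fake entry is gone again.
    assert "fake_cluster" not in CLUSTER_TO_ECOSYSTEM
```

Because no real cluster name appears in the test, a mistake in the mocks cannot end up pointing at a real paasta API.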

service="fake_service",
instance="fake_instance",
output=output,
@@ -2712,6 +2712,19 @@ def test_output_stopping_jobmanager(
status = mock_flink_status["status"]
expected_output = [
f" Config SHA: 00000",
f" Repo(git): https://github.yelpcorp.com/services/fake_service",
f" Repo(sourcegraph): https://sourcegraph.yelpcorp.com/services/fake_service",
f" Yelpsoa configs: https://github.yelpcorp.com/sysgit/yelpsoa-configs/tree/master/fake_service",
f" Srv configs: https://github.yelpcorp.com/sysgit/srv-configs/tree/master/ecosystem/devc/fake_service",
f" Flink Log Commands:",
f" Service: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance",
f" Taskmanager: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance.TASKMANAGER",
f" Jobmanager: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance.JOBMANAGER",
f" Supervisor: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance.SUPERVISOR",
f" Flink Monitoring:",
f" Job Metrics: https://grafana.yelpcorp.com/d/flink-metrics/flink-job-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-devc&var-service=fake_service&var-instance=fake_instance&var-job=All&from=now-24h&to=now",
f" Container Metrics: https://grafana.yelpcorp.com/d/flink-container-metrics/flink-container-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-devc&var-service=fake_service&var-instance=fake_instance&from=now-24h&to=now",
f" JVM Metrics: https://grafana.yelpcorp.com/d/flink-jvm-metrics/flink-jvm-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-devc&var-service=fake_service&var-instance=fake_instance&from=now-24h&to=now",
f" State: {PaastaColors.yellow(status['state'].title())}",
f" Pods: 3 running, 0 evicted, 0 other",
]
@@ -2745,7 +2758,7 @@ def test_output_stopping_taskmanagers(
"pod_status"
][2:]
print_flink_status(
- cluster="fake_cluster",
+ cluster="pnw-devc",
service="fake_service",
instance="fake_instance",
output=output,
@@ -2755,6 +2768,19 @@ def test_output_stopping_taskmanagers(
status = mock_flink_status["status"]
expected_output = [
f" Config SHA: 00000",
f" Repo(git): https://github.yelpcorp.com/services/fake_service",
f" Repo(sourcegraph): https://sourcegraph.yelpcorp.com/services/fake_service",
f" Yelpsoa configs: https://github.yelpcorp.com/sysgit/yelpsoa-configs/tree/master/fake_service",
f" Srv configs: https://github.yelpcorp.com/sysgit/srv-configs/tree/master/ecosystem/devc/fake_service",
f" Flink Log Commands:",
f" Service: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance",
f" Taskmanager: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance.TASKMANAGER",
f" Jobmanager: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance.JOBMANAGER",
f" Supervisor: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance.SUPERVISOR",
f" Flink Monitoring:",
f" Job Metrics: https://grafana.yelpcorp.com/d/flink-metrics/flink-job-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-devc&var-service=fake_service&var-instance=fake_instance&var-job=All&from=now-24h&to=now",
f" Container Metrics: https://grafana.yelpcorp.com/d/flink-container-metrics/flink-container-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-devc&var-service=fake_service&var-instance=fake_instance&from=now-24h&to=now",
f" JVM Metrics: https://grafana.yelpcorp.com/d/flink-jvm-metrics/flink-jvm-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-devc&var-service=fake_service&var-instance=fake_instance&from=now-24h&to=now",
f" State: {PaastaColors.yellow(status['state'].title())}",
f" Pods: 1 running, 0 evicted, 0 other",
]
@@ -2784,7 +2810,7 @@ def test_output_1_verbose(
mock_naturaltime.return_value = "one day ago"
output = []
print_flink_status(
- cluster="fake_cluster",
+ cluster="pnw-devc",
service="fake_service",
instance="fake_instance",
output=output,
@@ -2798,6 +2824,17 @@ def test_output_1_verbose(
datetime.datetime.fromtimestamp(int(job_details_obj.start_time) // 1000)
)
expected_output = _get_base_status_verbose_1(metadata) + [
f" Yelpsoa configs: https://github.yelpcorp.com/sysgit/yelpsoa-configs/tree/master/fake_service",
f" Srv configs: https://github.yelpcorp.com/sysgit/srv-configs/tree/master/ecosystem/devc/fake_service",
f" Flink Log Commands:",
f" Service: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance",
f" Taskmanager: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance.TASKMANAGER",
f" Jobmanager: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance.JOBMANAGER",
f" Supervisor: paasta logs -a 1h -c pnw-devc -s fake_service -i fake_instance.SUPERVISOR",
f" Flink Monitoring:",
f" Job Metrics: https://grafana.yelpcorp.com/d/flink-metrics/flink-job-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-devc&var-service=fake_service&var-instance=fake_instance&var-job=All&from=now-24h&to=now",
f" Container Metrics: https://grafana.yelpcorp.com/d/flink-container-metrics/flink-container-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-devc&var-service=fake_service&var-instance=fake_instance&from=now-24h&to=now",
f" JVM Metrics: https://grafana.yelpcorp.com/d/flink-jvm-metrics/flink-jvm-metrics?orgId=1&var-datasource=Prometheus-flink&var-region=uswest2-devc&var-service=fake_service&var-instance=fake_instance&from=now-24h&to=now",
f" State: {PaastaColors.green(status['state'].title())}",
f" Pods: 3 running, 0 evicted, 0 other",
f" Jobs: 1 running, 0 finished, 0 failed, 0 cancelled",
@@ -2859,6 +2896,8 @@ def _get_base_status_verbose_0(metadata):
def _get_base_status_verbose_1(metadata):
return [
f" Config SHA: 00000",
f" Repo(git): https://github.yelpcorp.com/services/fake_service",
f" Repo(sourcegraph): https://sourcegraph.yelpcorp.com/services/fake_service",
f" Flink version: {config_obj.flink_version} {config_obj.flink_revision}",
f" URL: {metadata['annotations']['flink.yelp.com/dashboard_url']}/",
]