[Core feature] Add PyTorch Memory Profiling Deck Renderer #3105
base: master
Conversation
Code Review Agent Run #51e09b (Actionable Suggestions: 6)
Review Details
Signed-off-by: 10sharmashivam <[email protected]>
Force-pushed from d7f5e51 to 3f8bf6d (Compare)
Changelist by Bito: This pull request implements the following key changes.
```python
after = ensure_structure(after)
content = compare(before["segments"], after["segments"])
```
The `compare` plot type implementation could benefit from error handling for malformed data structures. Consider adding validation for the `segments` key before accessing it.
Code suggestion
Check the AI-generated fix before applying

```python
after = ensure_structure(after)
if 'segments' not in before or 'segments' not in after:
    raise ValueError("Both before and after snapshots must contain 'segments' data")
content = compare(before["segments"], after["segments"])
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
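For readers who do want the guard, the check the Agent proposed can be packaged as a small helper (`validate_segments` is a hypothetical name, not code from this PR):

```python
def validate_segments(before, after):
    """Return the two segment lists, raising ValueError when either snapshot lacks them."""
    for name, snap in (("before", before), ("after", after)):
        if not isinstance(snap, dict) or "segments" not in snap:
            raise ValueError(f"{name} snapshot must contain 'segments' data")
    return before["segments"], after["segments"]
```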
```python
def ensure_structure(data):
    if isinstance(data, dict) and "segments" in data:
        return data
    return {
        "segments": data.get("segments", []) if hasattr(data, "get") else [],
        "traces": data.get("traces", []) if hasattr(data, "get") else [],
        "allocator_settings": data.get("allocator_settings", {}) if hasattr(data, "get") else {}
    }
```
The `ensure_structure` helper function could be moved outside the method scope since it doesn't use any instance variables.
Code suggestion
Check the AI-generated fix before applying (the function body is unchanged; the fix moves it to module level)

```python
def ensure_structure(data):
    if isinstance(data, dict) and "segments" in data:
        return data
    return {
        "segments": data.get("segments", []) if hasattr(data, "get") else [],
        "traces": data.get("traces", []) if hasattr(data, "get") else [],
        "allocator_settings": data.get("allocator_settings", {}) if hasattr(data, "get") else {}
    }
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
def __init__(self, profiling_data):
    if profiling_data is None:
        raise ValueError("Profiling data cannot be None")

    # Handle both single snapshot and comparison cases
    if isinstance(profiling_data, tuple):
        before, after = profiling_data
        if before is None or after is None:
            raise ValueError("Both before and after snapshots must be provided for comparison")

        # Check if this might be an OOM case by comparing memory usage
        try:
            self._check_memory_growth(before, after)
        except Exception:
            # Don't fail initialization if memory check fails
            pass

    self.profiling_data = profiling_data
```
The `__init__` method could benefit from extracting the profiling data validation logic into a separate method for better organization and reusability.
Code suggestion
Check the AI-generated fix before applying

```python
def _validate_profiling_data(self, data):
    if data is None:
        raise ValueError("Profiling data cannot be None")
    if isinstance(data, tuple):
        before, after = data
        if before is None or after is None:
            raise ValueError("Both before and after snapshots must be provided for comparison")
        try:
            self._check_memory_growth(before, after)
        except Exception:
            pass
    return data

def __init__(self, profiling_data):
    self.profiling_data = self._validate_profiling_data(profiling_data)
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
def to_html(self, plot_type: str = "trace_plot") -> str:
    """Convert profiling data to HTML visualization."""

    # Define memory_viz_js at the start so it's available for all branches
    memory_viz_js = """
    <script src="MemoryViz.js" type="text/javascript"></script>
    <script type="text/javascript">
        function init(evt) {
            if (window.svgDocument == null) {
                svgDocument = evt.target.ownerDocument;
            }
        }
    </script>
    """

    if plot_type == "profile_plot":
        import torch
        try:
            # Create a profile object without initializing it
            profile = torch.profiler.profile()
            # Set basic attributes needed for visualization
            profile.steps = []
            profile.events = []
            profile.key_averages = []

            # Copy the data from our profiling_data
            if isinstance(self.profiling_data, dict):
                for key, value in self.profiling_data.items():
                    setattr(profile, key, value)
            content = profile_plot(profile)
        except Exception as e:
            content = f"<div>Failed to generate profile plot: {str(e)}</div>"
```
The `to_html` method appears to be quite long and handles multiple visualization types. Consider splitting this into separate methods for each visualization type to improve maintainability and readability.
Code suggestion
Check the AI-generated fix before applying

```python
def _get_memory_viz_js(self) -> str:
    return """
    <script src="MemoryViz.js" type="text/javascript"></script>
    <script type="text/javascript">
        function init(evt) {
            if (window.svgDocument == null) {
                svgDocument = evt.target.ownerDocument;
            }
        }
    </script>
    """

def _render_profile_plot(self) -> str:
    import torch
    try:
        profile = torch.profiler.profile()
        profile.steps = []
        profile.events = []
        profile.key_averages = []
        if isinstance(self.profiling_data, dict):
            for key, value in self.profiling_data.items():
                setattr(profile, key, value)
        return profile_plot(profile)
    except Exception as e:
        return f"<div>Failed to generate profile plot: {str(e)}</div>"

def to_html(self, plot_type: str = "trace_plot") -> str:
    """Convert profiling data to HTML visualization."""
    memory_viz_js = self._get_memory_viz_js()
    if plot_type == "profile_plot":
        content = self._render_profile_plot()
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
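For reference, the split the Agent proposes generalizes to a dispatch table once each plot type has its own renderer method; the class, methods, and return values below are illustrative stand-ins, not the PR's actual API:

```python
class ProfilingRendererSketch:
    """Minimal stand-in showing per-plot-type renderer methods and a dispatch table."""

    def _render_trace_plot(self) -> str:
        return "<div>trace plot</div>"

    def _render_profile_plot(self) -> str:
        return "<div>profile plot</div>"

    def to_html(self, plot_type: str = "trace_plot") -> str:
        # Dispatch through a dict instead of a long if/elif chain.
        renderers = {
            "trace_plot": self._render_trace_plot,
            "profile_plot": self._render_profile_plot,
        }
        if plot_type not in renderers:
            raise ValueError(f"Unknown plot type: {plot_type}")
        return renderers[plot_type]()
```

Adding a new visualization then means adding one method and one dict entry, keeping `to_html` itself short.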
```python
with open(file_path, "rb") as f:
    profiling_data = pickle.load(f)
```
Use `Path.open()` instead of the built-in `open()` for file operations, and handle pickle loading safely.
Code suggestion
Check the AI-generated fix before applying

```python
from pathlib import Path

with Path(file_path).open("rb") as f:
    # TODO: Add pickle safety validation before loading
    profiling_data = pickle.load(f)
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
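If the `TODO` about pickle safety were pursued, one common approach is a restricted unpickler that refuses to resolve globals, so only plain data structures (dicts, lists, strings, numbers) can deserialize. This is a sketch of that idea, not what the PR implements:

```python
import io
import pickle

class SafeUnpickler(pickle.Unpickler):
    """Unpickler that rejects any global lookup, allowing only plain data types."""

    def find_class(self, module, name):
        # Called whenever the stream references a class or function; refuse all.
        raise pickle.UnpicklingError(f"refusing to load global {module}.{name}")

def safe_loads(data: bytes):
    return SafeUnpickler(io.BytesIO(data)).load()
```

Plain snapshot dicts load normally, while a payload that smuggles in a callable is rejected before any code can run.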
```python
if plot_type in ["memory", "segments"]:
    profiling_data = {
        "segments": profiling_data.get("segments", []),
        "traces": profiling_data.get("traces", []),
        "allocator_settings": profiling_data.get("allocator_settings", {})
    }
```
Consider extracting the data structure transformation logic into a helper function since it's duplicated in multiple places. The same structure is created again on lines 130-134.
Code suggestion
Check the AI-generated fix before applying

```diff
- profiling_data = {
-     "segments": profiling_data.get("segments", []),
-     "traces": profiling_data.get("traces", []),
-     "allocator_settings": profiling_data.get("allocator_settings", {})
- }
+ profiling_data = _transform_profiling_data(profiling_data)
@@ -130,8 +126,13 @@
- after = {
-     "segments": profiling_data.get("segments", []) if hasattr(profiling_data, "get") else [],
-     "traces": profiling_data.get("traces", []) if hasattr(profiling_data, "get") else [],
-     "allocator_settings": profiling_data.get("allocator_settings", {}) if hasattr(profiling_data, "get") else {}
- }
+ after = _transform_profiling_data(profiling_data)
+
+ def _transform_profiling_data(data):
+     return {
+         "segments": data.get("segments", []) if hasattr(data, "get") else [],
+         "traces": data.get("traces", []) if hasattr(data, "get") else [],
+         "allocator_settings": data.get("allocator_settings", {}) if hasattr(data, "get") else {}
+     }
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Code Review Agent Run #64d559 (Actionable Suggestions: 5)
Review Details
```python
print(f"Downloading flamegraph.pl to: {flamegraph_script}")
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/brendangregg/FlameGraph/master/flamegraph.pl",
    flamegraph_script,
)
subprocess.check_call(["chmod", "+x", flamegraph_script])
```
Consider adding error handling for the `urllib.request.urlretrieve()` call, as network requests can fail. Also, the script download could fail due to permission issues when writing to `/tmp`.
Code suggestion
Check the AI-generated fix before applying

```python
try:
    print(f"Downloading flamegraph.pl to: {flamegraph_script}")
    urllib.request.urlretrieve(
        "https://raw.githubusercontent.com/brendangregg/FlameGraph/master/flamegraph.pl",
        flamegraph_script,
    )
    subprocess.check_call(["chmod", "+x", flamegraph_script])
except (urllib.error.URLError, subprocess.CalledProcessError, IOError) as e:
    raise RuntimeError(f"Failed to download or setup flamegraph script: {str(e)}")
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
def format_flamegraph(flamegraph_lines, flamegraph_script=None):
    if flamegraph_script is None:
        flamegraph_script = f"/tmp/{os.getuid()}_flamegraph.pl"
```
Consider using `tempfile.gettempdir()` instead of hardcoding `/tmp` for better cross-platform compatibility. The current implementation may fail on Windows systems.
Code suggestion
Check the AI-generated fix before applying

```python
import tempfile
flamegraph_script = os.path.join(tempfile.gettempdir(), f"{os.getuid()}_flamegraph.pl")
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
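For what it's worth, a fully portable variant of the default path could look like the sketch below; the fallback for `os.getuid()` (which does not exist on Windows) is an assumption added here, not part of the suggestion:

```python
import getpass
import os
import tempfile

def default_flamegraph_path() -> str:
    # os.getuid() only exists on POSIX; fall back to the login name elsewhere.
    uid = os.getuid() if hasattr(os, "getuid") else getpass.getuser()
    return os.path.join(tempfile.gettempdir(), f"{uid}_flamegraph.pl")
```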
```python
p = subprocess.Popen(
    args, stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding="utf-8"
)
assert p.stdin is not None
assert p.stdout is not None
p.stdin.write(flamegraph_lines)
p.stdin.close()
result = p.stdout.read()
p.stdout.close()
p.wait()
assert p.wait() == 0
```
The subprocess handling could be simplified using `subprocess.run()` with `check=True` instead of manual pipe handling and assertions.
Code suggestion
Check the AI-generated fix before applying

```python
result = subprocess.run(
    args, input=flamegraph_lines, text=True, capture_output=True, check=True
).stdout
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
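The `subprocess.run()` form can be exercised end to end with a trivial stand-in filter in place of `flamegraph.pl` (`run_filter` and the stand-in command are illustrative, not code from this PR):

```python
import subprocess
import sys

def run_filter(args, input_text: str) -> str:
    # check=True raises CalledProcessError on a non-zero exit, replacing the
    # assert-based status check (which `python -O` would silently strip).
    result = subprocess.run(
        args, input=input_text, capture_output=True, text=True, check=True
    )
    return result.stdout

# Stand-in command that upper-cases stdin, mimicking a text-filter script.
upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
```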
```python
stream = "" if seg["stream"] == 0 else f', stream_{seg["stream"]}'
body = "".join(occupied)
assert (
    seg_free_external + seg_free_internal + seg_allocated == seg["total_size"]
```
The variable `stream` is defined twice with similar logic but slightly different formatting. Consider consolidating into a single definition.
Code suggestion
Check the AI-generated fix before applying

```python
body = "".join(occupied)
assert (
    seg_free_external + seg_free_internal + seg_allocated == seg["total_size"]
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
invalid_file_path = os.path.join(CURRENT_DIR, "invalid.pkl")
with open(invalid_file_path, "w") as f:
    f.write("not a pickle file")

try:
    invalid_file = FlyteFile(invalid_file_path)
    with pytest.raises(ValueError, match="Failed to load profiling data"):
        render_pytorch_profiling(invalid_file)
finally:
    # Clean up the temporary file
    if os.path.exists(invalid_file_path):
        os.remove(invalid_file_path)
```
Consider using a context manager pattern with the `tempfile` module for handling temporary files. This ensures proper cleanup even if exceptions occur.
Code suggestion
Check the AI-generated fix before applying

```python
with tempfile.NamedTemporaryFile(suffix='.pkl', mode='w') as temp_file:
    temp_file.write("not a pickle file")
    temp_file.flush()
    invalid_file = FlyteFile(temp_file.name)
    with pytest.raises(ValueError, match="Failed to load profiling data"):
        render_pytorch_profiling(invalid_file)
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
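A related caveat: on Windows, a file still open in `NamedTemporaryFile` cannot be reopened by the code under test, so a `delete=False` variant with explicit cleanup is sometimes needed (`with_invalid_pickle` is a made-up helper for illustration, not part of the PR):

```python
import os
import tempfile

def with_invalid_pickle(action):
    """Write a bogus .pkl file, run action(path) on it, and always clean up."""
    with tempfile.NamedTemporaryFile(suffix=".pkl", mode="w", delete=False) as f:
        f.write("not a pickle file")
        path = f.name
    try:
        # The file is closed here, so the code under test can reopen it freely.
        return action(path)
    finally:
        os.remove(path)
```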
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master   #3105      +/-   ##
=========================================
- Coverage   80.12%  76.16%   -3.97%
=========================================
  Files         272     202      -70
  Lines       24614   21472    -3142
  Branches     2768    2768
=========================================
- Hits        19723   16354    -3369
- Misses       4083    4304     +221
- Partials      808     814       +6
```

☔ View full report in Codecov by Sentry.
Signed-off-by: 10sharmashivam <[email protected]>
Code Review Agent Run #c4efcc (Actionable Suggestions: 5)
Review Details
```python
self._validate_profiling_data(profiling_data)
self.profiling_data = profiling_data
```
Consider handling potential exceptions from `_validate_profiling_data()` in `__init__()`. Currently, if validation fails, the instance variable `self.profiling_data` might remain uninitialized.
Code suggestion
Check the AI-generated fix before applying

```python
self.profiling_data = profiling_data
try:
    self._validate_profiling_data(profiling_data)
except ValueError:
    self.profiling_data = None
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
try:
    before = _ensure_profiling_structure(before)
    after = _ensure_profiling_structure(after)
    content = compare(before["segments"], after["segments"])
except ValueError as e:
    content = f"<div>Failed to generate comparison: {str(e)}</div>"
```
Consider adding more specific error handling for the comparison operation. The current generic `ValueError` catch could mask underlying issues. Maybe handle specific exceptions that could occur during the comparison.
Code suggestion
Check the AI-generated fix before applying

```python
try:
    before = _ensure_profiling_structure(before)
    after = _ensure_profiling_structure(after)
    content = compare(before["segments"], after["segments"])
except KeyError as e:
    content = f"<div>Failed to access required data structure: {str(e)}</div>"
except TypeError as e:
    content = f"<div>Invalid data type in profiling data: {str(e)}</div>"
except ValueError as e:
    content = f"<div>Invalid value in profiling data: {str(e)}</div>"
except Exception as e:
    content = f"<div>Unexpected error during comparison: {str(e)}</div>"
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
try:
    profiling_data = pickle.load(f, encoding='bytes')
except pickle.UnpicklingError as e:
    raise ValueError(f"Failed to deserialize profiling data: {str(e)}")
```
Consider adding more specific error handling for different pickle exceptions like `pickle.UnpicklingError`, `EOFError`, and `AttributeError` to provide better error messages for different failure scenarios.
Code suggestion
Check the AI-generated fix before applying

```python
try:
    profiling_data = pickle.load(f, encoding='bytes')
except pickle.UnpicklingError as e:
    raise ValueError(f"Invalid or corrupted pickle data: {str(e)}")
except EOFError as e:
    raise ValueError(f"Pickle file is truncated or empty: {str(e)}")
except AttributeError as e:
    raise ValueError(f"Incompatible pickle data format: {str(e)}")
except Exception as e:
    raise ValueError(f"Failed to deserialize profiling data: {str(e)}")
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
result = subprocess.run(
    [
        str(flamegraph_script),
        "--colors", "python",
        "--countname", "bytes",
        "--width", "1200",
    ],
    input=flamegraph_lines,
    capture_output=True,
    text=True,
    check=True
)
```
The `subprocess.run()` call could potentially hang indefinitely. Consider adding a timeout parameter to prevent this.
Code suggestion
Check the AI-generated fix before applying

```diff
-result = subprocess.run(
-    [
-        str(flamegraph_script),
-        "--colors", "python",
-        "--countname", "bytes",
-        "--width", "1200",
-    ],
-    input=flamegraph_lines,
-    capture_output=True,
-    text=True,
-    check=True
-)
+result = subprocess.run(
+    [
+        str(flamegraph_script),
+        "--colors", "python",
+        "--countname", "bytes",
+        "--width", "1200",
+    ],
+    input=flamegraph_lines,
+    capture_output=True,
+    text=True,
+    check=True,
+    timeout=60
+)
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
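As a sketch of the same idea, a small wrapper can convert a hung flamegraph process into a catchable `subprocess.TimeoutExpired` (the function name and fallback HTML are illustrative, not the PR's code):

```python
import subprocess
import sys

def run_with_timeout(args, input_text: str, timeout: float = 60.0) -> str:
    try:
        return subprocess.run(
            args, input=input_text, capture_output=True,
            text=True, check=True, timeout=timeout,
        ).stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on expiry; surface the hang as
        # renderable HTML instead of blocking the deck indefinitely.
        return "<div>Failed to generate flamegraph: timed out</div>"
```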
```python
if total_size != seg["total_size"]:
    raise ValueError(
        f"Segment size mismatch: {total_size} != {seg['total_size']}"
    )
```
Consider using a more descriptive error message in the `ValueError` that includes the segment details for easier debugging.
Code suggestion
Check the AI-generated fix before applying

```python
if total_size != seg["total_size"]:
    raise ValueError(
        f"Segment size mismatch for stream {seg['stream']}: computed size {total_size} != reported size {seg['total_size']}"
    )
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Can we have a demo video about this?
Love this contribution <3
Hi @Future-Outlier, Thank you for the positive feedback! 😊 I'll create a demo video showcasing this and share it. Regards
I was about to ask the same, a screen recording would be great to learn more how it looks. @10sharmashivam could you check the failing tests?
Awesome! It would be great to add an example and screenshot here as well.
Tracking issue
Reference Issue
Why are the changes needed?
Hugging Face announced a PyTorch memory visualizer that could be valuable for debugging memory-related issues in Flyte tasks, especially for failed executions. This implementation provides an interactive way to visualize PyTorch profiling data directly in Flyte decks, making it easier to diagnose memory issues.
What changes were proposed in this pull request?
- `PyTorchProfilingRenderer` class for visualizing PyTorch profiling data

How was this patch tested?
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
This PR introduces and enhances the PyTorch Memory Profiling Deck Renderer that enables visualization of memory usage in Flyte tasks. The implementation provides interactive visualizations including memory usage timeline, segment analysis, profile visualization, and snapshot comparisons. The enhancements include comprehensive error handling for pickle deserialization, improved validation of profiling data structures, and robust subprocess execution controls with better error messages and proper temporary file handling.

Unit tests added: True
Estimated effort to review (1-5, lower is better): 4