[Core feature] Add PyTorch Memory Profiling Deck Renderer #3105
base: master
Conversation
Code Review Agent Run #51e09b (Actionable Suggestions: 6)
Review Details
Signed-off-by: 10sharmashivam <[email protected]>
Force-pushed from d7f5e51 to 3f8bf6d (Compare)
Changelist by Bito: This pull request implements the following key changes.
```python
after = ensure_structure(after)
content = compare(before["segments"], after["segments"])
```
The `compare` plot type implementation could benefit from error handling for malformed data structures. Consider adding validation for the `segments` key before accessing it.
Code suggestion
Check the AI-generated fix before applying

```python
after = ensure_structure(after)
if 'segments' not in before or 'segments' not in after:
    raise ValueError("Both before and after snapshots must contain 'segments' data")
content = compare(before["segments"], after["segments"])
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
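For readers who do want the guard, the check the Agent proposed can be packaged as a small helper (`validate_segments` is a hypothetical name, not code from this PR):

```python
def validate_segments(before, after):
    """Return the two segment lists, raising ValueError when either snapshot lacks them."""
    for name, snap in (("before", before), ("after", after)):
        if not isinstance(snap, dict) or "segments" not in snap:
            raise ValueError(f"{name} snapshot must contain 'segments' data")
    return before["segments"], after["segments"]
```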
```python
def ensure_structure(data):
    if isinstance(data, dict) and "segments" in data:
        return data
    return {
        "segments": data.get("segments", []) if hasattr(data, "get") else [],
        "traces": data.get("traces", []) if hasattr(data, "get") else [],
        "allocator_settings": data.get("allocator_settings", {}) if hasattr(data, "get") else {}
    }
```
The `ensure_structure` helper function could be moved outside the method scope since it doesn't use any instance variables.
Code suggestion
Check the AI-generated fix before applying (the function body is unchanged; the fix moves it to module level)

```python
def ensure_structure(data):
    if isinstance(data, dict) and "segments" in data:
        return data
    return {
        "segments": data.get("segments", []) if hasattr(data, "get") else [],
        "traces": data.get("traces", []) if hasattr(data, "get") else [],
        "allocator_settings": data.get("allocator_settings", {}) if hasattr(data, "get") else {}
    }
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
def __init__(self, profiling_data):
    if profiling_data is None:
        raise ValueError("Profiling data cannot be None")

    # Handle both single snapshot and comparison cases
    if isinstance(profiling_data, tuple):
        before, after = profiling_data
        if before is None or after is None:
            raise ValueError("Both before and after snapshots must be provided for comparison")

        # Check if this might be an OOM case by comparing memory usage
        try:
            self._check_memory_growth(before, after)
        except Exception:
            # Don't fail initialization if memory check fails
            pass

    self.profiling_data = profiling_data
```
The `__init__` method could benefit from extracting the profiling data validation logic into a separate method for better organization and reusability.
Code suggestion
Check the AI-generated fix before applying

```python
def _validate_profiling_data(self, data):
    if data is None:
        raise ValueError("Profiling data cannot be None")
    if isinstance(data, tuple):
        before, after = data
        if before is None or after is None:
            raise ValueError("Both before and after snapshots must be provided for comparison")
        try:
            self._check_memory_growth(before, after)
        except Exception:
            pass
    return data

def __init__(self, profiling_data):
    self.profiling_data = self._validate_profiling_data(profiling_data)
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
def to_html(self, plot_type: str = "trace_plot") -> str:
    """Convert profiling data to HTML visualization."""

    # Define memory_viz_js at the start so it's available for all branches
    memory_viz_js = """
    <script src="MemoryViz.js" type="text/javascript"></script>
    <script type="text/javascript">
        function init(evt) {
            if (window.svgDocument == null) {
                svgDocument = evt.target.ownerDocument;
            }
        }
    </script>
    """

    if plot_type == "profile_plot":
        import torch
        try:
            # Create a profile object without initializing it
            profile = torch.profiler.profile()
            # Set basic attributes needed for visualization
            profile.steps = []
            profile.events = []
            profile.key_averages = []

            # Copy the data from our profiling_data
            if isinstance(self.profiling_data, dict):
                for key, value in self.profiling_data.items():
                    setattr(profile, key, value)
            content = profile_plot(profile)
        except Exception as e:
            content = f"<div>Failed to generate profile plot: {str(e)}</div>"
```
The `to_html` method appears to be quite long and handles multiple visualization types. Consider splitting this into separate methods for each visualization type to improve maintainability and readability.
Code suggestion
Check the AI-generated fix before applying

```python
def _get_memory_viz_js(self) -> str:
    return """
    <script src="MemoryViz.js" type="text/javascript"></script>
    <script type="text/javascript">
        function init(evt) {
            if (window.svgDocument == null) {
                svgDocument = evt.target.ownerDocument;
            }
        }
    </script>
    """

def _render_profile_plot(self) -> str:
    import torch
    try:
        profile = torch.profiler.profile()
        profile.steps = []
        profile.events = []
        profile.key_averages = []
        if isinstance(self.profiling_data, dict):
            for key, value in self.profiling_data.items():
                setattr(profile, key, value)
        return profile_plot(profile)
    except Exception as e:
        return f"<div>Failed to generate profile plot: {str(e)}</div>"

def to_html(self, plot_type: str = "trace_plot") -> str:
    """Convert profiling data to HTML visualization."""
    memory_viz_js = self._get_memory_viz_js()
    if plot_type == "profile_plot":
        content = self._render_profile_plot()
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
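For reference, the split the Agent proposes generalizes to a dispatch table once each plot type has its own renderer method; the class, methods, and return values below are illustrative stand-ins, not the PR's actual API:

```python
class ProfilingRendererSketch:
    """Minimal stand-in showing per-plot-type renderer methods and a dispatch table."""

    def _render_trace_plot(self) -> str:
        return "<div>trace plot</div>"

    def _render_profile_plot(self) -> str:
        return "<div>profile plot</div>"

    def to_html(self, plot_type: str = "trace_plot") -> str:
        # Dispatch through a dict instead of a long if/elif chain.
        renderers = {
            "trace_plot": self._render_trace_plot,
            "profile_plot": self._render_profile_plot,
        }
        if plot_type not in renderers:
            raise ValueError(f"Unknown plot type: {plot_type}")
        return renderers[plot_type]()
```

Adding a new visualization then means adding one method and one dict entry, keeping `to_html` itself short.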
```python
with open(file_path, "rb") as f:
    profiling_data = pickle.load(f)
```
Use `Path.open()` instead of the built-in `open()` for file operations, and handle pickle loading safely.
Code suggestion
Check the AI-generated fix before applying

```python
from pathlib import Path

with Path(file_path).open("rb") as f:
    # TODO: Add pickle safety validation before loading
    profiling_data = pickle.load(f)
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
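If the `TODO` about pickle safety were pursued, one common approach is a restricted unpickler that refuses to resolve globals, so only plain data structures (dicts, lists, strings, numbers) can deserialize. This is a sketch of that idea, not what the PR implements:

```python
import io
import pickle

class SafeUnpickler(pickle.Unpickler):
    """Unpickler that rejects any global lookup, allowing only plain data types."""

    def find_class(self, module, name):
        # Called whenever the stream references a class or function; refuse all.
        raise pickle.UnpicklingError(f"refusing to load global {module}.{name}")

def safe_loads(data: bytes):
    return SafeUnpickler(io.BytesIO(data)).load()
```

Plain snapshot dicts load normally, while a payload that smuggles in a callable is rejected before any code can run.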
```python
if plot_type in ["memory", "segments"]:
    profiling_data = {
        "segments": profiling_data.get("segments", []),
        "traces": profiling_data.get("traces", []),
        "allocator_settings": profiling_data.get("allocator_settings", {})
    }
```
Consider extracting the data structure transformation logic into a helper function since it's duplicated in multiple places. The same structure is created again on lines 130-134.
Code suggestion
Check the AI-generated fix before applying

```diff
- profiling_data = {
-     "segments": profiling_data.get("segments", []),
-     "traces": profiling_data.get("traces", []),
-     "allocator_settings": profiling_data.get("allocator_settings", {})
- }
+ profiling_data = _transform_profiling_data(profiling_data)
@@ -130,8 +126,13 @@
- after = {
-     "segments": profiling_data.get("segments", []) if hasattr(profiling_data, "get") else [],
-     "traces": profiling_data.get("traces", []) if hasattr(profiling_data, "get") else [],
-     "allocator_settings": profiling_data.get("allocator_settings", {}) if hasattr(profiling_data, "get") else {}
- }
+ after = _transform_profiling_data(profiling_data)
+
+ def _transform_profiling_data(data):
+     return {
+         "segments": data.get("segments", []) if hasattr(data, "get") else [],
+         "traces": data.get("traces", []) if hasattr(data, "get") else [],
+         "allocator_settings": data.get("allocator_settings", {}) if hasattr(data, "get") else {}
+     }
```
Code Review Run #51e09b
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Code Review Agent Run #64d559 (Actionable Suggestions: 5)
Review Details
```python
print(f"Downloading flamegraph.pl to: {flamegraph_script}")
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/brendangregg/FlameGraph/master/flamegraph.pl",
    flamegraph_script,
)
subprocess.check_call(["chmod", "+x", flamegraph_script])
```
Consider adding error handling for the `urllib.request.urlretrieve()` call, as network requests can fail. Also, the script download could fail due to permission issues when writing to `/tmp`.
Code suggestion
Check the AI-generated fix before applying

```python
try:
    print(f"Downloading flamegraph.pl to: {flamegraph_script}")
    urllib.request.urlretrieve(
        "https://raw.githubusercontent.com/brendangregg/FlameGraph/master/flamegraph.pl",
        flamegraph_script,
    )
    subprocess.check_call(["chmod", "+x", flamegraph_script])
except (urllib.error.URLError, subprocess.CalledProcessError, IOError) as e:
    raise RuntimeError(f"Failed to download or setup flamegraph script: {str(e)}")
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
def format_flamegraph(flamegraph_lines, flamegraph_script=None):
    if flamegraph_script is None:
        flamegraph_script = f"/tmp/{os.getuid()}_flamegraph.pl"
```
Consider using `tempfile.gettempdir()` instead of hardcoding `/tmp` for better cross-platform compatibility. The current implementation may fail on Windows systems.
Code suggestion
Check the AI-generated fix before applying

```python
import tempfile
flamegraph_script = os.path.join(tempfile.gettempdir(), f"{os.getuid()}_flamegraph.pl")
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
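For what it's worth, a fully portable variant of the default path could look like the sketch below; the fallback for `os.getuid()` (which does not exist on Windows) is an assumption added here, not part of the suggestion:

```python
import getpass
import os
import tempfile

def default_flamegraph_path() -> str:
    # os.getuid() only exists on POSIX; fall back to the login name elsewhere.
    uid = os.getuid() if hasattr(os, "getuid") else getpass.getuser()
    return os.path.join(tempfile.gettempdir(), f"{uid}_flamegraph.pl")
```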
```python
p = subprocess.Popen(
    args, stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding="utf-8"
)
assert p.stdin is not None
assert p.stdout is not None
p.stdin.write(flamegraph_lines)
p.stdin.close()
result = p.stdout.read()
p.stdout.close()
p.wait()
assert p.wait() == 0
```
The subprocess handling could be simplified using `subprocess.run()` with `check=True` instead of manual pipe handling and assertions.
Code suggestion
Check the AI-generated fix before applying

```python
result = subprocess.run(
    args, input=flamegraph_lines, text=True, capture_output=True, check=True
).stdout
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
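The `subprocess.run()` form can be exercised end to end with a trivial stand-in filter in place of `flamegraph.pl` (`run_filter` and the stand-in command are illustrative, not code from this PR):

```python
import subprocess
import sys

def run_filter(args, input_text: str) -> str:
    # check=True raises CalledProcessError on a non-zero exit, replacing the
    # assert-based status check (which `python -O` would silently strip).
    result = subprocess.run(
        args, input=input_text, capture_output=True, text=True, check=True
    )
    return result.stdout

# Stand-in command that upper-cases stdin, mimicking a text-filter script.
upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
```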
```python
stream = "" if seg["stream"] == 0 else f', stream_{seg["stream"]}'
body = "".join(occupied)
assert (
    seg_free_external + seg_free_internal + seg_allocated == seg["total_size"]
```
The variable `stream` is defined twice with similar logic but slightly different formatting. Consider consolidating into a single definition.
Code suggestion
Check the AI-generated fix before applying

```python
body = "".join(occupied)
assert (
    seg_free_external + seg_free_internal + seg_allocated == seg["total_size"]
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
invalid_file_path = os.path.join(CURRENT_DIR, "invalid.pkl")
with open(invalid_file_path, "w") as f:
    f.write("not a pickle file")

try:
    invalid_file = FlyteFile(invalid_file_path)
    with pytest.raises(ValueError, match="Failed to load profiling data"):
        render_pytorch_profiling(invalid_file)
finally:
    # Clean up the temporary file
    if os.path.exists(invalid_file_path):
        os.remove(invalid_file_path)
```
Consider using a context manager pattern with the `tempfile` module for handling temporary files. This ensures proper cleanup even if exceptions occur.
Code suggestion
Check the AI-generated fix before applying

```python
with tempfile.NamedTemporaryFile(suffix='.pkl', mode='w') as temp_file:
    temp_file.write("not a pickle file")
    temp_file.flush()
    invalid_file = FlyteFile(temp_file.name)
    with pytest.raises(ValueError, match="Failed to load profiling data"):
        render_pytorch_profiling(invalid_file)
```
Code Review Run #64d559
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
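A related caveat: on Windows, a file still open in `NamedTemporaryFile` cannot be reopened by the code under test, so a `delete=False` variant with explicit cleanup is sometimes needed (`with_invalid_pickle` is a made-up helper for illustration, not part of the PR):

```python
import os
import tempfile

def with_invalid_pickle(action):
    """Write a bogus .pkl file, run action(path) on it, and always clean up."""
    with tempfile.NamedTemporaryFile(suffix=".pkl", mode="w", delete=False) as f:
        f.write("not a pickle file")
        path = f.name
    try:
        # The file is closed here, so the code under test can reopen it freely.
        return action(path)
    finally:
        os.remove(path)
```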
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master   #3105      +/-   ##
=========================================
- Coverage   80.12%  76.16%   -3.97%
=========================================
  Files         272     202      -70
  Lines       24614   21472    -3142
  Branches     2768    2768
=========================================
- Hits        19723   16354    -3369
- Misses       4083    4304     +221
- Partials      808     814       +6
```

☔ View full report in Codecov by Sentry.
Signed-off-by: 10sharmashivam <[email protected]>
Code Review Agent Run #c4efcc (Actionable Suggestions: 5)
Review Details
```python
self._validate_profiling_data(profiling_data)
self.profiling_data = profiling_data
```
Consider handling potential exceptions from `_validate_profiling_data()` in `__init__()`. Currently, if validation fails, the instance variable `self.profiling_data` might remain uninitialized.
Code suggestion
Check the AI-generated fix before applying

```python
self.profiling_data = profiling_data
try:
    self._validate_profiling_data(profiling_data)
except ValueError:
    self.profiling_data = None
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
try:
    before = _ensure_profiling_structure(before)
    after = _ensure_profiling_structure(after)
    content = compare(before["segments"], after["segments"])
except ValueError as e:
    content = f"<div>Failed to generate comparison: {str(e)}</div>"
```
Consider adding more specific error handling for the comparison operation. The current generic `ValueError` catch could mask underlying issues. Maybe handle specific exceptions that could occur during the comparison.
Code suggestion
Check the AI-generated fix before applying

```python
try:
    before = _ensure_profiling_structure(before)
    after = _ensure_profiling_structure(after)
    content = compare(before["segments"], after["segments"])
except KeyError as e:
    content = f"<div>Failed to access required data structure: {str(e)}</div>"
except TypeError as e:
    content = f"<div>Invalid data type in profiling data: {str(e)}</div>"
except ValueError as e:
    content = f"<div>Invalid value in profiling data: {str(e)}</div>"
except Exception as e:
    content = f"<div>Unexpected error during comparison: {str(e)}</div>"
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
try:
    profiling_data = pickle.load(f, encoding='bytes')
except pickle.UnpicklingError as e:
    raise ValueError(f"Failed to deserialize profiling data: {str(e)}")
```
Consider adding more specific error handling for different pickle exceptions like `pickle.UnpicklingError`, `EOFError`, and `AttributeError` to provide better error messages for different failure scenarios.
Code suggestion
Check the AI-generated fix before applying

```python
try:
    profiling_data = pickle.load(f, encoding='bytes')
except pickle.UnpicklingError as e:
    raise ValueError(f"Invalid or corrupted pickle data: {str(e)}")
except EOFError as e:
    raise ValueError(f"Pickle file is truncated or empty: {str(e)}")
except AttributeError as e:
    raise ValueError(f"Incompatible pickle data format: {str(e)}")
except Exception as e:
    raise ValueError(f"Failed to deserialize profiling data: {str(e)}")
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
```python
result = subprocess.run(
    [
        str(flamegraph_script),
        "--colors", "python",
        "--countname", "bytes",
        "--width", "1200",
    ],
    input=flamegraph_lines,
    capture_output=True,
    text=True,
    check=True
)
```
The `subprocess.run()` call could potentially hang indefinitely. Consider adding a timeout parameter to prevent this.
Code suggestion
Check the AI-generated fix before applying

```diff
-result = subprocess.run(
-    [
-        str(flamegraph_script),
-        "--colors", "python",
-        "--countname", "bytes",
-        "--width", "1200",
-    ],
-    input=flamegraph_lines,
-    capture_output=True,
-    text=True,
-    check=True
-)
+result = subprocess.run(
+    [
+        str(flamegraph_script),
+        "--colors", "python",
+        "--countname", "bytes",
+        "--width", "1200",
+    ],
+    input=flamegraph_lines,
+    capture_output=True,
+    text=True,
+    check=True,
+    timeout=60
+)
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
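As a sketch of the same idea, a small wrapper can convert a hung flamegraph process into a catchable `subprocess.TimeoutExpired` (the function name and fallback HTML are illustrative, not the PR's code):

```python
import subprocess
import sys

def run_with_timeout(args, input_text: str, timeout: float = 60.0) -> str:
    try:
        return subprocess.run(
            args, input=input_text, capture_output=True,
            text=True, check=True, timeout=timeout,
        ).stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on expiry; surface the hang as
        # renderable HTML instead of blocking the deck indefinitely.
        return "<div>Failed to generate flamegraph: timed out</div>"
```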
```python
if total_size != seg["total_size"]:
    raise ValueError(
        f"Segment size mismatch: {total_size} != {seg['total_size']}"
    )
```
Consider using a more descriptive error message in the `ValueError` that includes the segment details for easier debugging.
Code suggestion
Check the AI-generated fix before applying

```python
if total_size != seg["total_size"]:
    raise ValueError(
        f"Segment size mismatch for stream {seg['stream']}: computed size {total_size} != reported size {seg['total_size']}"
    )
```
Code Review Run #c4efcc
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Can we have a demo video about this?
Love this contribution <3
Hi @Future-Outlier, Thank you for the positive feedback! 😊 I'll create a demo video showcasing this and share it. Regards
I was about to ask the same, a screen recording would be great to learn more how it looks. @10sharmashivam could you check the failing tests?
Awesome! It would be great to add an example and screenshot here as well.
Tracking issue
Reference Issue
Why are the changes needed?
Hugging Face announced a PyTorch memory visualizer that could be valuable for debugging memory-related issues in Flyte tasks, especially for failed executions. This implementation provides an interactive way to visualize PyTorch profiling data directly in Flyte decks, making it easier to diagnose memory issues.
What changes were proposed in this pull request?
- `PyTorchProfilingRenderer` class for visualizing PyTorch profiling data

How was this patch tested?
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
This PR introduces and enhances the PyTorch Memory Profiling Deck Renderer that enables visualization of memory usage in Flyte tasks. The implementation provides interactive visualizations including memory usage timeline, segment analysis, profile visualization, and snapshot comparisons. The enhancements include comprehensive error handling for pickle deserialization, improved validation of profiling data structures, and robust subprocess execution controls with better error messages and proper temporary file handling.

Unit tests added: True
Estimated effort to review (1-5, lower is better): 4