Class names in identifiers in rendered output for aggregated frames corresponding to method calls can be misleading

We hit an issue when using `pyinstrument` to regularly profile a large epidemiological simulation code that the rendered profile output seemed to be grouping method calls incorrectly:

UCL/TLOmodel-profiling/issues/19

Specifically we were finding calls to functions (/ methods) being grouped under class methods which we thought shouldn't ever be able to call these functions. 

I've looked in to this a bit and think I've identified what is going on. The key issue seems to be that the class name used in the identifier for a (aggregated) frame corresponding to a method call will be written out as the name of particular subclass (specifically the mostly commonly observed subclass in the aggregated captured frames) even if the method being called is defined on a base class.

As a minimal reproducer, consider the following script corresponding to a very simplified version of our simulation call stack:

```Python

import time


def simulate():
    events = [EventA(), EventB(), EventA()] * 10
    for event in events:
        event.run()


class Event:
    
    def run(self):
        self.apply()


class EventA(Event):
    
    def apply(self):
        time.sleep(0.1)


class EventB(Event):
    
    def apply(self):
        time.sleep(0.1)
    

if __name__ == "__main__":
    simulate()
```

Running `pyinstrument` on this script (saved as `pyinstrument_test.py`) gives the following output using the console renderer:

```
  _     ._   __/__   _ _  _  _ _/_   Recorded: 16:12:19  Samples:  31
 /_//_/// /_\ / //_// / //_'/ //     Duration: 3.015     CPU time: 0.007
/   _/                      v4.7.3

Program: pyinstrument_test.py

3.013 <module>  pyinstrument_test.py:1
└─ 3.013 simulate  pyinstrument_test.py:4
   └─ 3.013 EventA.run  pyinstrument_test.py:12
      ├─ 2.009 EventA.apply  pyinstrument_test.py:18
      │  └─ 2.009 sleep  <built-in>
      └─ 1.005 EventB.apply  pyinstrument_test.py:24
         └─ 1.005 sleep  <built-in>
```

Here we see that underneath the top-level `simulate` call, all event `apply` method calls are grouped under a frame with identifier `EventA.run`. This corresponds in reality to the `run` method defined on the base `Event` class, which we can see from the correct line number reference in to the script. In this case it is fairly easy to spot that there is no `run` method defined directly in `EventA` and so that this in reality corresponds to `Event`, but in our real use case the call stack is a lot more complex and spotting when method calls actually correspond to calls to a base class method is more difficult.

My understanding of how this arises, is that as the concrete classes from which the `run` method is called are `EventA` and `EventB`, when the corresponding frames are subsequently aggregated `pyinstrument` resolves the ambiguity of which class name to use when rendering by outputting the most commonly observed class name based on the logic in:

https://github.com/joerick/pyinstrument/blob/4b37f8cdc531be41a7f7e57932f0b770244025d5/pyinstrument/frame.py#L308-L309

https://github.com/joerick/pyinstrument/blob/4b37f8cdc531be41a7f7e57932f0b770244025d5/pyinstrument/frame.py#L284-L305


Ideally instead of the above output, for our purposes it would be useful to instead get

```
  _     ._   __/__   _ _  _  _ _/_   Recorded: 16:12:19  Samples:  31
 /_//_/// /_\ / //_// / //_'/ //     Duration: 3.015     CPU time: 0.007
/   _/                      v4.7.3

Program: pyinstrument_test.py

3.013 <module>  pyinstrument_test.py:1
└─ 3.013 simulate  pyinstrument_test.py:4
   └─ 3.013 Event.run  pyinstrument_test.py:12
      ├─ 2.009 EventA.apply  pyinstrument_test.py:18
      │  └─ 2.009 sleep  <built-in>
      └─ 1.005 EventB.apply  pyinstrument_test.py:24
         └─ 1.005 sleep  <built-in>
```

that is the class name qualifying the `run` method identifier being specified as the base class `Event` it is defined in. 

I am not clear however how feasible it would be to output the class name where methods are defined like this, and whether the current behaviour is intentional?

@joerick if you do think it would feasible and desirable to change the behaviour of `pyinstrument` to instead produce something more like the latter output, I'd be happy to try to work on a PR to try to implement this!


	def get_attribute_value(self, attribute_marker: str) -> str \| None:
	"""
	Returns the value of the attribute. If multiple values are present,
	the most commonly observed one is returned.
	"""
	# Attributes are recorded as a dict, with the key representing an
	# observation, and the value representing the duration that it was
	# observed. the first character of the observation is the 'marker' -
	# the type of the attribute, the rest is data.
	matching_attributes = [
	a_tuple
	for a_tuple in self.attributes.items()
	if a_tuple[0].startswith(attribute_marker)
	]

	if len(matching_attributes) == 0:
	return None

	top_attribute, _ = max(matching_attributes, key=lambda a: a[1])

	# strip off the marker, return the data
	return top_attribute[1:]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Class names in identifiers in rendered output for aggregated frames corresponding to method calls can be misleading #340

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

	def class_name(self) -> str \| None:
	return self.get_attribute_value(ATTRIBUTE_MARKER_CLASS_NAME)

Class names in identifiers in rendered output for aggregated frames corresponding to method calls can be misleading #340

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions