Controlling zorder per group in scatterplot #3830

Open
bmcfee opened this issue Mar 7, 2025 · 2 comments

bmcfee commented Mar 7, 2025

I'm running into a situation where it would be nice to be able to set or override the zorder for different groups within a scatterplot, and it doesn't seem like there's an obvious way to do this (short of overlaying multiple scatterplots by hand and then adjusting all the elements after).

For example, I have a scatterplot with three distinct groups (call them A, B, and C), which I'm currently mapping to hue. It so happens that groups B and C are tightly clustered, while group A is both larger and more dispersed. As a result, the A group tends to visually crowd out the B and C groups. I can fudge this with transparency, but it's not a terribly satisfying solution.

If it were possible to set the zorder for each group, it would be easy to have the B and C groups draw above the A group and prevent crowding without relying on transparency. I tried to do this in a postprocessing step, but it appears that all scatterplot points are put into one collection object, so there's no direct way to update the zorder by group.

For right now, I can hack around this by sorting the dataframe prior to calling scatterplot() to produce my preferred draw order. I don't think this is guaranteed behavior though, so it doesn't strike me as a stable or recommended approach.
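The sort-based workaround described above can be sketched as follows. This is only an illustration under stated assumptions: the column names (`x`, `y`, `group`), the synthetic data, and the helper name `sort_for_draw_order` are invented for the example, and the result relies on points being drawn in row order, which the thread notes is not documented behavior.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
import seaborn as sns


def sort_for_draw_order(df, column, bottom_to_top):
    """Return df with rows reordered so later-listed groups come last.

    Hypothetical helper: `bottom_to_top` lists group labels from the group
    that should sit at the bottom to the group that should sit on top.
    """
    rank = {label: i for i, label in enumerate(bottom_to_top)}
    ranks = df[column].map(rank).to_numpy()
    # A stable sort keeps the original row order within each group.
    return df.iloc[np.argsort(ranks, kind="stable")]


# Synthetic data: A is large and dispersed, B and C are small and tight.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": np.concatenate([rng.normal(0, 3, 300),
                         rng.normal(-1, 0.3, 30),
                         rng.normal(1, 0.3, 30)]),
    "y": np.concatenate([rng.normal(0, 3, 300),
                         rng.normal(-1, 0.3, 30),
                         rng.normal(1, 0.3, 30)]),
    "group": ["A"] * 300 + ["B"] * 30 + ["C"] * 30,
})
df = df.sample(frac=1, random_state=0)  # shuffle to mimic unordered input

# Put A first so that B and C rows come later in the frame.
sorted_df = sort_for_draw_order(df, "group", ["A", "B", "C"])
ax = sns.scatterplot(data=sorted_df, x="x", y="y",
                     hue="group", hue_order=["A", "B", "C"])
```

Passing `hue_order` keeps the color assignment independent of the row order, so only the draw order changes when the frame is re-sorted.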

I expect that exposing zorder here might entail quite a bit of complexity under the hood - it seems like matplotlib PathCollection objects only allow zorder at the collection level, not for individual elements. That probably means that the collection would need to be broken into multiple collections, and if zorder is mapped to a field with high cardinality (or continuous values) that could get unwieldy.
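For reference, the "multiple collections" approach mentioned above could look roughly like this when done by hand in matplotlib. It is a sketch, not a proposed seaborn implementation: the helper `scatter_by_group`, the column names, and the zorder values are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_by_group(ax, df, x, y, group, zorders):
    """Draw one PathCollection per group, each with its own zorder.

    Hypothetical helper: `zorders` maps a group label to a zorder value;
    groups missing from the mapping fall back to 1, the default zorder
    for matplotlib collections.
    """
    collections = {}
    for label, sub in df.groupby(group, sort=False):
        collections[label] = ax.scatter(
            sub[x], sub[y], label=label, zorder=zorders.get(label, 1)
        )
    return collections


# Synthetic data: A is large and dispersed, B and C are small and tight.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": np.concatenate([rng.normal(0, 3, 300),
                         rng.normal(-1, 0.3, 30),
                         rng.normal(1, 0.3, 30)]),
    "y": np.concatenate([rng.normal(0, 3, 300),
                         rng.normal(-1, 0.3, 30),
                         rng.normal(1, 0.3, 30)]),
    "group": ["A"] * 300 + ["B"] * 30 + ["C"] * 30,
})

fig, ax = plt.subplots()
# B and C get a higher zorder, so they are drawn above A.
collections = scatter_by_group(ax, df, "x", "y", "group",
                               {"A": 1, "B": 2, "C": 2})
ax.legend()
```

Each group becomes its own artist here, which is why a high-cardinality or continuous zorder mapping would produce an unwieldy number of collections.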

Still, it seems like it could be useful to provide some way to influence the draw order, so I figured I'd raise the issue here.

mwaskom (Owner) commented Mar 7, 2025

scatterplot is actually a little different from most seaborn functions in the way it handles the semantic mapping. Because the underlying matplotlib artist allows us to control the properties of the individual points, there's a single PathCollection drawn per axes, instead of doing a group by and drawing one per level. As a consequence, the zorder of individual points will reflect the index of the point in the dataframe (this reduces overplotting artifacts that you'd otherwise get with a grouped approach). But the zorder property applies to the whole PathCollection not to individual points, so I don't actually believe it's possible to expose any control beyond sorting the input dataframe in particular ways — probably best left to the caller at that point.
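The distinction between per-point properties and the collection-level zorder can be seen directly in matplotlib. The snippet below is a small sketch using plain `Axes.scatter` rather than seaborn; the three overlapping points and their colors are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Three points at the same location, one color each.
x = np.zeros(3)
y = np.zeros(3)
colors = ["tab:blue", "tab:orange", "tab:green"]

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=colors, s=400)  # a single PathCollection

# Colors are stored per point: one RGBA row for each of the three points.
facecolors = points.get_facecolor()

# zorder is a single number for the whole collection.
default_zorder = points.get_zorder()

# Raising it moves every point in the collection together.
points.set_zorder(3)
```

Within the collection there is no per-point zorder to set, so the only lever left for overlapping points is the order in which they appear in the input.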

bmcfee (Author) commented Mar 10, 2025

> As a consequence, the zorder of individual points will reflect the index of the point in the dataframe (this reduces overplotting artifacts that you'd otherwise get with a grouped approach).

> so I don't actually believe it's possible to expose any control beyond sorting the input dataframe in particular ways — probably best left to the caller at that point.

Exactly - but this is the part that makes me a little uncomfortable, as I believe this is undocumented behavior and may be subject to change. (I could imagine this getting very weird in some settings, e.g., dask dataframes, but I'm sure that's out of scope to consider anyway. 😁)

In general, it seems like the index order of records within a dataframe usually doesn't matter for how the resulting figure appears, though obviously it has to in some situations.

I'm totally fine with closing this out as "won't implement", but a couple of thoughts come to mind if it could be potentially in scope at some point:

  • I see your point about consolidating to a single path collection. I could imagine a solution that can determine at runtime whether to use one or many collections though, based on whether an explicit zorder (or possibly some other attribute) is tied to data. I don't really see any functional downside to this, as it would be opt-in behavior and the current behavior would otherwise be preserved. (API bloat could be a real downside, but I could imagine zorder control being generally useful enough to consider adding. Obviously it's your call.)
  • I'm sure the matplotlib folks have enough on their plate already, but ambiguous behavior depending on data order does seem like the kind of thing they might care about codifying. They might even expose path-level zorder like they do for color, in which case a lot of this could be simple to implement on the seaborn side.
