Details
Hi there,
Consider the following `data.csv` file:

```csv
Class_Name,Grade
Turma 6,4.2
Turma 4,3.5
Turma 6,0.2
Turma 3 Especial,1.6
Turma 2 Piloto,7.8
Turma 4,1.4
Turma 5,1.6
Turma 6,3.8000000000000003
Turma 6,1.5
Turma 6,5.800000000000001
Turma 6,7.8
Turma 2 Piloto,3.3
Turma 2 Piloto,0.8
Turma 3 Especial,0.0
Turma 6,8.9
Turma 4,3.0
Turma 2 Piloto,5.0
Turma 6,1.1
Turma 5,5.2
Turma 6,4.2
Turma 6,7.1
Turma 2 Piloto,5.7
Turma 2 Piloto,0.8
Turma 5,4.0
Turma 6,3.5999999999999996
Turma 6,0.1
Turma 6,3.8000000000000003
Turma 1,3.3
Turma 6,4.0
Turma 2 Piloto,1.6
Turma 4,8.5
Turma 3 Especial,0.9
Turma 6,2.5
Turma 1,3.5
Turma 4,4.1
Turma 4,0.8
Turma 6,2.2
Turma 2 Piloto,1.7000000000000002
Turma 5,2.4
Turma 6,3.6
Turma 6,3.0
Turma 5,0.8
Turma 1,2.2
Turma 2 Piloto,1.6
Turma 4,1.6
Turma 5,2.1
Turma 3 Especial,1.7000000000000002
Turma 6,8.2
Turma 5,2.6
Turma 6,3.4000000000000004
Turma 4,2.7
Turma 6,4.800000000000001
Turma 2 Piloto,3.0999999999999996
Turma 5,2.9
Turma 6,3.5
Turma 5,1.8
Turma 6,1.6
Turma 1,8.4
Turma 2 Piloto,4.4
Turma 1,1.9000000000000001
Turma 6,2.5
Turma 6,0.0
Turma 6,4.9
Turma 4,3.6
Turma 6,3.9
Turma 4,0.8
Turma 6,1.2000000000000002
Turma 6,3.0
Turma 6,6.3
Turma 2 Piloto,5.4
Turma 3 Especial,0.0
Turma 6,1.6
Turma 1,1.7
Turma 5,2.1
Turma 1,5.4
Turma 2 Piloto,1.6
```
I tried to call `groupedhist` from StatsPlots, grouped by the column `:Class_Name` and normalized, so that I could compare the grades of the students in the different classes, which have different numbers of students. To that end, I thought the `normalize` parameter would be appropriate (as suggested in the help for `histogram`), since it seemed to ensure that the total area of each group's bars would sum to one. After reading the CSV file into a DataFrame `df_aux`, I ran:
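```julia
using CSV, StatsPlots, DataFramesMeta   # DataFramesMeta re-exports DataFrames
df_aux = CSV.read("data.csv", DataFrame)  # load data.csv into df_aux
# Note: `:true` evaluates to the Boolean `true` in Julia.
group_hist = @with df_aux groupedhist(:Grade, group=:Class_Name,
    title="Histograms", bins=11, xticks=0:1:10, normalize=:true)
```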
To my surprise, the resulting plot is:

[Figure: grouped_hist]
This is strange: visually, the area of the red bars ("Turma 2 Piloto"), for instance, is clearly smaller than the area of the light-blue bars ("Turma 6")! Could it be that `normalize` is implemented incorrectly for `groupedhist`, or not at all? The bar heights look like the raw bin counts, despite the numeric labels on the vertical axis, which suggest some normalization. That would explain the picture, since "Turma 6" has 31 rows in `data.csv` while "Turma 2 Piloto" has only 13.
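One way to check whether the bars show raw counts or per-group densities is to compute a group's total bar area, i.e. the sum of height × bin width. For raw counts this equals the number of observations in the group, while a per-group PDF normalization (the Plots documentation describes `normalize=true` as equivalent to `:pdf`, total bin area 1) makes it 1. The sketch below assumes unit-width bins with edges `0:1:10` (Plots may pick different edges for `bins=11`) and uses the seven "Turma 1" grades from `data.csv`; `bin_counts` and `bar_area` are hypothetical helpers written for this illustration, not Plots.jl functions.

```julia
# Plain Julia, no packages. `bin_counts` and `bar_area` are hypothetical
# helpers for this illustration; they are not part of Plots.jl.

# Count the values falling in each bin [edges[i], edges[i+1]); the last bin
# also includes its right edge, so a grade of exactly 10 is counted.
function bin_counts(xs::AbstractVector{<:Real}, edges::AbstractVector{<:Real})
    nbins = length(edges) - 1
    counts = zeros(Int, nbins)
    for x in xs
        i = searchsortedlast(edges, x)          # index of the last edge <= x
        if i == nbins + 1 && x == edges[end]
            i = nbins                           # x sits exactly on the last edge
        end
        if 1 <= i <= nbins                      # skip values outside the edges
            counts[i] += 1
        end
    end
    return counts
end

# Total area of bars with the given heights drawn over the given bin edges.
bar_area(heights, edges) = sum(heights .* diff(edges))

edges = 0:1:10  # assumed unit-width bins
turma1 = [3.3, 3.5, 2.2, 8.4, 1.9000000000000001, 1.7, 5.4]  # "Turma 1" rows of data.csv

counts = bin_counts(turma1, edges)
pdf_heights = counts ./ (sum(counts) .* diff(edges))  # per-group PDF normalization

println(bar_area(counts, edges))       # prints 7: raw-count bars have area n
println(bar_area(pdf_heights, edges))  # approximately 1.0, up to rounding
```

If the bars in the `groupedhist` output had the first kind of area, larger groups would dominate exactly as observed.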
What I expected instead is something like what I get from Python's Seaborn `histplot`:

```python
import seaborn as sns

# df: pandas DataFrame with the grades in "P1" and the class names in "Nome_Turma"
sns.histplot(df, x="P1", bins=range(11), hue="Nome_Turma", multiple="dodge", stat="density", common_norm=False)
```
So it seems to me that the `normalize` parameter of StatsPlots' `groupedhist` is not doing what I would expect. At any rate, setting it to `:true` (which Julia evaluates to the plain Boolean `true`) does not give the equivalent of Seaborn's `common_norm=False`, which normalizes the histogram of each `hue` (the equivalent of StatsPlots' `group`) so that its total area is one. In my opinion, that is what should be produced: for each group, an approximation of the probability density function of its grades.
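To make the expected behaviour concrete, the sketch below computes per-group density heights in plain Julia, in the spirit of Seaborn's `stat="density"` with `common_norm=False`, and contrasts it with a single normalization over all groups (`common_norm=True`). It assumes unit-width bins with edges `0:1:10` and uses the "Turma 1" and "Turma 3 Especial" rows of `data.csv`; `bin_counts`, `group_densities` and `area` are hypothetical helpers, and this is not a claim about how StatsPlots or Seaborn implement it internally.

```julia
# Plain Julia, no packages. All helper names are hypothetical.

# Count the values falling in each bin [edges[i], edges[i+1]); the last bin
# also includes its right edge.
function bin_counts(xs::AbstractVector{<:Real}, edges::AbstractVector{<:Real})
    nbins = length(edges) - 1
    counts = zeros(Int, nbins)
    for x in xs
        i = searchsortedlast(edges, x)
        if i == nbins + 1 && x == edges[end]
            i = nbins
        end
        if 1 <= i <= nbins
            counts[i] += 1
        end
    end
    return counts
end

# Density heights for every group.
#   common_norm = false: divide by the group's own size, so each group's bars
#                        have total area 1 (like Seaborn's common_norm=False).
#   common_norm = true:  divide by the total number of observations, so the
#                        areas of all groups together sum to 1.
function group_densities(groups::AbstractDict, edges; common_norm::Bool=false)
    widths = diff(edges)
    ntotal = sum(length, values(groups))
    result = Dict{keytype(groups),Vector{Float64}}()
    for (name, xs) in groups
        n = common_norm ? ntotal : length(xs)
        result[name] = bin_counts(xs, edges) ./ (n .* widths)
    end
    return result
end

area(heights, edges) = sum(heights .* diff(edges))

edges = 0:1:10  # assumed unit-width bins
groups = Dict(
    "Turma 1" => [3.3, 3.5, 2.2, 8.4, 1.9000000000000001, 1.7, 5.4],
    "Turma 3 Especial" => [1.6, 0.0, 0.9, 1.7000000000000002, 0.0],
)

separate = group_densities(groups, edges)                  # per-group normalization
common = group_densities(groups, edges; common_norm=true)  # one normalization overall

for name in sort(collect(keys(groups)))
    println(name, ": area (separate) = ", area(separate[name], edges),
            ", area (common) = ", area(common[name], edges))
end
```

With per-group normalization both classes have area 1 even though one has 7 grades and the other 5, which is the comparison this report is after; with a common normalization the areas are 7/12 and 5/12.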
Backends
This bug occurs on:

| Backend | yes | no | untested |
|---|---|---|---|
| gr (default) | x | | |
| unicodeplots | x | | |
| pythonplot | x | | |
| pgfplotsx | x | | |
| plotlyjs | x | | |
| plotly | x | | |
| gaston | x | | |
Versions
Plots.jl version: 1.41.1
Backend version (]st -m <backend(s)>):
Output of `versioninfo()`:

```
Julia Version 1.12.0
Commit b907bd0600f (2025-10-07 15:42 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 20 × Intel(R) Core(TM) i5-14500
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, alderlake)
GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 20 virtual cores)
Environment:
JULIA_EDITOR = emacs
```
