fix: assign majority class label instead of index in majority voting #58

Hampuztt · 2025-09-03T13:25:17Z

Last year I used this package for my bachelor thesis comparing supervised and unsupervised SOMs. When I set n_iter_supervised=0, the results were unexpectedly poor. For the thesis I worked around it with my own majority-voting implementation (based on your paper), which produced much better accuracy.

On the final day I found the bug in the library by chance, but didn’t have time to open a PR, so I'm submitting the fix now.

In SOMClassifier.py, _init_super_som has a loop that contains this:

if dp_in_node != []:
    y_in_node = self.y_.flatten()[dp_in_node]
    if not any(y_in_node == self.missing_label_placeholder):
        node_class = np.argmax(
            np.unique(y_in_node, return_counts=True)[1]  # bug
        )

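This sets node_class to the index of the largest count in the arrays returned by np.unique, not to the label at that index. The two match only when the winning label happens to equal its position among the sorted unique labels in the node, for example when the node contains every class from 0 upwards. A node holding only class 2 would be assigned class 0.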

Correct fix:

labels, counts = np.unique(y_in_node, return_counts=True)
node_class = labels[np.argmax(counts)]

Now the node is classified by the most frequent label, not its index.
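To make the difference concrete, here is a minimal sketch that runs both variants on a few hand-made nodes. The names buggy_vote and fixed_vote are made up for this illustration and are not functions in susi; the bodies mirror the two snippets above.

```python
import numpy as np


def buggy_vote(y_in_node):
    # Position of the largest count, as in the original loop.
    return np.argmax(np.unique(y_in_node, return_counts=True)[1])


def fixed_vote(y_in_node):
    # Label that owns the largest count.
    labels, counts = np.unique(y_in_node, return_counts=True)
    return labels[np.argmax(counts)]


only_class_2 = np.array([2, 2, 2])    # unique labels: [2]
mixed = np.array([1, 2, 2, 2])        # unique labels: [1, 2]
all_classes = np.array([0, 1, 1, 2])  # unique labels: [0, 1, 2]

print(buggy_vote(only_class_2), fixed_vote(only_class_2))  # 0 2
print(buggy_vote(mixed), fixed_vote(mixed))                # 1 2
print(buggy_vote(all_classes), fixed_vote(all_classes))    # 1 1
```

Only the last node, which contains every class starting at 0, gives the same answer in both variants. That would explain why the buggy version can still look partly correct on a dataset like Iris.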

BEFORE

AFTER

Small disclaimer

As this was quite some time ago, I haven't fully verified that this is the same bug I originally found, but I'm pretty sure it is. The graphs are generated from the code below. It's a modified slice of the code I used for my bachelor thesis, so it's not the clearest example.

Code

from sklearn.metrics import confusion_matrix, classification_report, accuracy_score  # type: ignore
import matplotlib.pyplot as plt  # type: ignore
from sklearn.model_selection import train_test_split  # type: ignore
from sklearn.preprocessing import MinMaxScaler  # type: ignore
from sklearn import datasets  # type: ignore
from sklearn.utils import Bunch  # type: ignore
import susi  # type: ignore
from susi.SOMPlots import plot_nbh_dist_weight_matrix, plot_umatrix  # type: ignore


def compareAccuracies(
    n_rows: int,
    n_cols: int,
    train_iterations: int,
    dataset: Bunch,
    comparisons: int,
    random_state=10,
):
    scaler = MinMaxScaler()
    majority_voting_accuracies = []
    supervised_som_accuracies = []

    for i in range(comparisons):
        print(f"Iteration: {i + 1}")
        X_train, X_test, y_train, y_test = train_test_split(
            dataset.data,
            dataset.target,
            test_size=0.7,
            random_state=random_state + i,
        )
        print(
            f"x_train: {X_train.shape}, x_test: {X_test.shape}, "
            f"y_train: {y_train.shape}, y_test: {y_test.shape}"
        )
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)

        # Unsupervised-only classifier (majority voting)
        unsup_som = susi.SOMClassifier(
            n_rows=n_rows,
            n_columns=n_cols,
            n_iter_unsupervised=train_iterations,
            n_iter_supervised=0,
            random_state=random_state + i,
        )
        unsup_som.fit(X_train, y_train)
        unsup_y_pred = unsup_som.predict(X_test)
        majority_voting_accuracies.append(accuracy_score(y_test, unsup_y_pred))

        # Supervised classifier
        sup_som = susi.SOMClassifier(
            n_rows=n_rows,
            n_columns=n_cols,
            n_iter_unsupervised=train_iterations,
            n_iter_supervised=train_iterations,
            random_state=random_state + i,
        )
        sup_som.fit(X_train, y_train)
        sup_y_pred = sup_som.predict(X_test)
        supervised_som_accuracies.append(accuracy_score(y_test, sup_y_pred))

    draw_accuracies(
        majority_voting_accuracies,
        supervised_som_accuracies,
        dataset["filename"],
    )


def draw_accuracies(
    majority_voting_accuracies: list[float],
    supervised_som_accuracies: list[float],
    dataset_name="",
):
    if dataset_name:
        dataset_name = dataset_name.split(".")[0].upper()
    runs = range(1, len(supervised_som_accuracies) + 1)

    avg_supervised_accuracy = sum(supervised_som_accuracies) / len(
        supervised_som_accuracies
    )
    avg_majority_voting_accuracy = sum(majority_voting_accuracies) / len(
        majority_voting_accuracies
    )

    # Plot supervised and majority-voting accuracies on a single set of axes
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 1, 1)  # Single plot (1x1 grid)
    plt.plot(
        runs,
        supervised_som_accuracies,
        label="Supervised SOM",
        color="blue",
        marker="o",
    )
    plt.plot(
        runs,
        majority_voting_accuracies,
        label="Majority Voting",
        color="orange",
        marker="x",
    )
    plt.axhline(
        y=avg_supervised_accuracy,
        color="green",
        linewidth=2,
        linestyle=":",
        label=f"Avg Supervised SOM Accuracy ({100 * avg_supervised_accuracy:.2f}%)",
    )
    plt.axhline(
        y=avg_majority_voting_accuracy,
        color="red",
        linewidth=2,
        linestyle=":",
        label=f"Avg Majority Voting Accuracy ({100 * avg_majority_voting_accuracy:.2f}%)",
    )

    plt.xlabel("Run")
    plt.ylabel("Accuracy")  # values are fractions in [0, 1]
    plt.title(f"Accuracies: Supervised SOM and Majority Voting ({dataset_name})")
    plt.legend()

    plt.savefig("Images/" + "Accuracies_" + dataset_name, bbox_inches="tight")
    plt.show()


if __name__ == "__main__":
    iris_data = datasets.load_iris()
    all_datasets = [iris_data]  # separate name so the sklearn `datasets` module is not shadowed
    map_sizes = [(10, 5)]  # (n_cols, n_rows)
    iterations = [10000]

    for data in all_datasets:
        for n_cols, n_rows in map_sizes:
            for n_iter in iterations:
                compareAccuracies(n_rows, n_cols, n_iter, data, 4)

Hampuztt · 2025-09-06T10:55:27Z

@felixriese

felixriese · 2025-09-22T20:13:00Z

Hi @Hampuztt, thanks for the pull request and for the detailed explanation. I will have a look at it in the next few days. In the meantime, can you please do the following:

apply black susi (for the correct version, see test-requirements.txt)
write a test for it in tests/test_SOMClassifier.py to make sure that this bug actually exists and is fixed by your bugfix (meaning: it needs to fail without your fix and pass with your fix)

Best, Felix
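As a rough idea of what the requested test could check, here is a minimal pytest-style sketch. It is an assumption-laden illustration: majority_label is a hypothetical stand-in for the voting step and is not part of susi, and a real test in tests/test_SOMClassifier.py would have to exercise SOMClassifier itself rather than a copy of the logic.

```python
import numpy as np


def majority_label(y_in_node):
    """Hypothetical stand-in for the fixed voting step; not part of susi."""
    labels, counts = np.unique(y_in_node, return_counts=True)
    return labels[np.argmax(counts)]


def test_majority_label_returns_label_not_index():
    # Labels deliberately do not start at 0, so a returned index
    # can never be mistaken for a valid label.
    assert majority_label(np.array([7, 9, 9, 9, 7])) == 9
    # A node holding a single class must keep that class.
    assert majority_label(np.array([5, 5])) == 5


test_majority_label_returns_label_not_index()
```

Using labels that are not 0, 1, 2, ... is the key design choice: with the index-based version the first assertion would see 1 and the second 0, so the test fails without the fix and passes with it.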

fix: assign majority class label instead of index in majority voting

0ea4880
