
@Hampuztt Hampuztt commented Sep 3, 2025

Last year I used this package for my bachelor thesis comparing supervised and unsupervised SOMs. When I set n_iter_supervised=0, the results were unexpectedly poor. For the thesis I worked around it with my own majority-voting implementation (based on your paper), which produced much better accuracy.

On the final day I found the bug in the library by chance, but I didn't have time to open a PR. I'm submitting the fix now.

In SOMClassifier.py, _init_super_som has a loop that contains this:

if dp_in_node != []:
    y_in_node = self.y_.flatten()[dp_in_node]
    if not any(y_in_node == self.missing_label_placeholder):
        node_class = np.argmax(
            np.unique(y_in_node, return_counts=True)[1]  # bug
        )

This assigns node_class the index of the largest count within the array of unique labels, not the label itself.

Correct fix:

labels, counts = np.unique(y_in_node, return_counts=True)
node_class = labels[np.argmax(counts)]

Now the node is classified by the most frequent label, not its index.
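To make the difference concrete, here is a minimal standalone sketch (plain NumPy, not the library code) for a node that holds only labels 1 and 2. Because np.unique returns the sorted unique labels, index and label coincide only when every class 0..k-1 is present in the node, which is probably why the bug is easy to miss. The variable names buggy_node_class and fixed_node_class are only for illustration.

```python
import numpy as np

# Labels of the datapoints mapped to one node; class 0 is absent here.
y_in_node = np.array([2, 2, 1])

labels, counts = np.unique(y_in_node, return_counts=True)
# labels -> [1, 2], counts -> [1, 2]

buggy_node_class = np.argmax(counts)          # position of the largest count
fixed_node_class = labels[np.argmax(counts)]  # label stored at that position

print(buggy_node_class, fixed_node_class)  # 1 2
```

The majority label is 2, but the old expression returns 1, the position of label 2 in the sorted array of unique labels.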

BEFORE (image: with_bug)

AFTER (image: after_fix)

Small disclaimer

As this was quite some time ago, I haven't fully verified that this is the same bug I originally found, but I'm pretty sure it is. The graphs were generated with the code below. It is a modified slice of the code I used for my bachelor thesis, so it's not the clearest example.

Code

import matplotlib.pyplot as plt  # type: ignore
import susi  # type: ignore
from sklearn import datasets  # type: ignore
from sklearn.metrics import accuracy_score  # type: ignore
from sklearn.model_selection import train_test_split  # type: ignore
from sklearn.preprocessing import MinMaxScaler  # type: ignore
from sklearn.utils import Bunch  # type: ignore


def compareAccuracies(
    n_rows: int,
    n_cols: int,
    train_iterations: int,
    dataset: Bunch,
    comparisons: int,
    random_state=10,
):
    scaler = MinMaxScaler()
    majority_voting_accuracies = []
    supervised_som_accuracies = []

    for i in range(comparisons):
        print(f"Iteration: {i + 1}")
        X_train, X_test, y_train, y_test = train_test_split(
            dataset.data,
            dataset.target,
            test_size=0.7,
            random_state=random_state + i,
        )
        print(
            f"x_train: {X_train.shape}, x_test: {X_test.shape}, "
            f"y_train: {y_train.shape}, y_test: {y_test.shape}"
        )
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)

        # Unsupervised-only classifier (majority voting)
        unsup_som = susi.SOMClassifier(
            n_rows=n_rows,
            n_columns=n_cols,
            n_iter_unsupervised=train_iterations,
            n_iter_supervised=0,
            random_state=random_state + i,
        )
        unsup_som.fit(X_train, y_train)
        unsup_y_pred = unsup_som.predict(X_test)
        majority_voting_accuracies.append(accuracy_score(y_test, unsup_y_pred))

        # Supervised classifier
        sup_som = susi.SOMClassifier(
            n_rows=n_rows,
            n_columns=n_cols,
            n_iter_unsupervised=train_iterations,
            n_iter_supervised=train_iterations,
            random_state=random_state + i,
        )
        sup_som.fit(X_train, y_train)
        sup_y_pred = sup_som.predict(X_test)
        supervised_som_accuracies.append(accuracy_score(y_test, sup_y_pred))

    draw_accuracies(
        majority_voting_accuracies,
        supervised_som_accuracies,
        dataset["filename"],
    )


def draw_accuracies(
    majority_voting_accuracies: list[float],
    supervised_som_accuracies: list[float],
    dataset_name="",
):
    if dataset_name:
        dataset_name = dataset_name.split(".")[0].upper()
    runs = range(1, len(supervised_som_accuracies) + 1)

    avg_supervised_accuracy = sum(supervised_som_accuracies) / len(
        supervised_som_accuracies
    )
    avg_majority_voting_accuracy = sum(majority_voting_accuracies) / len(
        majority_voting_accuracies
    )

    # Plot supervised and majority voting accuracies on a single axis
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 1, 1)  # Only subplot in a 1x1 grid
    plt.plot(
        runs,
        supervised_som_accuracies,
        label="Supervised SOM",
        color="blue",
        marker="o",
    )
    plt.plot(
        runs,
        majority_voting_accuracies,
        label="Majority Voting",
        color="orange",
        marker="x",
    )
    plt.axhline(
        y=avg_supervised_accuracy,
        color="green",
        linewidth=2,
        linestyle=":",
        label=f"Avg Supervised SOM Accuracy ({100 * avg_supervised_accuracy:.2f}%)",
    )
    plt.axhline(
        y=avg_majority_voting_accuracy,
        color="red",
        linewidth=2,
        linestyle=":",
        label=f"Avg Majority Voting Accuracy ({100 * avg_majority_voting_accuracy:.2f}%)",
    )

    plt.xlabel("Run")
    plt.ylabel("Accuracy")  # values are fractions in [0, 1], not percentages
    plt.title(f"Accuracies: Supervised SOM and Majority Voting ({dataset_name})")
    plt.legend()

    plt.savefig("Images/" + "Accuracies_" + dataset_name, bbox_inches="tight")
    plt.show()


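if __name__ == "__main__":
    iris_data = datasets.load_iris()
    all_datasets = [iris_data]  # avoid shadowing the sklearn `datasets` module
    map_sizes = [(10, 5)]  # (n_cols, n_rows)
    iterations = [10000]

    for data in all_datasets:
        for n_cols, n_rows in map_sizes:
            for n_iter in iterations:
                # The original call passed n_rows twice, which ignored n_cols
                compareAccuracies(n_rows, n_cols, n_iter, data, 4)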

@Hampuztt
Author

Hampuztt commented Sep 6, 2025

@felixriese

@felixriese
Owner

Hi @Hampuztt, thanks for the pull request and for the detailed explanation. I will have a look at it in the next few days. In the meantime, can you please do the following:

  1. apply black susi (for the correct version, see test-requirements.txt)
  2. write a test for it in tests/test_SOMClassifier.py to make sure that this bug actually exists and is fixed by your bugfix (meaning: it needs to fail without your fix and pass with your fix)

Best, Felix
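As a rough illustration of what such a regression test could check, the sketch below isolates the majority-vote step in two hypothetical helpers, majority_label and buggy_majority_label. Neither is part of susi. A real test in tests/test_SOMClassifier.py would presumably go through SOMClassifier with n_iter_supervised=0 instead, so treat this only as an outline of the assertion logic.

```python
import numpy as np


def majority_label(y_in_node):
    """Hypothetical helper: most frequent label among a node's datapoints."""
    labels, counts = np.unique(y_in_node, return_counts=True)
    return labels[np.argmax(counts)]


def buggy_majority_label(y_in_node):
    """Pre-fix behaviour, kept only to show that the test catches it."""
    return np.argmax(np.unique(y_in_node, return_counts=True)[1])


def test_majority_label_returns_label_not_index():
    # Labels that are not 0..k-1, so an index can never equal the label.
    y_in_node = np.array([5, 9, 9, 9, 5])
    assert majority_label(y_in_node) == 9
    assert buggy_majority_label(y_in_node) != 9
```

Using labels outside the range 0..k-1 makes the test fail deterministically against the old expression, instead of depending on which classes happen to land in a node.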
