
Convert single column binary predictions to two #375


Closed
tdhock wants to merge 2 commits

Conversation

@tdhock (Contributor) commented Apr 3, 2025

Closes #374

@tdhock (Contributor, Author) commented Apr 3, 2025

Probably should add a test case.

N_data <- length(predict_tensor)
vec_dim <- c(N_data, 1)
# network output: one score per observation, used as the positive-class score
pos_scores <- predict_tensor$reshape(vec_dim)
# the negative class gets a constant score of zero
neg_scores <- torch::torch_zeros(N_data)$reshape(vec_dim)
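
A small testthat check could cover this conversion. Here is a minimal sketch that exercises the same reshaping and zero-padding on plain torch tensors rather than the mlr3torch internals; the column order (negative first, positive second) is an assumption of this sketch:

library(torch)
library(testthat)

test_that("single-column binary scores become a valid two-column probability matrix", {
  predict_tensor <- torch_randn(5)          # one score per observation (positive class)
  N_data <- length(predict_tensor)
  vec_dim <- c(N_data, 1)
  pos_scores <- predict_tensor$reshape(vec_dim)
  neg_scores <- torch_zeros(N_data)$reshape(vec_dim)
  both <- torch_cat(list(neg_scores, pos_scores), dim = 2)
  prob <- nnf_softmax(both, dim = 2)
  expect_equal(dim(prob), c(5L, 2L))
  expect_equal(as.numeric(prob$sum(dim = 2)), rep(1, 5), tolerance = 1e-6)
})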
For justification of using zero for the negative-class scores before putting them through softmax, see the section "Multi-class classification" in my slides: https://github.com/tdhock/2023-08-deep-learning/blob/main/slides/torch-part1/06-backprop.pdf
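
In short: with the negative class pinned at a score of zero, the two-class softmax gives exp(s) / (exp(0) + exp(s)) = 1 / (1 + exp(-s)) for the positive class, i.e. the sigmoid of the single score. A quick numerical check (sketch only):

library(torch)
s <- torch_randn(4)                                        # single-column positive-class scores
two_col <- torch_stack(list(torch_zeros(4), s), dim = 2)   # column 1: zeros, column 2: scores
pos_prob <- nnf_softmax(two_col, dim = 2)[, 2]
all.equal(as.numeric(pos_prob), as.numeric(torch_sigmoid(s)), tolerance = 1e-6)  # TRUE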

@sebffischer (Member) commented:

I need to think a bit about how to handle this nicely (e.g. via nn("head") handling everything, as suggested in our other discussion). Thanks a lot for stress-testing the package and uncovering some shortcomings! :)

@sebffischer (Member) commented Apr 3, 2025

Also, when we introduce this, I think it should be done consistently across classification learners.
Currently, the pretrained learners (such as lrn("classif.resnet18")) also output a tensor of shape (batch, n_classes), even when n_classes == 2, so this would then have to be changed across the board as well.

The loss of a LearnerTorch (such as lrn("classif.mlp", loss = t_loss("cross_entropy"))) does not yet know, during construction, which type of task it will be applied to. I would like to avoid having separate losses for binary cross-entropy and multi-class cross-entropy, as this is a bit annoying for benchmark experiments where a classification learner is sometimes applied to binary classification problems and other times to multi-class classification problems, such as in:

library(mlr3torch)
design = benchmark_grid(
  # sonar is binary, iris is multi-class
  tasks = tsks(c("sonar", "iris")),
  learners = lrn("classif.mlp", epochs = 10, batch_size = 16, loss = t_loss("cross_entropy")),
  resamplings = rsmp("cv")
)
benchmark(design)

I think the solution is to make t_loss("cross_entropy") generate an nn_bce_with_logits_loss when it encounters a binary classification problem and an nn_cross_entropy_loss otherwise.
Also, maybe the information on how to load the targets should be attached to the loss function, which currently is not the case. Maybe even the prediction encoder should be attached to the loss. I am not completely sure yet what the best design is that is both easy to customize and works out of the box in the standard scenario.
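
To make the idea concrete, here is a rough sketch of such a dispatch with the target encoding and prediction decoding attached to it. The helper name make_classif_loss, the returned list structure, and reading the number of classes via task$class_names are illustrative assumptions, not the mlr3torch API:

library(torch)

# hypothetical helper: pick the loss module from the task and bundle it with
# the matching target encoding and prediction decoding (illustration only)
make_classif_loss <- function(task) {
  if (length(task$class_names) == 2L) {
    list(
      module = nn_bce_with_logits_loss(),
      # binary case: float 0/1 targets, a single logit column, sigmoid to decode
      encode_target = function(y) torch_tensor(as.integer(y) - 1L, dtype = torch_float()),
      decode_prediction = function(logits) torch_sigmoid(logits)
    )
  } else {
    list(
      module = nn_cross_entropy_loss(),
      # multi-class case: integer class indices, one logit per class, softmax to decode
      encode_target = function(y) torch_tensor(as.integer(y), dtype = torch_long()),
      decode_prediction = function(logits) nnf_softmax(logits, dim = 2)
    )
  }
}

With such a helper, make_classif_loss(tsk("sonar")) would take the binary branch and make_classif_loss(tsk("iris")) the multi-class one.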

@tdhock (Contributor, Author) commented Apr 3, 2025

Agree about "make t_loss("cross_entropy") generate an nn_bce_with_logits_loss when it encounters a binary classification problem and an nn_cross_entropy_loss otherwise".

Not sure I understand "load the targets should be attached to the loss function, which currently is not the case. Maybe even the prediction encoder should be attached to the loss".

@sebffischer (Member) commented Apr 3, 2025

Not sure I understand "load the targets should be attached to the loss function, which currently is not the case. Maybe even the prediction encoder should be attached to the loss".

When mlr3torch trains the model, the factor column from the task gets converted into a tensor. How this is done depends on the loss function, as we have seen in this example. Because of this relationship, I was thinking that maybe the loss function should dictate how to load the target (0/1 or 1/2, int vs. float).
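
For illustration, the two encodings mentioned here could look like this for a binary factor (a sketch; the exact conventions used by mlr3torch may differ):

library(torch)
y <- factor(c("neg", "pos", "pos"), levels = c("neg", "pos"))

# nn_cross_entropy_loss: integer class indices (factor levels 1/2)
target_ce  <- torch_tensor(as.integer(y), dtype = torch_long())

# nn_bce_with_logits_loss: float 0/1 labels for the positive class
target_bce <- torch_tensor(as.integer(y) - 1L, dtype = torch_float())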

The prediction encoder defines how a torch prediction tensor, as output by the underlying network, is converted into an mlr3::Prediction object. This is what you are fixing here. I am wondering where the information on how to do this should come from. I think a runtime check on the dimension, as done here, is a little hacky, so ideally I would like to do this differently (and solve it more generically).
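
As an illustration of what the prediction encoder has to produce for predict_type = "prob": a (batch, 2) logit tensor turned into a probability matrix with one named column per class (a sketch; the class names would come from the task in practice):

library(torch)
logits <- torch_randn(3, 2)                    # stand-in for the network output
prob <- as_array(nnf_softmax(logits, dim = 2)) # rows sum to one
colnames(prob) <- c("neg", "pos")              # task$class_names in practice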

@tdhock (Contributor, Author) commented Apr 3, 2025

Agree with "a runtime check on the dimension, as done here, is a little hacky, so ideally I would like to do this differently (and solve it more generically)".

@sebffischer (Member) commented:

Superseded by #385

Successfully merging this pull request may close these issues:
predict_type="prob" does not work with out_features=1