Skip to content

Separate Rcpp from main logic #57

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

jyliuu
Copy link
Collaborator

@jyliuu jyliuu commented Jan 3, 2025

This pull request includes several changes to improve the codebase by refactoring and updating the data structures to not depend on Rcpp. The purpose of the refactoring is to serve as a base for future implementations in Python. The forest classes RandomPlantedForest and ClassificationRPF are now implemented exclusively using standard C++ types. A new interface file is added to connect Rcpp with the existing classes.

Additional changes

  • src/include/random_utils.hpp: Added a new header file defining the RandomGenerator class, which provides a unified interface for random number generation using either the C++ standard library or R's RNG.

  • src/include/rpf.hpp: Added a new RPFParams struct and updated the RandomPlantedForest class to use standard C++ containers. Furthermore, set_parameters was removed as it did not seem to serve any meaningful use case.

  • The method set_data does fit the model, this is a very huge side-effect. Instead fit is no called after set_data

@jyliuu jyliuu force-pushed the separate-rcpp-refactoring branch from 843af94 to c12531c Compare January 3, 2025 15:13
@jyliuu jyliuu force-pushed the separate-rcpp-refactoring branch from c12531c to 3c26adf Compare January 3, 2025 15:16
jyliuu added 5 commits January 3, 2025 16:16
- Removed `set_parameters` method from both RcppRPF and ClassificationRPF classes.
- Updated `get_parameters` to return a structured RPFParams object instead of printing to stdout.
- Removed obsolete test related to parameter setting from test suite.
@jyliuu jyliuu marked this pull request as ready for review January 4, 2025 13:49
@jyliuu jyliuu requested review from jemus42, mnwright and MHiabu January 4, 2025 13:54
Copy link
Collaborator

@jemus42 jemus42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

R CMD check produces this note:

* checking compiled code ... NOTE
File ‘randomPlantedForest/libs/randomPlantedForest.so’:
  Found ‘__ZNSt3__14coutE’, possibly from ‘std::cout’ (C++)
    Objects: ‘lib/cpf.o’, ‘lib/rpf.o’
  Found ‘_rand’, possibly from ‘rand’ (C)
    Object: ‘lib/cpf.o’

Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs nor [v]sprintf.
See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual.

I'm not sure about the _rand bit but for the stdout/stderr bit I guess it doesn't like the use of std::cout, which is why previously Rcout was used.
I'm not sure how to handle that safely while being interface-agnostic, i.e. working for both R and Python interfaces, but I suspect this is going to require an abstraction similar to the RNG part, I guess? Something like a print wrapper that either writes to Rcout or std::cout or something?

@jyliuu
Copy link
Collaborator Author

jyliuu commented Jan 7, 2025

I think that std::cout is used in cpf.cpp for printing out error messages when the parameters given are invalid. We can definitely relegate the validation side to the Rcpp interface

@jemus42
Copy link
Collaborator

jemus42 commented Jan 8, 2025

I'm doing parameter checks explicitly in the R package as well, long before the Rcpp interface sees any data, so as far as simple argument checks are concerned I assume that it might be simpler to do these checks in the high-level interface (R/Python) rather than on the C++ side (regardless of whether Rcpp is involved or not).
From what I gather, many of the existing and partially commented out Rcout calls were for debugging purposes anyway, so maybe it's not a big deal to comment the rest out as well.

If the C++ backend is not intended to be used as a standalone program (which is possible for ranger, for example), then I guess it wouldn't need to write to stdout anyway and just return a meaningful error description to the interface language to be handled there. This would affect things like model fitting errors which can't be checked a priori, but I'm not sure how straightforward it is to handle.

Anyway, I think R CMD check is checking the actual binary for any references to std::cout, so adding a simple "debug" toggle to print messages or not might be insufficient to appease R CMD check.

@mnwright
Copy link
Collaborator

mnwright commented Jan 8, 2025

In ranger, we simply have a std::ostream* verbose_out; which is &Rcpp::Rcout; in Rcpp and std::cout or even a logfile for the pure C++ version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants