
Building in ALT-REP to stringi #474

@traversc

Have you considered implementing an ALT-REP (ALTREP) string class? Done properly, I think it could give a large performance increase across the board, for several reasons:

  • Simpler data structures compared to R's heavy CHARSXP and R's global string cache
  • Short string optimization
  • The possibility of true multithreading (you can't multithread R internals)

If there's interest, I'd be happy to develop and work on it.

To flesh it out a bit, I think you could use an ALT-REP class that's backed by standard STL containers:

std::vector<std::string>

You don't need to keep track of encoding if you can assume UTF-8.
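As a rough illustration of that backing store, here is a minimal sketch in plain C++. The class name `Utf8StringStore` and its methods are hypothetical, it assumes every element is UTF-8, and it does not model `NA` values or the ALTREP method table that would have to sit on top of it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backing store for an ALT-REP string vector.
// Assumption: every element is UTF-8, so no per-string encoding tag is stored.
class Utf8StringStore {
public:
    // Creates n empty strings.
    explicit Utf8StringStore(std::size_t n) : data_(n) {}

    std::size_t size() const { return data_.size(); }

    // Read access to element i (bounds checked in debug builds only).
    const std::string& get(std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }

    // Overwrites element i; the value is moved in to avoid an extra copy.
    void set(std::size_t i, std::string value) {
        assert(i < data_.size());
        data_[i] = std::move(value);
    }

private:
    std::vector<std::string> data_;
};
```

Whether short strings are stored inline depends on the standard library's `std::string` implementation, so the short string optimization is something this layout can benefit from rather than something it guarantees.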

You'd probably want some global configuration parameter:

stri_use_alt_rep(bool)

You'd have to replace every interaction with R string memory with a conditional that checks this setting:

CHAR
SET_STRING_ELT
STRING_ELT
mkChar*
Rf_allocVector(STRSXP,...)
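To make the "conditional" idea concrete, here is a small stand-alone sketch. It deliberately does not include R's headers: `StringVec` is a stand-in in which `native` mimics an R string vector and `alt` is the STL store, and the helpers `str_elt` and `alloc_string_vec` are hypothetical names, not stringi or R API. Only `stri_use_alt_rep` comes from the proposal above, and its return value here is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Global switch corresponding to the proposed stri_use_alt_rep(bool).
static std::atomic<bool> g_use_alt_rep{false};

// Sets the switch and returns the previous setting (an assumed convention).
bool stri_use_alt_rep(bool enable) { return g_use_alt_rep.exchange(enable); }

// Stand-in for a string vector with two possible backings.
struct StringVec {
    std::vector<const char*> native;  // mimics R-managed string memory
    std::vector<std::string> alt;     // STL-backed store
    bool is_alt = false;
};

// Conditional counterpart of Rf_allocVector(STRSXP, n).
StringVec alloc_string_vec(std::size_t n) {
    StringVec x;
    x.is_alt = g_use_alt_rep.load();
    if (x.is_alt) {
        x.alt.resize(n);         // n empty std::string objects
    } else {
        x.native.assign(n, "");  // n pointers to an empty C string
    }
    return x;
}

// Conditional counterpart of CHAR(STRING_ELT(x, i)).
const char* str_elt(const StringVec& x, std::size_t i) {
    if (x.is_alt) {
        assert(i < x.alt.size());
        return x.alt[i].c_str();
    }
    assert(i < x.native.size());
    return x.native[i];
}
```

In a real implementation the non-ALT-REP branch would call the R API directly, and the same pattern would be repeated for `SET_STRING_ELT` and the `mkChar*` family.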

You'd also need to replace any comparison of addresses used to test string equality (I'm not sure whether stringi does this).
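The reason address comparison breaks is that R's global string cache gives strings with the same bytes and encoding a single shared address, whereas two separate `std::string` objects each own their buffer. A minimal sketch of the difference, with the hypothetical helper names `same_string` and `same_address`:

```cpp
#include <string>

// Content comparison: checks length and bytes, independent of where the
// characters are stored. This is what an STL-backed store would need.
bool same_string(const std::string& a, const std::string& b) {
    return a == b;
}

// Address comparison: the shortcut that relies on a shared string cache.
// For two distinct std::string objects this is false even when the
// contents are identical.
bool same_address(const std::string& a, const std::string& b) {
    return a.data() == b.data();
}
```

For example, two separately constructed strings holding `"stringi"` satisfy `same_string` but not `same_address`.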

There are probably things I'm forgetting, and it's a lot of work, but I think the task is clearly defined.
