Support for Apache Spark-Based Preprocessing in RecBole for Large-Scale Datasets? #2176
Unanswered
JamorMoussa asked this question in Q&A
Replies: 0 comments
Hello,
I'm working on a recommendation project in the retail domain and recently discovered RecBole, an interesting library that makes it easy to switch between models and provides a standardized way to represent data. However, I've run into an issue: RecBole relies on Pandas for data loading and preprocessing, which becomes a bottleneck on large datasets.
Does RecBole offer any interface or support for using Apache Spark during the data processing stage, followed by efficient data loading (e.g., using a PyTorch DataLoader) for training models? If not, what would be the recommended approach to preprocess data with Spark and still integrate with RecBole components such as its models and trainers?
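To make the question concrete, here is a minimal sketch of the kind of bridge I have in mind. It assumes Spark does the heavy preprocessing and that the result is written as a RecBole atomic file (a tab-separated `.inter` file whose header cells are `field:type`). The helper name `write_inter_file`, the dataset name `retail`, and the column names are hypothetical, and the rows are hard-coded here so the snippet runs without a Spark cluster.

```python
import csv
import os
import tempfile

# Feature types that can appear in an atomic-file header.
ATOMIC_TYPES = {"token", "token_seq", "float", "float_seq"}


def write_inter_file(path, schema, rows):
    """Write rows as a tab-separated atomic file with 'name:type' headers.

    `schema` maps column name -> atomic type, in column order.
    `rows` is any iterable of tuples. With Spark it could be something like
    spark_df.select(*schema).toLocalIterator() (an assumption, not shown here).
    Returns the number of data rows written.
    """
    for name, ftype in schema.items():
        if ftype not in ATOMIC_TYPES:
            raise ValueError(f"unsupported type {ftype!r} for field {name!r}")

    header = [f"{name}:{ftype}" for name, ftype in schema.items()]
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for row in rows:
            if len(row) != len(header):
                raise ValueError(f"row has {len(row)} values, expected {len(header)}")
            writer.writerow(row)
            count += 1
    return count


# Hypothetical retail interactions standing in for the Spark output.
schema = {
    "user_id": "token",
    "item_id": "token",
    "rating": "float",
    "timestamp": "float",
}
demo_rows = [
    ("u1", "i9", 4.0, 1700000000),
    ("u2", "i3", 5.0, 1700000100),
]

out_dir = tempfile.mkdtemp()
inter_path = os.path.join(out_dir, "retail.inter")
n_written = write_inter_file(inter_path, schema, demo_rows)
```

Streaming rows through the driver like this would probably not scale. On a real cluster I would expect to use Spark's own writer (for example `df.write.csv(..., sep="\t")`) and add the header line separately. Either way, as far as I can tell RecBole would still read the resulting file through Pandas, so this only moves the preprocessing to Spark and leaves the loading part of my question open.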
Thank you!