Comparison of Automatic Modeling Effects of Ymodel, Weka, Rapidminer
Objective: To compare the automatic modeling results of Weka, Rapidminer, and YModel
Data used: 5 datasets in total, 3 for classification and 2 for regression
2 classic Kaggle cases and 3 real business datasets
YModel, Weka, and Rapidminer Studio (hereinafter referred to as Rapidminer) are three products that perform well at automatic modeling. In the article Comparison of Automatic Modeling Effects of Ymodel, Weka, Rapidminer, we tested the modeling results of the three; in this article, we compare them from the perspective of user experience.
Product overview
YModel is software designed specifically for automatic modeling and provides a fully automated modeling experience.
Weka and Rapidminer are software packages that integrate manual and automatic modeling. The comparison in this article covers only their automatic modeling components.
Installation
Weka: AutoWEKA is an extension package for Weka. You must first install Weka and then install AutoWEKA yourself through the package manager. During installation you may run into problems such as connection errors or failed installations, and you have to solve them yourself.
YModel: As dedicated automatic modeling software, it can be installed directly.
Rapidminer: Automatic modeling is a core functional module of Rapidminer and is available after a direct installation.
Difficulty in getting started
YModel is a pure automatic modeling tool and the simplest to operate. After you click the modeling function on the main interface, a series of dialog boxes guides you step by step through importing data, setting character formats, configuring the target variable, and so on. It is very smooth to use and rarely requires consulting the documentation.
Rapidminer comes second. It has somewhat more features, but its automatic modeling part is not difficult to use, and users can quickly become familiar with it with the help of the documentation.
Weka's main interface offers several ways of working, such as menu-based operation, workflow-based operation, and the command line, so beginners need to spend some time learning it. Automatic modeling is not a standalone function in Weka but a special kind of model, so its operation is mixed in with manual modeling, and users have to locate many of the operations themselves.
Modeling automation level
All three tools can fully automate data preprocessing and modeling, so even non-professionals can use them. Their modeling results are also good, although performance varies across datasets; see the previous article, Comparison of Automatic Modeling Effects of Ymodel, Weka, Rapidminer. In terms of how automated the modeling process is, YModel and Rapidminer are better than AutoWeka.
For example, AutoWeka may fail to recognize some data types, which then have to be processed manually in Weka beforehand. YModel and Rapidminer did quite well: no unrecognized data types occurred in any of the datasets tested.
Between YModel and Rapidminer, each has its own characteristics.
YModel is dedicated automatic modeling software that automates the entire process as far as possible. For example, when importing data, YModel automatically analyzes the variables: it calculates many statistical indicators, generates variable distribution charts, calculates each variable's correlation with the target variable, and deselects useless variables. Essentially all of the indicators commonly used to analyze variables are calculated.
Rapidminer performs only a basic analysis of variable quality at import time. It helps users deselect useless variables but lacks the richer set of statistical indicators.
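Neither product's exact screening rules are described here, so the following is only a minimal Python sketch, under assumed thresholds, of the kind of per-variable analysis described above: missing rate, number of distinct values, correlation with the target, and a few reasons to deselect a variable. The helpers `pearson` and `profile_column` and the 50% missing threshold are hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def profile_column(values, target, max_missing=0.5):
    """Summarise one column; None marks a missing value.

    Hypothetical heuristic: the deselection rules and threshold are assumptions,
    not YModel's or Rapidminer's documented behaviour.
    """
    present = [(v, t) for v, t in zip(values, target) if v is not None]
    missing_rate = 1 - len(present) / len(values)
    distinct = len({v for v, _ in present})
    numeric = all(isinstance(v, (int, float)) for v, _ in present)

    corr = None
    if numeric and present:
        corr = pearson([v for v, _ in present], [t for _, t in present])

    reasons = []
    if missing_rate > max_missing:
        reasons.append("mostly missing")
    if distinct <= 1:
        reasons.append("constant")
    if not numeric and distinct == len(present):
        reasons.append("unique per row (ID-like)")
    return {"missing_rate": missing_rate, "distinct": distinct,
            "corr_with_target": corr, "drop_reasons": reasons}
```

Real tools compute many more indicators; the point is that each one is a cheap per-column pass that can run automatically at import time.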
When recognizing the type of the target variable, if the target takes the common 0/1 form, YModel recognizes the task as classification, while Rapidminer treats it as numeric by default and requires the user to switch it to classification manually. Similarly, when a variable uses numbers to encode categories, YModel recognizes it as categorical, while Rapidminer decides based only on the data type and requires the user to change it manually.
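The difference comes down to whether a tool looks at the values or only at the storage type. A rough Python sketch of a value-based rule follows; `infer_task` and the `max_classes=10` cut-off are hypothetical and are not YModel's documented logic.

```python
def infer_task(target_values, max_classes=10):
    """Guess whether a target column calls for classification or regression.

    Assumed heuristic (not YModel's documented rule): text labels are classes, and a
    numeric target whose values are all whole numbers with at most `max_classes`
    distinct values (such as 0/1) is treated as class codes rather than quantities.
    """
    values = [v for v in target_values if v is not None]
    if any(isinstance(v, str) for v in values):
        return "classification"
    whole_numbers = all(float(v).is_integer() for v in values)
    if whole_numbers and len(set(values)) <= max_classes:
        return "classification"
    return "regression"
```

A type-only rule would see `[0, 1, 1, 0]` as plain numbers; the distinct-value check is what lets it be read as a class code. The cost is that a genuinely numeric target with very few distinct whole values would be misread, which is why a user override is still useful.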
After the model is built, YModel directly outputs the optimal model, while Rapidminer suggests several of the best models and lets the user choose which one to save.
Moreover, YModel handles details well. It records the character format, time format, missing-value format, and other settings configured when the modeling data was imported, and applies them automatically at prediction time, so the prediction data does not need to be configured again.
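The idea of capturing import settings once and reapplying them to prediction data can be sketched in Python as below. `ImportConfig`, its fields, and the column names are hypothetical illustrations, not YModel's actual data structures.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class ImportConfig:
    """Hypothetical settings captured when the training data is first imported."""
    date_format: str = "%Y-%m-%d"
    date_columns: tuple = ()
    missing_markers: frozenset = field(default_factory=lambda: frozenset({"", "NA", "NULL"}))

    def parse_row(self, row):
        """Apply the same missing-value and date rules to a raw row of strings."""
        parsed = {}
        for column, raw in row.items():
            if raw in self.missing_markers:
                parsed[column] = None
            elif column in self.date_columns:
                parsed[column] = datetime.strptime(raw, self.date_format)
            else:
                parsed[column] = raw
        return parsed


# Configure once while importing the training data...
config = ImportConfig(date_format="%d/%m/%Y", date_columns=("sale_date",),
                      missing_markers=frozenset({"?"}))
train_row = config.parse_row({"sale_date": "03/02/2021", "mileage": "?"})
# ...and reuse the same object for prediction data instead of configuring it again.
predict_row = config.parse_row({"sale_date": "15/07/2022", "mileage": "52000"})
```

Keeping the settings in one immutable object that is saved with the model is what guarantees that training and prediction data are parsed identically.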
Display of model results
Rapidminer offers the richest display of model results. A model file contains a great deal of content, such as the model structure, model performance, and the workflow generated for the model.
In YModel, the model results show the selected optimal model, its parameters, and its performance.
Weka's presentation of automatic modeling results is relatively simple, offering only the most basic descriptions.
Functions
AutoWeka: Classification, Regression
YModel: Classification, Regression, Time Series
Rapidminer: Classification, Regression, Clustering, Outlier detection
Another highlight of Rapidminer is that its automatic modeling process is transparent: users can modify and optimize the generated workflow, which gives greater flexibility.
API
YModel can be integrated into users' own systems through Java, Python, C#, and other languages.
Rapidminer provides a REST API and a Python API.
Weka provides a REST API and a Java API.
Overall Experience
Criterion | Weka | YModel | Rapidminer |
---|---|---|---|
Easy to install | ★★ | ★★★ | ★★★ |
Easy to get started | ★ | ★★★ | ★★ |
Modeling automation level | ★ | ★★★ | ★★ |
Display of model results | ★ | ★★ | ★★★ |
Functions | ★ | ★★ | ★★★ |
API | ★★★ | ★★★ | ★★★ |
Datasets used:
Dataset | Task | Source |
---|---|---|
Titanic Data | Classification | Kaggle |
House Price Prediction | Regression | Kaggle |
Credit Company User Overdue Prediction | Classification | Real business data |
Claims prediction of insurance company policies | Classification | Real business data |
Second-hand car transaction price prediction | Regression | Real business data |
Because the free version of Rapidminer is limited to 10,000 rows of data, the three real business datasets were sampled, keeping each sample to a few thousand rows. As a result, large-volume testing was not possible.
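The article does not say how the business datasets were sampled. One reasonable approach, assumed here, is stratified sampling, which keeps the positive/negative ratio of the full data so that the imbalance reported for each dataset survives the cut. `stratified_sample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, limit, seed=0):
    """Down-sample rows to at most `limit` items, preserving each class's share.

    Assumed approach: the source does not state which sampling method was used.
    """
    if len(rows) <= limit:
        return list(rows)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for members in groups.values():
        k = int(len(members) * limit / len(rows))  # flooring keeps the total <= limit
        sample.extend(rng.sample(members, k))
    return sample
```

For example, 18,000 rows with a 1:8 positive/negative ratio and a cap of 9,000 yield 1,000 positives and 8,000 negatives, the same 1:8 ratio.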
**Product introduction:** Weka is open source, and its automatic modeling function is a free extension module. Rapidminer is commercial software; although it has a free version, the Auto Model function is charged for.
**Overall user experience:** YModel models the fastest. Rapidminer is also relatively fast, but its modeling time increases significantly when there are many variables. Weka requires the modeling time to be set beforehand, and its modeling is also relatively slow. In Weka, some variable types sometimes have to be handled manually before automatic modeling can recognize them. Overall, Weka offers the weakest automatic modeling experience of the three.
**Testing method:** Each dataset is divided into a training set and a prediction set; the prediction results from each tool are exported and scored uniformly.
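Scoring exported predictions uniformly means computing every classification metric in the tables from the same confusion-matrix counts. A minimal Python sketch follows, assuming 0/1 labels with 1 as the positive class; `classification_metrics` is a hypothetical helper, and undefined ratios are returned as `None`, which corresponds to the "-" cells in the tables.

```python
def classification_metrics(y_true, y_pred):
    """Score exported 0/1 predictions (hypothetical helper; positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # A ratio with a zero denominator is undefined, shown as "-" in the tables.
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}
```

As a check against the tables, Rapidminer's Titanic precision of 0.809 and recall of 0.756 give F1 = 2 × 0.809 × 0.756 / (0.809 + 0.756) ≈ 0.782, matching the reported value.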
Test results:
- Titanic Survival Prediction - Classification
Training data: 802 items, 12 variables
The ratio of positive and negative samples is approximately 3:5
Metric | Weka | Rapidminer | YModel |
---|---|---|---|
Accuracy | 0.722 | 0.787 | 0.775 |
Precision | 0.862 | 0.809 | 0.857 |
Recall | 0.556 | 0.756 | 0.667 |
Specificity | 0.909 | 0.818 | 0.886 |
F1 | 0.676 | 0.782 | 0.75 |
AUC | - | 0.793 | 0.847 |
Ranking | 3 | 2 | 1 |
Weka could not output probability values (or we could not find how to make it do so), so its AUC could not be calculated.
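AUC is computed from predicted scores or probabilities, not from hard 0/1 labels, which is why it is missing when a tool exports no probabilities. A minimal Python sketch of the rank-based definition (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one) follows; `auc` is a hypothetical helper name.

```python
def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Hypothetical helper. Ties count as half a win. The double loop is O(P * N),
    acceptable for prediction sets of a few thousand rows.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

If only hard 0/1 predictions are fed in, the result collapses to (recall + specificity) / 2, so it says nothing beyond those two metrics; real probability output is needed for a meaningful AUC.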
- House Price Prediction - Regression
Metric | Weka | Rapidminer | YModel |
---|---|---|---|
MSE | 4.17E8 | 1.41E9 | 9.85E8 |
RMSE | 20430 | 37539 | 31385 |
MAE | 14164 | 19459 | 16378 |
MAPE | 9.108 | 11.317 | 9.921 |
R² | 0.889 | 0.755 | 0.829 |
Ranking | 1 | 3 | 2 |
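The regression metrics follow their standard definitions. A minimal Python sketch is below; `regression_metrics` is a hypothetical helper, and reporting MAPE as a percentage is an assumption, since the article does not state the unit.

```python
import math


def regression_metrics(y_true, y_pred):
    """Standard regression scores for exported predictions (hypothetical helper)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE in percent (assumed unit); undefined if any true value is zero.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sse / ss_tot  # share of variance explained, relative to predicting the mean
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape, "r2": r2}
```

Because RMSE is the square root of MSE, the tables can be cross-checked: for Weka on the house price data, √(4.17 × 10⁸) ≈ 20,421, consistent with the reported RMSE of 20,430 once the rounding of MSE is taken into account.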
- Credit Company User Overdue Prediction - Classification
Training data: 8938 items, 56 variables
The ratio of positive and negative samples is approximately 1:8
Metric | Weka | Rapidminer | YModel |
---|---|---|---|
Accuracy | 0.878 | 0.880 | 0.804 |
Precision | - | 0.471 | 0.281 |
Recall | 0 | 0.063 | 0.409 |
Specificity | 1 | 0.99 | 0.858 |
F1 | - | 0.111 | 0.333 |
AUC | - | 0.729 | 0.742 |
Ranking | 3 | 2 | 1 |
On this dataset, the Weka model failed: it did not capture a single positive sample.
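This also explains why Weka's accuracy of 0.878 still looks respectable: on imbalanced data, a model that predicts "negative" for every sample scores an accuracy roughly equal to the share of negatives. The short Python sketch below uses a hypothetical prediction set with exactly the 1:8 ratio of this dataset.

```python
# Hypothetical prediction set with the roughly 1:8 positive:negative ratio of the credit data.
y_true = [1] * 100 + [0] * 800
y_pred = [0] * len(y_true)  # a "model" that never predicts the positive class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(f"accuracy={accuracy:.3f} recall={recall:.3f}")  # prints accuracy=0.889 recall=0.000
```

Accuracy alone therefore cannot separate a useful model from a trivial one here; recall, F1, and AUC are the more informative columns.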
- Claims prediction of insurance company policies - Classification
Training data: 3470 items, 29 variables
The ratio of positive and negative samples is approximately 1:7
Metric | Weka | Rapidminer | YModel |
---|---|---|---|
Accuracy | 0.905 | 0.949 | 0.882 |
Precision | 0.051 | 0.033 | 0.022 |
Recall | 0.264 | 0.069 | 0.139 |
Specificity | 0.916 | 0.965 | 0.895 |
F1 | 0.086 | 0.045 | 0.038 |
AUC | - | 0.642 | 0.638 |
Ranking | 1 | 2 | 3 |
- Second-hand car transaction price prediction - Regression
Metric | Weka | Rapidminer | YModel |
---|---|---|---|
MSE | 2779927 | 8466716 | 9429967 |
RMSE | 1667 | 2910 | 3070 |
MAE | 835 | 1580 | 1537 |
MAPE | 27 | 75 | 54 |
R² | 0.941 | 0.821 | 0.801 |
Ranking | 1 | 2 | 3 |
**Overall evaluation:** Across the 5 datasets used in this test, the rankings vary with the data, but the differences in the metrics are not large, and YModel's overall performance is quite good. By comparison, Weka performs well on the regression datasets, YModel performs well on the classification datasets, and Rapidminer falls in the middle.