Comparison of Automatic Modeling Effects of Ymodel, Weka, Rapidminer

Objective: To compare the automatic modeling effects of Weka, Rapidminer, and Ymodel

Data to be used: 5 datasets in total, 3 for classification and 2 for regression

2 classic Kaggle cases and 3 real business datasets

Ymodel, Weka, and Rapidminer Studio (hereinafter referred to as Rapidminer) are three products that do well in automatic model building. In the article Comparison of Automatic Modeling Effects of Ymodel, Weka, Rapidminer, we tested the modeling effects of the three; in this article, we'll compare them from the perspective of user experience.

Product overview

YModel is software designed specifically for automatic modeling, and it provides a fully automated modeling experience.

Weka and Rapidminer integrate manual and automatic modeling. The comparison in this article covers only the automatic modeling part.

Installation

Weka: AutoWEKA is an extension package for Weka. You need to install Weka first and then install AutoWEKA yourself through the package manager. During installation you may run into problems such as connection failures or failed installs, and you have to resolve them yourself.

YModel: A dedicated automatic modeling tool that can be installed directly.

Rapidminer: Automatic modeling is an important functional module of Rapidminer and is installed directly with it.

**Difficulty in Getting Started**

YModel is a pure automatic modeling tool and is the simplest to operate. After you click the modeling function on the main interface, dialog boxes pop up step by step to guide you through importing data, setting character formats, configuring target variables, and so on. It is very smooth to use and almost never requires consulting the documentation.

Rapidminer comes second. It has slightly more functions, but the automatic modeling part is not difficult to use, and users can quickly become familiar with it with the help of the documentation.

Weka's main interface provides several different operation methods, such as menu-based operation, workflow-based operation, and command-line operation. Beginners need to spend some time learning them. Automatic modeling is not an independent function in Weka but a special kind of model, so its operations are mixed in with the manual modeling parts, and users have to hunt for many of them.

Modeling automation level

All three products can fully automate data preprocessing and modeling, so even non-professionals can use them. The modeling effect is also good, although performance varies across datasets; for details, see the article Comparison of Automatic Modeling Effects of Ymodel, Weka, Rapidminer. In terms of the automation level of the modeling process, Ymodel and Rapidminer are better than AutoWeka.

For example, AutoWeka cannot recognize some data types, and these must be processed manually in Weka beforehand. Ymodel and Rapidminer did quite well: no unrecognized data types appeared in the datasets tested.

Comparing Ymodel and Rapidminer, each has its own characteristics.

Ymodel is a specialized automatic modeling tool that automates as much of the process as possible. For example, when importing data, YModel automatically analyzes the variables, calculates many statistical indicators, generates variable distribution charts, calculates correlations with the target variable, and deselects useless variables. Essentially all of the indicators commonly used for analyzing variables are calculated.

Rapidminer only conducts a basic analysis of variable quality when importing data. It helps users deselect useless variables but lacks many of the other statistical analysis indicators.
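The kind of import-time variable analysis described above can be sketched in a few lines. This is only an illustration, not the actual logic of either product: the function names `pearson` and `profile`, the `max_missing` threshold, and the rule "deselect constants and mostly-missing columns" are all assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def profile(columns, target, max_missing=0.9):
    """Per-variable statistics plus a keep/deselect flag.

    columns: dict mapping variable name -> list of numbers (None = missing).
    target:  list of numeric target values, same length as each column.
    max_missing is a hypothetical threshold chosen for this sketch.
    """
    report = {}
    for name, values in columns.items():
        # Keep only rows where this variable is present.
        present = [(v, t) for v, t in zip(values, target) if v is not None]
        missing_rate = 1 - len(present) / len(values)
        distinct = len({v for v, _ in present})
        corr = pearson([v for v, _ in present], [t for _, t in present]) if present else 0.0
        # Assumed rule: a constant or mostly-missing variable is "useless".
        keep = distinct > 1 and missing_rate <= max_missing
        report[name] = {
            "missing_rate": missing_rate,
            "distinct": distinct,
            "corr": corr,
            "keep": keep,
        }
    return report

report = profile({"age": [1, 2, 3, 4], "const": [5, 5, 5, 5]}, [0, 0, 1, 1])
```

Here `report["const"]["keep"]` is `False` because the column never varies, while `age` is kept and gets a correlation with the target.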

When recognizing the type of the target variable, Ymodel treats the common 0/1 form as a classification target, while Rapidminer identifies it as a numerical value by default and requires users to switch it to classification manually. Similarly, when numbers are used to represent categories in other variables, Ymodel recognizes them as categorical, while Rapidminer goes only by the data type and requires users to change them manually.
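A heuristic of the following kind is one way a tool could tell a 0/1 target apart from a genuinely numeric one. It is a sketch under stated assumptions, not the rule either product actually uses: the function name `infer_target_type` and the `max_classes` cutoff are hypothetical.

```python
def infer_target_type(values, max_classes=10):
    """Guess 'classification' or 'regression' for a target column.

    max_classes is a hypothetical cutoff chosen for this sketch.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("target column has no values")
    distinct = set(present)
    # Text labels are always treated as classes.
    if any(isinstance(v, str) for v in distinct):
        return "classification"
    # Whole numbers with only a few distinct levels (such as 0/1) look like class codes.
    all_integral = all(float(v).is_integer() for v in distinct)
    if all_integral and len(distinct) <= max_classes:
        return "classification"
    return "regression"

print(infer_target_type([0, 1, 1, 0, 1]))
print(infer_target_type([208500.0, 181500.5, 223500.25]))
```

A rule based only on the data type would label both columns as numeric. The trade-off is that a heuristic like this can misjudge a small numeric target with few distinct whole-number values, so a manual override is still useful.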

After the model is built, Ymodel automatically outputs the optimal model directly, while Rapidminer suggests several of the best models and lets users choose which one to save.

Moreover, Ymodel handles the details well. For example, it records the character format, time format, missing value format, and other settings configured when the modeling data was imported. At prediction time these settings are applied automatically, so the prediction data does not need to be configured again.

Display of model results

For the display of model results, Rapidminer offers the richest output. A model file contains a lot of content, such as the model structure, the model performance, and the workflow generated by the model.

In Ymodel, the selected optimal model, its parameters, and model performance can be seen in the model results.

Weka's presentation of automatic modeling results is relatively simple, with only the most basic descriptions.

Functions

AutoWeka: Classification, Regression

YModel: Classification, Regression, Time Series

Rapidminer: Classification, Regression, Clustering, Outlier detection

Another highlight of Rapidminer is that its automatic modeling process is transparent, allowing users to make modifications and optimizations in the workflow, resulting in higher flexibility.

API

YModel can be integrated into users' own systems written in Java, Python, C#, etc.

Rapidminer provides a REST API and a Python API.

Weka provides a REST API and a Java API.

Overall Experience

	Weka	YModel	Rapidminer
Easy to install	★★	★★★	★★★
Easy to get started	★	★★★	★★
Modeling automation level	★	★★★	★★
Display of model results	★	★★	★★★
Functions	★	★★	★★★
API	★★★	★★★	★★★

Test datasets:

Dataset	Task type	Source
Titanic Data	Classification	Kaggle
House Price Prediction	Regression	Kaggle
Credit Company User Overdue Prediction	Classification	Real business data
Claims prediction of insurance company policies	Classification	Real business data
Second-hand car transaction price prediction	Regression	Real business data

Because Rapidminer's free version limits data size to 10000 items, the three real business datasets were sampled, with sample sizes kept within a few thousand items. As a result, testing on large data volumes was not possible.
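The downsampling step can be reproduced with simple random sampling, as in the sketch below. The article does not say how the samples were drawn, so uniform sampling, the fixed seed, and the helper name `sample_rows` are assumptions made for illustration.

```python
import random

def sample_rows(rows, max_rows, seed=42):
    """Return at most max_rows rows chosen uniformly at random, kept in original order."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the same sample can be fed to every tool
    picked = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in picked]

# Hypothetical example: shrink 25000 records to fit under a 10000-item limit.
records = list(range(25000))
subset = sample_rows(records, 5000)
```

Fixing the seed matters for a comparison like this one, because all three tools should be trained on exactly the same sampled rows.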

**Product introduction:** Weka is open source, and its automatic modeling function is an extension module that is free to use. Rapidminer is commercial software; although it has a free version, the auto model function is a paid feature.

**Overall user experience:** Ymodel has the fastest modeling speed. Rapidminer is relatively fast at model building, but when there are many variables its modeling time increases significantly. Weka requires the modeling time to be set beforehand, and its modeling speed is also relatively slow. In Weka, some variable types sometimes have to be handled manually before automatic modeling can recognize them. In terms of automatic modeling functionality, Weka offers a relatively poor experience.

**Testing method:** Each dataset is divided into a training set and a prediction set, and the prediction results are exported and scored uniformly.
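Scoring the exported predictions uniformly amounts to computing the same metrics from the same confusion matrix for every tool. The sketch below shows one way to do that for the classification metrics reported in the tables; the function name `classification_metrics` and the choice to return `None` when a metric is undefined are assumptions of this example, not the script actually used in the test.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Score binary predictions; a metric is None when its denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        return num / den if den else None

    precision = ratio(tp, tp + fp)   # share of predicted positives that are correct
    recall = ratio(tp, tp + fn)      # share of actual positives that were found
    f1 = None
    if precision is not None and recall is not None and precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),  # share of actual negatives kept negative
        "f1": f1,
    }

scores = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

With this convention, a model that never predicts the positive class gets a recall of 0, a specificity of 1, and undefined precision and F1, which is how a "-" entry in a results table can arise.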

Test results:

Titanic Survival Prediction - Classification

Training data: 802 items, 12 variables

The ratio of positive and negative samples is approximately 3:5

	Weka	Rapidminer	Ymodel
Accuracy	0.722	0.787	0.775
Precision	0.862	0.809	0.857
Recall	0.556	0.756	0.667
Specificity	0.909	0.818	0.886
F1	0.676	0.782	0.75
AUC		0.793	0.847
Ranking	3	2	1

Weka could not output probability values (or we did not find out how to make it do so), so AUC could not be calculated for it.
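The reason AUC depends on probability output is that it measures how well the model ranks positives above negatives, which needs a score per sample rather than a hard 0/1 label. A minimal sketch of the pairwise definition follows; the function name `auc` is chosen for this example, and the quadratic loop is only meant for datasets of a few thousand rows like the ones here.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Two positives (scores 0.9 and 0.4) and two negatives (0.1 and 0.6):
# three of the four positive/negative pairs are ordered correctly.
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # 0.75
```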

House Price Prediction - Regression

	Weka	Rapidminer	Ymodel
MSE	4.17E8	1.41E9	9.85E8
RMSE	20430	37539	31385
MAE	14164	19459	16378
MAPE	9.108	11.317	9.921
R2	0.889	0.755	0.829
Ranking	1	3	2
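The regression metrics in tables like the one above can be computed from the exported predictions as sketched below. This is an illustration under assumptions: the function name `regression_metrics` is hypothetical, MAPE is assumed to be expressed in percent, and the true values are assumed to be non-zero (as prices are).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (percent) and R2 for paired true/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)          # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE: mean absolute error relative to the true value, in percent.
    mape = 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),               # RMSE is the square root of MSE
        "mae": mae,
        "mape": mape,
        "r2": 1 - sse / ss_tot,
    }

m = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

Because RMSE is the square root of MSE, the two rows of a results table can be cross-checked against each other; for instance, 20430 squared is about 4.17E8.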

Credit Company User Overdue Prediction - Classification

Training data: 8938 items, 56 variables

The ratio of positive and negative samples is approximately 1:8

	Weka	Rapidminer	Ymodel
Accuracy	0.878	0.880	0.804
Precision	-	0.471	0.281
Recall	0	0.063	0.409
Specificity	1	0.99	0.858
F1	-	0.111	0.333
AUC		0.729	0.742
Ranking	3	2	1

On this dataset, the Weka model failed: it did not identify a single positive sample.
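A high accuracy alongside a recall of 0 is what an "always negative" model produces on imbalanced data. The short sketch below works through the arithmetic for a 1:8 positive-to-negative ratio; the function name is hypothetical, and the exact class ratio of the prediction set is not given in the article, so the figure is only indicative.

```python
def always_negative_scores(n_pos, n_neg):
    """Metrics of a model that predicts 'negative' for every sample."""
    tp, fn = 0, n_pos          # every positive is missed
    tn, fp = n_neg, 0          # every negative is trivially correct
    return {
        "accuracy": (tp + tn) / (n_pos + n_neg),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

baseline = always_negative_scores(1, 8)
print(round(baseline["accuracy"], 3))  # 0.889
```

At a 1:8 ratio this trivial model already reaches about 0.889 accuracy, so accuracy alone says little here; recall, F1, and AUC are the more informative columns for this dataset.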

Claims Prediction of Insurance Company Policies - Classification

Training data: 3470 items, 29 variables

The ratio of positive and negative samples is approximately 1:7

	Weka	Rapidminer	Ymodel
Accuracy	0.905	0.949	0.882
Precision	0.051	0.033	0.022
Recall	0.264	0.069	0.139
Specificity	0.916	0.965	0.895
F1	0.086	0.045	0.038
AUC		0.642	0.638
Ranking	1	2	3

Second-hand Car Transaction Price Prediction - Regression

	Weka	Rapidminer	Ymodel
MSE	2779927	8466716	9429967
RMSE	1667	2910	3070
MAE	835	1580	1537
MAPE	27	75	54
R2	0.941	0.821	0.801
Ranking	1	2	3

**Overall evaluation:** Across the 5 datasets used in this test, the rankings vary with the data, but the differences in the metrics are not large, and the overall performance of Ymodel is quite good. In comparison, Weka performs well on regression models, Ymodel performs well on classification models, and Rapidminer is in the middle.

