Add the documentation for Sklearn integration in EVADB. #1425
Open: jineetd wants to merge 4 commits into georgia-tech-db:staging from jineetd:sklearn_doc
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,21 +6,54 @@ Model Training with Sklearn | |
1. Installation | ||
--------------- | ||
|
||
To use the `Sklearn framework <https://scikit-learn.org/stable/>`_, you need to install the extra sklearn dependency in your EvaDB virtual environment. | ||
To use the `Flaml XGBoost AutoML framework <https://microsoft.github.io/FLAML/docs/Examples/Integrate%20-%20Scikit-learn%20Pipeline/>`_, you need to install the extra Flaml dependency in your EvaDB virtual environment. | ||
|
||
.. code-block:: bash | ||
pip install evadb[sklearn] | ||
|
||
pip install "flaml[automl]" | ||
|
||
2. Example Query | ||
---------------- | ||
|
||
.. code-block:: sql | ||
|
||
CREATE OR REPLACE FUNCTION PredictHouseRent FROM | ||
CREATE FUNCTION IF NOT EXISTS PredictRent FROM | ||
( SELECT number_of_rooms, number_of_bathrooms, days_on_market, rental_price FROM HomeRentals ) | ||
TYPE Sklearn | ||
TYPE XGBoost | ||
PREDICT 'rental_price'; | ||
|
||
The query above creates a new custom function by training a model on the ``HomeRentals`` table using the ``Sklearn`` framework. | ||
The ``rental_price`` column is the target column for prediction, while the remaining columns from the ``SELECT`` query are the inputs. | ||
The query above creates a new custom function by training a model on the ``HomeRentals`` table using the ``Flaml XGBoost`` framework. | ||
The ``rental_price`` column is the target column for prediction, while the remaining columns from the ``SELECT`` query are the inputs. | ||
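The split between target and input columns described above can be sketched in plain Python. This is only an illustration of the rule stated in the text: ``split_inputs_and_target`` is a hypothetical helper, not part of EvaDB, and treating a missing target column as an error is an assumption of this sketch rather than documented behavior.

```python
def split_inputs_and_target(selected_columns, predict_column):
    """Return (input_columns, target_column) for a training query.

    Hypothetical helper: mirrors the rule that the PREDICT column is the
    target and every other selected column is an input.
    """
    if predict_column not in selected_columns:
        # Assumption of this sketch: the target must appear in the SELECT list.
        raise ValueError(f"{predict_column!r} is not among the selected columns")
    inputs = [col for col in selected_columns if col != predict_column]
    return inputs, predict_column


columns = ["number_of_rooms", "number_of_bathrooms", "days_on_market", "rental_price"]
inputs, target = split_inputs_and_target(columns, "rental_price")
print("inputs:", inputs)
print("target:", target)
```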
|
||
3. Model Training Parameters | ||
---------------------------- | ||
|
||
.. list-table:: Available Parameters | ||
:widths: 25 75 | ||
|
||
* - PREDICT (**required**) | ||
- The name of the column we wish to predict. | ||
* - MODEL | ||
- The Sklearn models currently supported are ``Random Forest``, ``Extra Trees Regressor``, and ``KNN``. | ||
Use ``rf`` for Random Forest, ``extra_tree`` for Extra Trees Regressor, and ``kneighbor`` for KNN. | ||
* - TIME_LIMIT | ||
- Time limit to train the model in seconds. Default: 120. | ||
* - TASK | ||
- Specify whether to perform a ``regression`` or a ``classification`` task. | ||
Review comment: Is there any correlation between TASK and MODEL here? For every model (i.e., random forest, extratrees, KNN), can we choose either regression or classification?
||
* - METRIC | ||
- Specify the metric used to train your model. For example, for ``regression`` tasks you could | ||
use the ``r2`` or ``RMSE`` metrics, and for ``classification`` tasks you could use the ``accuracy`` or ``f1_score`` metrics. | ||
More information about the model metrics can be found `here <https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric>`_. | ||
|
||
Below is an example query that specifies the above parameters: | ||
|
||
.. code-block:: sql | ||
|
||
CREATE OR REPLACE FUNCTION PredictHouseRentSklearn FROM | ||
( SELECT number_of_rooms, number_of_bathrooms, days_on_market, rental_price FROM HomeRentals ) | ||
TYPE Sklearn | ||
PREDICT 'rental_price' | ||
MODEL 'extra_tree' | ||
METRIC 'r2' | ||
TASK 'regression' | ||
TIME_LIMIT 180; |
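To show how the parameters in the table fit together, here is a minimal Python sketch that assembles a query like the one above and checks ``MODEL`` and ``TASK`` against the values listed in this document. ``build_train_query`` and the two constant sets are hypothetical names introduced for illustration; they are not part of EvaDB. ``METRIC`` is passed through unchecked because the document gives only examples of valid metrics, not a full list.

```python
# Values taken from the parameter table in this document.
SUPPORTED_MODELS = {"rf", "extra_tree", "kneighbor"}
SUPPORTED_TASKS = {"regression", "classification"}


def build_train_query(function_name, select_query, predict,
                      model=None, metric=None, task=None, time_limit=None):
    """Assemble a CREATE OR REPLACE FUNCTION ... TYPE Sklearn statement.

    Hypothetical helper: plain string assembly, so only pass trusted values.
    """
    if model is not None and model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported MODEL {model!r}")
    if task is not None and task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported TASK {task!r}")
    if time_limit is not None and time_limit <= 0:
        raise ValueError("TIME_LIMIT must be a positive number of seconds")

    lines = [
        f"CREATE OR REPLACE FUNCTION {function_name} FROM",
        f"( {select_query} )",
        "TYPE Sklearn",
        f"PREDICT '{predict}'",  # the only required parameter
    ]
    # Optional parameters are emitted only when provided.
    if model is not None:
        lines.append(f"MODEL '{model}'")
    if metric is not None:
        lines.append(f"METRIC '{metric}'")
    if task is not None:
        lines.append(f"TASK '{task}'")
    if time_limit is not None:
        lines.append(f"TIME_LIMIT {int(time_limit)}")
    return "\n".join(lines) + ";"


query = build_train_query(
    "PredictHouseRentSklearn",
    "SELECT number_of_rooms, number_of_bathrooms, days_on_market, rental_price FROM HomeRentals",
    predict="rental_price",
    model="extra_tree",
    metric="r2",
    task="regression",
    time_limit=180,
)
print(query)
```

Rejecting unknown ``MODEL`` or ``TASK`` values before the query is sent is a design choice of this sketch; whether every model supports both tasks is the open question raised in the review comment above, so the helper does not try to enforce any pairing.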