# Common Files (backend.common)

This page documents the files in the `/backend/common` directory of the GitHub repo. It is updated regularly as changes are made to these files.
## constants.py

This file stores a number of constants used throughout the backend directory for various purposes.
## dataset.py

This file handles reading in the dataset for training. The user can currently upload a dataset in any of the following ways:
- As a URL to the raw CSV file
- By uploading the dataset directly to the website
- As a zipped file (currently for image data)
- `read_dataset(url)`: Given a URL, generate a CSV file that is stored locally in order to train the deep learning model
- `read_local_csv_file(file_path)`: Given the file path to a CSV (from the user uploading the file), read it in
- `loader_from_zipped(zipped_file, train_transform, test_transform)`: Given a path to a zip file, read it in. The zip file structure is explained in the "Pretrained Models" section of the page. `train_transform` is the set of data transformations applied to the `/train` folder in the zip file, while `test_transform` is the set of data transformations applied to the `/test` folder
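For illustration, here is a minimal sketch of what `read_dataset` might look like; the local output path is an assumption for this sketch, not the repo's actual behavior:

```python
import pandas as pd

def read_dataset(url: str, local_path: str = "data.csv") -> None:
    # Download the raw CSV at `url` and store it locally so the
    # training code can read it from disk.
    # NOTE: `local_path` is an illustrative assumption, not the repo's actual path.
    df = pd.read_csv(url)  # pandas can read a CSV directly from a URL
    df.to_csv(local_path, index=False)
```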
In terms of usage, it's straightforward: invoke the function with the proper parameters and you should be all set. Note, however, that developers don't use this file directly; other endpoints in the backend call its functions on the user's behalf. For example:
```python
from torchvision import transforms

train_loader, valid_loader = loader_from_zipped(
    "../tests/zip_files/double_zipped.zip",
    train_transform=[
        transforms.ToTensor(),
        transforms.RandomChoice(transforms=[transforms.ToTensor()]),
    ],  # do NOT wrap in transforms.Compose; pass a plain sequence of valid transformations
)
```
## default_datasets.py

Sometimes the user simply wants to play around with deep learning models without having to upload a dataset. Our application supplies default datasets that the user can choose from as an alternative. When the user selects a default dataset name from the frontend dropdown (e.g., Boston housing, California housing, wine, iris), we use `sklearn.datasets` to read in the selected default dataset and return a `pd.DataFrame` for the `dl_trainer` endpoint to use. Example call:
```python
get_default_dataset("iris")
```
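A hedged sketch of how such a loader could be built on `sklearn.datasets`; the name-to-loader mapping below is illustrative and may not match the repo's exact keys:

```python
import pandas as pd
from sklearn import datasets

# Illustrative mapping from frontend dataset names to sklearn loaders;
# the actual keys in the repo may differ.
DEFAULT_DATASETS = {
    "iris": datasets.load_iris,
    "wine": datasets.load_wine,
    "california housing": datasets.fetch_california_housing,
}

def get_default_dataset(name: str) -> pd.DataFrame:
    # Load the requested sklearn dataset and return it as a DataFrame
    # (features plus target) for the dl_trainer endpoint to use.
    raw = DEFAULT_DATASETS[name](as_frame=True)
    return raw.frame  # `as_frame=True` yields a combined features+target frame
```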
## email_notifier.py

Endpoint that takes in an email address, subject, body, and attachment, and sends an email notification to the user. This file does the actual invoking of our API Gateway endpoint, which connects to AWS Lambda + AWS SES. When the user enters an email address on the website and model training completes successfully, our driver function calls this function/route.
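A hedged sketch of what that invocation might look like; the endpoint URL, payload keys, and function signature here are all assumptions for illustration, not the repo's actual values:

```python
import requests

# Hypothetical API Gateway URL; the real one lives in the repo's configuration.
EMAIL_API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/send-email"

def send_email(email_address, subject, body_text, attachment_path=None):
    # POST the email fields to the API Gateway endpoint, which triggers
    # AWS Lambda + AWS SES to deliver the notification.
    payload = {"email_address": email_address, "subject": subject, "body_text": body_text}
    files = {"attachment": open(attachment_path, "rb")} if attachment_path else None
    response = requests.post(EMAIL_API_URL, data=payload, files=files)
    response.raise_for_status()
    return response
```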
## loss_functions.py

Endpoint that contains the `compute_loss()` function, which computes the loss between predicted and actual values for a given epoch. Measuring train and test loss is critical for seeing the progression of the model being trained. `LossFunctions(Enum)` is an enum that contains our collection of loss functions. You can add new loss functions as shown in the implementation.
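A minimal sketch of the enum pattern described above, using PyTorch loss modules; the member names and the `compute_loss` signature are illustrative assumptions:

```python
from enum import Enum

import torch
import torch.nn as nn

class LossFunctions(Enum):
    # Each member wraps a PyTorch loss module; new loss functions are
    # added by defining new members in the same style.
    CELOSS = nn.CrossEntropyLoss()
    MSELOSS = nn.MSELoss()
    BCELOSS = nn.BCELoss()

def compute_loss(loss_function_name: str, output: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Look up the enum member by name and apply its loss module to
    # the predicted vs. actual values for the epoch.
    loss_fn = LossFunctions[loss_function_name].value
    return loss_fn(output, labels)
```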
## optimizer.py

Collection of optimizers that the user can access based on what they specify from the admin website.
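A small sketch of how such a name-to-optimizer mapping might look with `torch.optim`; the supported names and the function signature are assumptions:

```python
import torch

def get_optimizer(optimizer_name: str, model: torch.nn.Module, learning_rate: float):
    # Map the name sent from the admin website to a torch.optim class
    # and bind it to the model's parameters.
    optimizers = {
        "SGD": torch.optim.SGD,
        "Adam": torch.optim.Adam,
    }
    return optimizers[optimizer_name](model.parameters(), lr=learning_rate)
```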
## utils.py

This file contains getter functions and functions used to generate data visualizations.
### Confusion matrix

- This function generates a confusion matrix based on the labels and prediction results returned from the last epoch of training
- Confusion matrices are only generated for classification problems
- `train_deep_classification_model()` from `dl_trainer.py` calls this function
- This function doesn't return anything; it saves the generated plot as a PNG to a designated directory, which is used for display in the frontend and is emailed to the user (see the sketch below)
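A hedged sketch of how the plot generation could work using scikit-learn and matplotlib; the function name and output path are assumptions:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

def generate_confusion_matrix(labels, predictions, out_path="confusion_matrix.png"):
    # Build the confusion matrix from the last epoch's labels/predictions
    # and save it as a PNG for the frontend and the notification email.
    # NOTE: `out_path` is illustrative; the repo saves to its own directory.
    disp = ConfusionMatrixDisplay.from_predictions(labels, predictions)
    disp.figure_.savefig(out_path)
    plt.close(disp.figure_)
```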
### AUC/ROC curves

- This function generates AUC/ROC curves based on the labels and prediction results returned from the last epoch of training
- AUC/ROC curves are only generated for classification problems
- This works for multi-class classification as well; it uses a one-vs-all approach, so there is one curve per class (see the sketch below)
- `train_deep_classification_model()` from `dl_trainer.py` calls this function
- This function returns the raw data for the curves, which is passed to the frontend to generate an interactive graph
- The graph is also generated in the backend in addition to the one generated in the frontend; this ensures the graph can be emailed as a PNG, which needs to be done in the backend
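A minimal sketch of the one-vs-all ROC data computation using scikit-learn; the function name and returned structure are illustrative assumptions:

```python
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

def roc_data_one_vs_all(labels, probabilities, n_classes):
    # Binarize the labels so each class gets its own one-vs-all curve,
    # then collect the raw (fpr, tpr, auc) data the frontend needs to
    # render an interactive graph. Assumes `probabilities` has shape
    # (n_samples, n_classes) and n_classes > 2.
    y_bin = label_binarize(labels, classes=list(range(n_classes)))
    curves = []
    for i in range(n_classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], probabilities[:, i])
        curves.append(
            {"class": i, "fpr": fpr.tolist(), "tpr": tpr.tolist(), "auc": auc(fpr, tpr)}
        )
    return curves
```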