This document briefly explains the parameters, training process, and testing phase of a code framework that uses a T5 model for text correction and generation.
- model: T5 model instance for training.
- tokenizer: Tokenizer associated with the T5 model.
- train_dataset: Training dataset (instance of HtmlDataset).
- epochs: Number of training epochs (default is 3).
- learning_rate: Learning rate for the optimizer (default is 0.001).
Training is performed using a custom train function that:
- Loads batches of data.
- Tokenizes inputs and labels.
- Computes gradients and updates model parameters.
The function prints the average loss for each epoch and saves the fine-tuned model at the end of training.
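The framework's actual train function is not reproduced here; the following is a minimal sketch of the steps described above. The batch format (pairs of input and label strings), the batch_size argument, and the t5-finetuned output directory are assumptions for illustration, not values taken from the framework.

```python
import torch
from torch.optim import Adam
from torch.utils.data import DataLoader


def train(model, tokenizer, train_dataset, epochs=3, learning_rate=0.001, batch_size=8):
    # Hypothetical sketch: the real train function may differ in details.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.train()

    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    optimizer = Adam(model.parameters(), lr=learning_rate)

    for epoch in range(epochs):
        total_loss = 0.0
        # Assumes each dataset item is a (input_text, label_text) pair of strings.
        for inputs, labels in loader:
            # Tokenize inputs and labels for the T5 encoder/decoder.
            enc = tokenizer(list(inputs), return_tensors="pt",
                            padding=True, truncation=True).to(device)
            lab = tokenizer(list(labels), return_tensors="pt",
                            padding=True, truncation=True).to(device)

            # Ignore padding positions when computing the loss.
            label_ids = lab.input_ids.clone()
            label_ids[label_ids == tokenizer.pad_token_id] = -100

            outputs = model(input_ids=enc.input_ids,
                            attention_mask=enc.attention_mask,
                            labels=label_ids)
            loss = outputs.loss

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        print(f"Epoch {epoch + 1}: average loss = {total_loss / len(loader):.4f}")

    # Save the fine-tuned model and tokenizer at the end of training.
    model.save_pretrained("t5-finetuned")
    tokenizer.save_pretrained("t5-finetuned")
```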
- Custom Components:
  - HTMLAwareTokenizer: Custom HTML-aware tokenizer.
  - HtmlDataset: Custom dataset class for handling input and label data.
  - preprocess_html: Custom function for preprocessing HTML text.
  - tokenize_html_aware: Custom function for HTML-aware tokenization.
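These custom components are only named, not shown, in this document. As an illustration, here is a minimal sketch of how preprocess_html and HtmlDataset could be structured; the whitespace-normalization rule and the (inputs, labels) constructor are assumptions.

```python
import re

from torch.utils.data import Dataset


def preprocess_html(text):
    # Hypothetical preprocessing: collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


class HtmlDataset(Dataset):
    """Pairs of (noisy HTML input, corrected HTML label) for fine-tuning."""

    def __init__(self, inputs, labels):
        assert len(inputs) == len(labels)
        self.inputs = [preprocess_html(t) for t in inputs]
        self.labels = [preprocess_html(t) for t in labels]

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # Returns a (input_text, label_text) pair of strings.
        return self.inputs[idx], self.labels[idx]
```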
- Predefined Components:
  - T5ForConditionalGeneration and T5Tokenizer: Classes from the Hugging Face Transformers library for the T5 model and tokenizer.
  - torch.device: PyTorch class for specifying the device (CPU or GPU).
  - torch.optim.Adam: PyTorch class for the Adam optimizer.
  - torch.utils.data.DataLoader: PyTorch class for efficient data loading.
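For orientation, a short example of how these predefined components are typically wired together; the t5-small checkpoint, batch size, and learning rate are placeholders, not the framework's settings.

```python
import torch
from torch.optim import Adam
from torch.utils.data import DataLoader
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Select GPU if available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a pretrained T5 model and tokenizer from the Hugging Face hub.
model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Adam optimizer over all model parameters.
optimizer = Adam(model.parameters(), lr=0.001)

# Batched iteration over the training data; train_dataset would be an
# HtmlDataset instance as described above.
# loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
```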
- Training and Optimization:
  - Utilizes PyTorch's training loop for gradient computation, backpropagation, and model parameter updates.
For testing, a custom generate_correction function is used. It generates a corrected version of the input text using the fine-tuned T5 model and its tokenizer.
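A minimal sketch of what such a generate_correction function might look like; beam search, the 512-token limit, and the example input are assumptions, not the framework's actual generation settings.

```python
import torch


def generate_correction(model, tokenizer, text, max_length=512):
    # Hypothetical sketch: encode the input, generate with the fine-tuned T5,
    # and decode the output back to text.
    device = next(model.parameters()).device
    model.eval()

    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=max_length).to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs,
                                    max_length=max_length,
                                    num_beams=4,
                                    early_stopping=True)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (hypothetical input):
# corrected = generate_correction(model, tokenizer, "<p>Helo <b>world<p>")
```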
Note: Ensure all required dependencies and datasets are available before running the training or testing scripts.