Skip to content

Identify bad html coding practice and give its correct version

Notifications You must be signed in to change notification settings

shaheen234/HTML-bad-coding-practices

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HTML Bad coding practices

Overview

This document provides a brief explanation of the parameters, training process, and testing phases of a code framework utilizing a T5 model for text correction and generation.

Parameters

  • model: T5 model instance for training.
  • tokenizer: Tokenizer associated with the T5 model.
  • train_dataset: Training dataset (instance of HtmlDataset).
  • epochs: Number of training epochs (default is 3).
  • learning_rate: 0.001.

Training

Training is performed using a custom train function that:

  • Loads batches of data.
  • Tokenizes inputs and labels.
  • Computes gradients and updates model parameters.

The function prints the average loss for each epoch and saves the fine-tuned model at the end of training.

Components Used

  1. Custom Components:

    • HTMLAwareTokenizer: Custom HTML-aware tokenizer.
    • HtmlDataset: Custom dataset class for handling input and label data.
    • preprocess_html: Custom function for preprocessing HTML text.
    • tokenize_html_aware: Custom function for HTML-aware tokenization.
  2. Predefined Components:

    • T5ForConditionalGeneration and T5Tokenizer: Classes from the Hugging Face Transformers library for T5 model and tokenizer.
    • torch.device: PyTorch class for specifying the device (CPU or GPU).
    • torch.optim.Adam: PyTorch class for the Adam optimizer.
    • torch.utils.data.DataLoader: PyTorch class for efficient data loading.
  3. Training and Optimization:

    • Utilizes PyTorch's training loop for gradient computation, backpropagation, and model parameter updates.

Testing

For testing, a custom function generate_correction is used. This function generates a corrected version of the input text using the fine-tuned T5 model and tokenizer parameters.


Note: Ensure all required dependencies and datasets are available before running the training or testing scripts.

About

Identify bad html coding practice and give its correct version

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published