Provide a brief introduction to your project, its purpose, and its relevance. Mention how it relates to the concepts introduced in the paper "Attention is All You Need" by Vaswani et al.
- Introduction
- Background
- Architecture Overview
- Implementation Details
- Usage
- Results
- Conclusion
- References
Explain the motivation behind your project and the problem you are addressing. Mention the significance of the Transformer model and its relevance in natural language processing and other domains.
Provide a detailed explanation of the Transformer model as described in the paper "Attention is All You Need". Cover the following points:
- Self-Attention Mechanism: Explain how self-attention works and its advantages over traditional recurrent and convolutional networks.
- Multi-Head Attention: Describe the concept of multi-head attention and its role in enhancing the model's ability to focus on different aspects of the input sequence (a minimal code sketch follows this list).
- Positional Encoding: Discuss how positional encoding is used to inject positional information into the input sequences.
- Encoder: Explain the structure of the encoder, including the stacking of encoder layers, multi-head attention, and feed-forward networks.
- Decoder: Detail the structure of the decoder, focusing on the decoder layers, multi-head attention (both self-attention and cross-attention), and feed-forward networks.
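To make the self-attention and multi-head attention ideas above concrete, here is a minimal PyTorch sketch of scaled dot-product attention wrapped in a multi-head layer, following Attention(Q, K, V) = softmax(QKᵀ/√d_k)V from the paper. The class and variable names are illustrative and are not taken from this repository's code.

```python
import math
import torch
import torch.nn as nn


def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)
    return weights @ v


class MultiHeadAttention(nn.Module):
    """Illustrative multi-head attention; names are not from this repo."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)

        # Project, then split the model dimension into heads: (batch, heads, seq, d_k)
        def split(x, proj):
            return proj(x).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)

        q, k, v = split(q, self.w_q), split(k, self.w_k), split(v, self.w_v)
        out = scaled_dot_product_attention(q, k, v, mask)
        # Re-merge the heads and apply the output projection
        out = out.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_k)
        return self.w_o(out)
```

The 1/√d_k scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.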
- Input Embeddings: Discuss how input embeddings are utilized for both source and target sequences.
- Positional Encoding: Explain the method used for positional encoding in your implementation.
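As a concrete reference, the sinusoidal positional encoding from the paper can be implemented as below. This is a sketch under the assumption that the implementation follows the paper's fixed sinusoidal scheme; the names are illustrative, and the default maximum length reuses the 300-token sequence length reported later.

```python
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding as in Vaswani et al.; illustrative only."""

    def __init__(self, d_model: int, max_len: int = 300):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model) token embeddings; add the fixed encodings
        return x + self.pe[:, : x.size(1)]
```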
- Loss Function: Cross-entropy loss was used with `ignore_index` and `label_smoothing`.
```python
loss_fn = nn.CrossEntropyLoss(
    ignore_index=tokenizer_src.token_to_id("[PAD]"), label_smoothing=0.1
)
```
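The `label_smoothing=0.1` value matches the 0.1 label smoothing used in the original paper, and `ignore_index` keeps the `[PAD]` token from contributing to the loss.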
- Optimizer: Adam was used with a learning rate of 1e-4.
```python
optimizer = torch.optim.Adam(
    self.model.parameters(), lr=self.config["lr"], eps=1e-9
)
```
- Learning Rate Schedule: No learning rate scheduler was used.
The model was trained using the `findnitai/english-to-hinglish` dataset available on Hugging Face. This dataset consists of English sentences paired with their Hinglish translations, facilitating the training of a translation model between these languages.
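As a reference, the dataset can be loaded with the Hugging Face `datasets` library as sketched below; the exact column layout of the examples is not verified here, so the snippet only inspects one record.

```python
from datasets import load_dataset

# Load the English-to-Hinglish parallel corpus from the Hugging Face Hub
dataset = load_dataset("findnitai/english-to-hinglish")

# Inspect one example; check dataset["train"].features for the actual schema
print(dataset["train"][0])
```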
The model architecture is based on the Transformer model as described in the paper "Attention is All You Need" by Vaswani et al. The implementation leverages PyTorch and PyTorch Lightning for efficient training and experimentation. The model was trained with a batch size of 10 for 20 epochs, using a learning rate of 1e-4 and a sequence length of 300 tokens.
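For context, those hyperparameters map onto a configuration like the following sketch; the dictionary keys and the Lightning `Trainer` call are illustrative rather than the repository's actual code, while the values are the ones reported above (model and datamodule construction are omitted).

```python
import pytorch_lightning as pl

# Hyperparameters reported above; the key names are illustrative
config = {
    "batch_size": 10,
    "num_epochs": 20,
    "lr": 1e-4,
    "seq_len": 300,
}

# Typical Lightning training driver (model/datamodule construction omitted)
trainer = pl.Trainer(max_epochs=config["num_epochs"])
# trainer.fit(model, datamodule)
```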
During training, the model's performance was evaluated using metrics such as BLEU score, Word Error Rate (WER), and Character Error Rate (CER) to assess translation quality.
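One way to compute these metrics is with the text metrics from `torchmetrics`, as sketched below; whether this repository uses `torchmetrics` or another implementation is an assumption, and the sample strings are purely illustrative.

```python
from torchmetrics.text import BLEUScore, CharErrorRate, WordErrorRate

preds = ["mujhe coffee chahiye"]    # model outputs (illustrative)
targets = ["mujhe coffee chahiye"]  # reference translations (illustrative)

bleu = BLEUScore()(preds, [[t] for t in targets])  # BLEU expects a list of references per prediction
wer = WordErrorRate()(preds, targets)
cer = CharErrorRate()(preds, targets)
print(f"BLEU: {bleu.item():.3f}  WER: {wer.item():.3f}  CER: {cer.item():.3f}")
```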
- Quantitative Results: Provide metrics such as BLEU score, Word Error Rate, or any other relevant evaluation metrics.
- Qualitative Results: Include sample outputs or predictions generated by your model.
- Paper: "Attention is All You Need" by Vaswani et al., 2017.
- Dataset: `findnitai/english-to-hinglish` dataset on Hugging Face.