Description
Details:
Transformer-based models are a better fit for this problem because they capture the context around lines of code. Random forest models generally do not perform well on high-dimensional data and are better suited to non-sequential data; for sequential data such as source code, the proposed transformer models should outperform the existing models.
The solution:
We propose to enhance the xGitGuard scanner by integrating a BERT model specifically trained for secret detection.
The steps include:
- Training and building models using BERT: Develop machine learning models focused on secret detection using the BERT architecture (see the training sketch after this list).
- Integrating BERT into scanners: Seamlessly integrate the trained BERT model into the xGitGuard scanner, enhancing its ability to detect sensitive information with higher accuracy (see the integration sketch below).
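A minimal fine-tuning sketch using the Hugging Face transformers library. The checkpoint name (`bert-base-uncased`), the toy two-example dataset, and the hyperparameters are illustrative assumptions, not details from this proposal; a real run would use a large labeled corpus of code lines.

```python
# Fine-tune BERT as a binary classifier: secret (1) vs. ordinary code (0).
# Checkpoint, dataset, and hyperparameters are illustrative assumptions.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

class SecretLineDataset(Dataset):
    """Each example is one line of code labeled 1 (secret) or 0 (not)."""

    def __init__(self, lines, labels, tokenizer, max_length=128):
        self.encodings = tokenizer(
            lines, truncation=True, padding="max_length", max_length=max_length
        )
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy examples only; a real corpus would contain many labeled code lines.
train_lines = [
    'aws_secret = "AKIAIOSFODNN7EXAMPLE"',  # looks like a credential
    "for i in range(10):",                  # ordinary code
]
train_labels = [1, 0]

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="secret-bert",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=SecretLineDataset(train_lines, train_labels, tokenizer),
)
trainer.train()
trainer.save_model("secret-bert")
tokenizer.save_pretrained("secret-bert")
```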
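xGitGuard's actual integration point is not specified here, so the sketch below is a hypothetical wrapper: candidate lines surfaced by the scanner's existing regex/keyword stage would be passed to the fine-tuned model for a second opinion. The `is_secret` function name and the 0.5 threshold are assumptions, not xGitGuard's API.

```python
# Hypothetical integration: score candidate lines with the fine-tuned model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("secret-bert")
model = AutoModelForSequenceClassification.from_pretrained("secret-bert")
model.eval()

def is_secret(line: str, threshold: float = 0.5) -> bool:
    """Return True if the model scores the line as a likely secret."""
    inputs = tokenizer(line, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    prob_secret = torch.softmax(logits, dim=-1)[0, 1].item()
    return prob_secret >= threshold

# Lines flagged by the scanner's pattern stage get re-checked by the model.
for candidate in ['token = "ghp_abc123"', "print('hello')"]:
    print(candidate, "->", is_secret(candidate))
```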
Alternatives:
Other pre-trained models, such as PaLM, Gemini, or GPT models, could be used instead.
Additional context:
This approach requires considerable labeled training data.