Everyone is welcome to contribute! We try to make this repository a valuable resource for the research community and industry.
There are four ways to contribute:
- Submitting pull requests for bug fixes
- Adding new datasets for pre-training and new pre-trained models
- Contributing new examples of how to use IndoBERT
- Proposing new tasks on top of the current benchmark!
Please send us a pull request or open an issue, and we will get back to you as soon as possible.
We want more people to benefit from this repository. You are welcome to evaluate your pre-trained models on our NLU tasks and share a link to the model in our repository. We recommend uploading your model to the Hugging Face model hub and sending us the link. It would also be great if you could provide the code needed to run the model and reproduce the results. Please create an issue to let us know.
Currently, we have collected almost 4 billion words, which makes this the largest collection of Indonesian resources so far. We have a long-term plan to expand our pre-training data in order to build large models. Please send us your suggestions by creating an issue.
We are open to suggestions for new tasks to add to our benchmark. We plan to build a more extensive NLG benchmark covering more tasks. Please create an issue or send us an email.