Contains the notebooks Prompting.ipynb RAG_new.ipynb used for creating datasets.
The test sets are test_metal_hydride_250_gemma2_9B.xlsx and test_metal_hydride_250_llama3_8B.xlsx.
Final dataset is final_extracted_1611.xlsx
A subset of the generated dataset using this framework showing hydrogen storage properties of various alloys and compounds
extracted from the paper’s abstracts.
The paper is available here link.
Open RAG_new.ipynb in Google Colab.
Connect to T4 GPU.
After installing the libraries open the terminal.
#open the terminal
!pip install colab-xterm
%load_ext colabxterm
%xtermRun the following commands in the terminal to load the LLM.
curl -fsSL https://ollama.com/install.sh | sh
ollama serve & ollama pull llama3
ollama serve & ollama run llama3Load the file containing the abstracts.
selected_df = pd.read_csv("selected_df.xls")Define the query.
query = """
Describe all the parameters of the material discussed in the text.
If no information is available just write "N/A".
The output should be concise and in the format as below:
Name of Alloy :
Hydrogen storage capacity :
Temperature :
Pressure :
Experimental Conditions :
"""Follow the notebook to extract the data.
Open Prompting.ipynb in Google Colab
Load the LLM.
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "unsloth/llama-3-8b-bnb-4bit",
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
)Use this prompt or modify it.
text_instruct = """
Describe all the parameters of the material discussed in the text.
If no information is available just write "N/A".
The output should be concise and in the format as below:
Name of Alloy :
Hydrogen storage capacity :
Temperature :
Pressure :
Experimental Conditions :
"""Load the file containing the abstracts.
selected_df = pd.read_csv("selected_df.xls")Follow the notebook to extract the specified data.
We used some of the alloys extracted from the dataset to see whether they can be used for ML models.
For this purpose we used HYST Paper
The predictions of the models are provided in Dataset_testing_HYST.csv
