Commit 73e418a

Merge pull request #598 from aakankshaduggal/llama-prompt-configs
Add pipelines and configs for llama as a teacher model
2 parents 79c6047 + ef9cd39 commit 73e418a

7 files changed, +371 -0 lines changed
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+system: You are an AI assistant knowledgeable about {{domain}} domain. Be accurate but concise in response.
+
+introduction: |
+  Please break down the following snippet from an article about {{domain}} into atomic facts.
+
+principles: |
+  1. Make sure each fact is grounded in the given text.
+  2. Include any necessary information needed to explain the fact or concept
+  3. The atomic facts should be as simple as possible, if it’s compound sentence, break down one more time
+  4. For clarity, avoid using pronouns like ’it’, ’he’, ’she’, ’this’, ’that’ etc., and instead use the full names or titles.
+  5. Focus only on key concepts and facts. Skip any question or problems mentioned in the passage.
+
+examples: |
+  To help you understand the task, here is an example:
+  [Passage]
+  The tournament was contested by ten national teams, maintaining the same format used in 2019. After six weeks of round-robin matches, India, South Africa, Australia, and New Zealand finished as the top four and qualified for the knockout stage. In the knockout stage, India and Australia beat New Zealand and South Africa, respectively, to advance to the final, played on 19 November at the Narendra Modi Stadium in Ahmedabad. Australia won the final by six wickets, winning their sixth Cricket World Cup title.
+  [Facts]
+  1. The tournament was contested by ten national teams.
+  2. The tournament maintained the same format used in 2019.
+  3. The round-robin matches lasted for six weeks.
+  4. India finished as one of the top four teams.
+  5. South Africa finished as one of the top four teams.
+  6. Australia finished as one of the top four teams.
+  7. New Zealand finished as one of the top four teams.
+  8. India, South Africa, Australia, and New Zealand qualified for the knockout stage.
+  9. In the knockout stage, India beat New Zealand.
+  10. In the knockout stage, Australia beat South Africa.
+  11. India advanced to the final.
+  12. Australia advanced to the final.
+  13. The final was played on 19 November.
+  14. The final was held at the Narendra Modi Stadium in Ahmedabad.
+  15. Australia won the final by six wickets.
+  16. Australia won their sixth Cricket World Cup title.
+  [End]
+
+
+generation: |
+  Now it's your turn breakdown following snippet from article about {{domain}} into atomic facts following similar style as above examples
+  [Passage]
+  {{document}}
+  [Facts]
+
+
+start_tags: [""]
+end_tags: [""]
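The config above relies on `{{domain}}` and `{{document}}` placeholders. A minimal sketch of how such placeholders could be filled is shown below. The helper `fill_placeholders`, the two constants, and the way the fields are joined are hypothetical illustrations and are not taken from this commit; the library may render and assemble its prompts differently.

```python
import re

# Hypothetical stand-ins for two fields of the config above.
INTRODUCTION = (
    "Please break down the following snippet from an article "
    "about {{domain}} into atomic facts."
)
GENERATION = "[Passage]\n{{document}}\n[Facts]"


def fill_placeholders(template: str, values: dict) -> str:
    """Replace each {{name}} with values[name]; unknown names stay untouched."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Assumption: the fields are simply joined with newlines to form one prompt.
prompt = "\n".join(
    [
        fill_placeholders(INTRODUCTION, {"domain": "cricket"}),
        fill_placeholders(
            GENERATION,
            {"document": "Australia won the final by six wickets."},
        ),
    ]
)
print(prompt)
```

Leaving unknown placeholders in place, rather than raising, makes a missing column visible in the rendered prompt; a stricter variant could raise `KeyError` instead.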
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+system: You are an AI assistant that is expert at summarizing text.
+
+introduction: |
+  Give me detailed summary for below document, making sure all key points are covered.
+
+principles: |
+  Do not add any new information.
+  Do not miss any key points from the provided document
+
+examples: ""
+
+generation: |
+  Document:
+  {{document}}
+
+start_tags: [""]
+end_tags: [""]
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+system: You are an AI assistant that is expert at summarizing text.
+
+introduction: |
+  Give me detailed extractive summary for below document, making sure all key points are covered.
+
+principles: |
+  Do not add any new information.
+  Do not miss any key points from the provided document
+
+examples: ""
+
+generation: |
+  Document:
+  {{document}}
+
+start_tags: [""]
+end_tags: [""]

src/instructlab/sdg/pipelines/llama/__init__.py

Whitespace-only changes.
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+version: "1.0"
+blocks:
+  - name: gen_questions
+    type: LLMBlock
+    config:
+      config_path: ../../configs/skills/freeform_questions.yaml
+      output_cols:
+        - question
+      batch_kwargs:
+        num_samples: 50
+    drop_duplicates:
+      - question
+  - name: eval_questions
+    type: LLMBlock
+    config:
+      config_path: ../../configs/skills/evaluate_freeform_questions.yaml
+      output_cols:
+        - evaluation
+        - score
+  - name: filter_questions
+    type: FilterByValueBlock
+    config:
+      filter_column: score
+      filter_value: 1.0
+      operation: eq
+      convert_dtype: float
+    drop_columns:
+      - evaluation
+      - score
+      - num_samples
+  - name: gen_responses
+    type: LLMBlock
+    config:
+      config_path: ../../configs/skills/freeform_responses.yaml
+      output_cols:
+        - response
+  - name: evaluate_qa_pair
+    type: LLMBlock
+    config:
+      config_path: ../../configs/skills/evaluate_freeform_pair.yaml
+      output_cols:
+        - evaluation
+        - score
+  - name: filter_qa_pair
+    type: FilterByValueBlock
+    config:
+      filter_column: score
+      filter_value: 2.0
+      operation: ge
+      convert_dtype: float
+    drop_columns:
+      - evaluation
+      - score
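The two `FilterByValueBlock` entries above keep a row only when its `score`, converted to `float`, satisfies `eq 1.0` or `ge 2.0`. The sketch below illustrates those semantics on plain dictionaries. The function `filter_by_value` is a hypothetical stand-in written for this note, not the block's actual implementation, and dropping rows whose score cannot be converted is an assumption.

```python
import operator


def filter_by_value(rows, filter_column, filter_value, operation,
                    convert_dtype=None, drop_columns=()):
    """Keep rows where operation(row[filter_column], filter_value) is true."""
    # Map the config string ("eq", "ge", ...) to a function in the operator module.
    compare = getattr(operator, operation)
    kept = []
    for row in rows:
        value = row[filter_column]
        if convert_dtype is not None:
            try:
                value = convert_dtype(value)
            except (TypeError, ValueError):
                # Assumption: rows with unparseable values are discarded.
                continue
        if compare(value, filter_value):
            # Remove helper columns from the surviving row.
            kept.append({k: v for k, v in row.items() if k not in drop_columns})
    return kept


rows = [
    {"question": "q1", "evaluation": "ok", "score": "1"},
    {"question": "q2", "evaluation": "bad", "score": "0"},
    {"question": "q3", "evaluation": "great", "score": "2.0"},
    {"question": "q4", "evaluation": "unclear", "score": "n/a"},
]

valid_questions = filter_by_value(
    rows, "score", 1.0, "eq", convert_dtype=float,
    drop_columns=("evaluation", "score"),
)
good_pairs = filter_by_value(
    rows, "score", 2.0, "ge", convert_dtype=float,
    drop_columns=("evaluation", "score"),
)
print(valid_questions, good_pairs)
```

Converting before comparing matters here because a model typically returns the score as text, so `"1"` must become `1.0` before it can equal `filter_value`.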
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+version: "1.0"
+blocks:
+  - name: gen_contexts
+    type: LLMBlock
+    config:
+      config_path: ../../configs/skills/contexts.yaml
+      output_cols:
+        - context
+    gen_kwargs:
+      temperature: 0.7
+      max_tokens: 4096
+      n: 10
+      seed: 42
+    drop_duplicates:
+      - context
+  - name: gen_grounded_questions
+    type: LLMBlock
+    config:
+      config_path: ../../configs/skills/grounded_questions.yaml
+      output_cols:
+        - question
+      batch_kwargs:
+        num_samples: 3
+    drop_duplicates:
+      - question
+  - name: eval_grounded_questions
+    type: LLMBlock
+    config:
+      config_path: ../../configs/skills/evaluate_grounded_questions.yaml
+      output_cols:
+        - evaluation
+        - score
+  - name: filter_grounded_questions
+    type: FilterByValueBlock
+    config:
+      filter_column: score
+      filter_value: 1.0
+      operation: eq
+      convert_dtype: float
+    drop_columns:
+      - evaluation
+      - score
+      - num_samples
+  - name: gen_grounded_responses
+    type: LLMBlock
+    config:
+      config_path: ../../configs/skills/grounded_responses.yaml
+      output_cols:
+        - response
+  - name: evaluate_grounded_qa_pair
+    type: LLMBlock
+    config:
+      config_path: ../../configs/skills/evaluate_grounded_pair.yaml
+      output_cols:
+        - evaluation
+        - score
+  - name: filter_grounded_qa_pair
+    type: FilterByValueBlock
+    config:
+      filter_column: score
+      filter_value: 2.0
+      operation: ge
+      convert_dtype: float
+  - name: combine_question_and_context
+    type: CombineColumnsBlock
+    config:
+      columns:
+        - context
+        - question
+      output_col: question
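The final `combine_question_and_context` block merges the `context` and `question` columns and writes the result back into `question`. A rough sketch of that idea follows. The function `combine_columns` is hypothetical, and the blank-line separator is an assumption because the config above does not specify one.

```python
def combine_columns(row, columns, output_col, separator="\n\n"):
    """Return a copy of row with the listed columns joined into output_col."""
    combined = dict(row)  # copy so the input row is not mutated
    combined[output_col] = separator.join(str(row[name]) for name in columns)
    return combined


row = {
    "context": "The final was played on 19 November.",
    "question": "When was the final played?",
}
merged = combine_columns(row, ["context", "question"], "question")
print(merged["question"])
```

Because `output_col` reuses the name `question`, the original question text is overwritten by the context followed by the question, so later stages see a single grounded prompt.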
Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
+version: "1.0"
+blocks:
+  - name: duplicate_document_col
+    type: DuplicateColumnsBlock
+    config:
+      columns_map:
+        document: base_document
+
+  - name: gen_detailed_summary
+    type: LLMBlock
+    config:
+      config_path: ../../configs/knowledge/detailed_summary.yaml
+      output_cols:
+        - summary_detailed
+    gen_kwargs:
+      max_tokens: 2048
+
+  - name: gen_atomic_facts
+    type: LLMBlock
+    config:
+      config_path: ../../configs/knowledge/atomic_facts.yaml
+      output_cols:
+        - summary_atomic_facts
+    gen_kwargs:
+      max_tokens: 2048
+
+  - name: gen_extractive_summary
+    type: LLMBlock
+    config:
+      config_path: ../../configs/knowledge/extractive_summary.yaml
+      output_cols:
+        - summary_extractive
+    gen_kwargs:
+      max_tokens: 2048
+
+  - name: flatten_summary_columns
+    type: FlattenColumnsBlock
+    config:
+      var_cols:
+        - summary_detailed
+        - summary_extractive
+        - summary_atomic_facts
+        - base_document
+      value_name: summary
+      var_name: dataset_type
+
+  - name: rename_to_document_column
+    type: RenameColumnsBlock
+    config:
+      columns_map:
+        document: raw_document
+        summary: document
+
+  - name: knowledge generation
+    type: LLMBlock
+    config:
+      config_path: ../../configs/knowledge/generate_questions_responses.yaml
+      output_cols:
+        - question
+        - response
+      batch_kwargs:
+        batched: true
+      parser_kwargs:
+        parser_name: custom
+        parsing_pattern: '\[(?:Question|QUESTION)\]\s*(.*?)\s*\[(?:Answer|ANSWER)\]\s*(.*?)\s*(?=\[(?:Question|QUESTION)\]|$)'
+        parser_cleanup_tags:
+          - "[END]"
+          - "[End]"
+    gen_kwargs:
+      max_tokens: 4096
+
+  - name: eval_faithfulness_qa_pair
+    type: LLMBlock
+    config:
+      config_path: ../../configs/knowledge/evaluate_faithfulness.yaml
+      output_cols:
+        - explanation
+        - judgment
+    gen_kwargs:
+      max_tokens: 512
+
+  - name: filter_faithfulness
+    type: FilterByValueBlock
+    config:
+      filter_column: judgment
+      filter_value: "YES"
+      operation: eq
+    drop_columns:
+      - judgment
+      - explanation
+
+  - name: eval_relevancy_qa_pair
+    type: LLMBlock
+    config:
+      config_path: ../../configs/knowledge/evaluate_relevancy.yaml
+      output_cols:
+        - feedback
+        - score
+    gen_kwargs:
+      max_tokens: 512
+
+  - name: filter_relevancy
+    type: FilterByValueBlock
+    config:
+      filter_column: score
+      filter_value: 2.0
+      operation: eq
+      convert_dtype: float
+    drop_columns:
+      - feedback
+      - score
+
+  - name: eval_verify_question
+    type: LLMBlock
+    config:
+      config_path: ../../configs/knowledge/evaluate_question.yaml
+      output_cols:
+        - explanation
+        - rating
+    gen_kwargs:
+      max_tokens: 512
+
+  - name: filter_verify_question
+    type: FilterByValueBlock
+    config:
+      filter_column: rating
+      filter_value: 1.0
+      operation: eq
+      convert_dtype: float
+    drop_columns:
+      - explanation
+      - rating
+      - __index_level_0__
+
+datamixing:
+  auxiliary_instructions:
+    summary_detailed:
+      - Provide me with a comprehensive summary of the given document.
+      - Prepare a detailed breakdown of the contents of the document for me.
+      - Summarize the document thoroughly, covering all important points.
+      - Create a detailed executive summary of the provided document.
+      - Compose a comprehensive overview of the document's content.
+      - Deliver a detailed synopsis of the material presented in the document.
+      - Furnish me with a detailed analysis of the document's key points.
+      - Generate a thorough summary of the main ideas in the document.
+      - Offer a detailed digest of the information contained in the document.
+      - Supply me with a comprehensive rundown of the document's contents.
+    summary_extractive:
+      - Provide me with a summary of the document using extractive methods.
+      - Create an extractive summary for the given document.
+      - Generate an extractive summary from the document that was given to you.
+      - Summarize the document using extractive techniques.
+      - Create a summary of the provided document using extractive methods.
+      - Generate an extractive summary for the document provided.
+      - Using extractive techniques, summarize the given document.
+      - Create a summary of the document using extractive summarization.
+      - Generate an extractive summary of the document that was provided.
+      - Summarize the provided document using extractive summarization techniques.
+    summary_atomic_facts:
+      - Identify and list all atomic facts from the document.
+      - Extract all key facts from the given document.
+      - List all the important facts from the provided document.
+      - Highlight all the atomic facts present in the document.
+      - Identify and enumerate all key facts from the given text.
+      - List out all the critical information from the document.
+      - Highlight all the essential facts from the provided text.
+      - Identify and summarize all the important details from the document.
+      - Extract all the atomic facts from the given document.
+      - List all the key takeaways from the provided text.
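The `parsing_pattern` in the `knowledge generation` block captures question and answer pairs from tagged model output. The sketch below applies that exact pattern with Python's `re` module. The helper `parse_qa_pairs`, the use of `re.DOTALL`, and stripping the cleanup tags before matching are assumptions made for illustration; the library's parser may apply the pattern and tags differently.

```python
import re

# Pattern and cleanup tags copied from the pipeline config above.
PARSING_PATTERN = (
    r"\[(?:Question|QUESTION)\]\s*(.*?)\s*"
    r"\[(?:Answer|ANSWER)\]\s*(.*?)\s*"
    r"(?=\[(?:Question|QUESTION)\]|$)"
)
CLEANUP_TAGS = ["[END]", "[End]"]


def parse_qa_pairs(text: str) -> list:
    """Return (question, answer) tuples found in tagged model output."""
    # Assumption: cleanup tags are removed before the pattern is applied.
    for tag in CLEANUP_TAGS:
        text = text.replace(tag, "")
    # Assumption: DOTALL lets a question or answer span several lines.
    return re.findall(PARSING_PATTERN, text, flags=re.DOTALL)


sample = (
    "[Question]\nWho won the final?\n"
    "[Answer]\nAustralia won the final by six wickets.\n"
    "[QUESTION] When was the final played?\n"
    "[ANSWER] On 19 November.\n[End]"
)
pairs = parse_qa_pairs(sample)
for question, answer in pairs:
    print(question, "->", answer)
```

The trailing lookahead `(?=\[(?:Question|QUESTION)\]|$)` ends each answer at the next question tag or at the end of the text without consuming that tag, which is what allows one response to yield several pairs.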
