Commit 0eb2326

Extract term defs demo (#11)
* Add streamlit SQL sandbox
* Update gitignore, remove files
* Migrate to llama_index 0.5.X, update apps
* Update README with spaces links
* Finish main merge
* add term+definition extraction app
* update readme
* Update app info text
1 parent 3820c92 commit 0eb2326

8 files changed: +846 -0 lines changed

README.md

Lines changed: 8 additions & 0 deletions
@@ -50,6 +50,14 @@ There are two main example folders
   - The "Langchain+Llama Index" tab uses a custom langchain agent, and uses the SQL index from Llama Index as a tool during conversations.
   - Check out the huggingface space [here!](https://huggingface.co/spaces/llamaindex/llama_index_sql_sandbox)
 
+- streamlit_term_definition (runs on localhost:8501)
+  - `streamlit run streamlit_demo.py`
+  - creates a small app that allows users to extract terms/definitions from documents and query against the extracted information
+  - pre-loaded with information from the NYC Wikipedia page
+  - supports reading text from image uploads
+  - allows users to configure LLM settings
+  - users can build their own knowledge base of terms/definitions
+  - query against these terms as they are added
 
 ## Docker
 Each example contains a `Dockerfile`. You can run `docker build -t my_tag_name .` to build a python3.11-slim docker image inside your desired folder. It ends up being about 600MB-900MB depending on the example.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+FROM python:3.11.0-slim
+
+WORKDIR /app
+
+COPY . .
+
+RUN pip install -r requirements.txt && pip cache purge
+
+# Streamlit
+CMD ["streamlit", "run", "streamlit_demo.py"]
+EXPOSE 8501
Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+from langchain.chains.prompt_selector import ConditionalPromptSelector, is_chat_model
+from langchain.prompts.chat import (
+    AIMessagePromptTemplate,
+    ChatPromptTemplate,
+    HumanMessagePromptTemplate,
+)
+
+from gpt_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
+
+# Text QA templates
+DEFAULT_TEXT_QA_PROMPT_TMPL = (
+    "Context information is below. \n"
+    "---------------------\n"
+    "{context_str}"
+    "\n---------------------\n"
+    "Given the context information answer the following question "
+    "(if you don't know the answer, use the best of your knowledge): {query_str}\n"
+)
+TEXT_QA_TEMPLATE = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
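To see what the text QA template expands to, the sketch below fills it with plain `str.format`. It copies the template string from the diff and deliberately avoids importing `gpt_index`; the `context` and `question` values are made up for illustration, and how `QuestionAnswerPrompt` performs the substitution internally is not shown in this commit.

```python
# Template string copied from the diff (plain Python, no gpt_index import).
DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information answer the following question "
    "(if you don't know the answer, use the best of your knowledge): {query_str}\n"
)

# Illustrative values only; they are not part of the commit.
context = "Term: estuary Definition: A partly enclosed coastal body of brackish water."
question = "What is an estuary?"

# Substitute both placeholders and show the final prompt text.
prompt = DEFAULT_TEXT_QA_PROMPT_TMPL.format(context_str=context, query_str=question)
print(prompt)
```

The retrieved context lands between the two dashed rules and the question is appended at the end of the instruction line.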
+
+# Refine templates
+DEFAULT_REFINE_PROMPT_TMPL = (
+    "The original question is as follows: {query_str}\n"
+    "We have provided an existing answer: {existing_answer}\n"
+    "We have the opportunity to refine the existing answer "
+    "(only if needed) with some more context below.\n"
+    "------------\n"
+    "{context_msg}\n"
+    "------------\n"
+    "Given the new context and using the best of your knowledge, improve the existing answer. "
+    "If you can't improve the existing answer, just repeat it again. "
+    "Do not mention that you've read the above context."
+)
+DEFAULT_REFINE_PROMPT = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
+
+CHAT_REFINE_PROMPT_TMPL_MSGS = [
+    HumanMessagePromptTemplate.from_template("{query_str}"),
+    AIMessagePromptTemplate.from_template("{existing_answer}"),
+    HumanMessagePromptTemplate.from_template(
+        "We have the opportunity to refine the above answer "
+        "(only if needed) with some more context below.\n"
+        "------------\n"
+        "{context_msg}\n"
+        "------------\n"
+        "Given the new context and using the best of your knowledge, improve the existing answer. "
+        "If you can't improve the existing answer, just repeat it again. "
+        "Do not mention that you've read the above context."
+    ),
+]
+
+CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
+CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
+
+# refine prompt selector
+DEFAULT_REFINE_PROMPT_SEL_LC = ConditionalPromptSelector(
+    default_prompt=DEFAULT_REFINE_PROMPT.get_langchain_prompt(),
+    conditionals=[(is_chat_model, CHAT_REFINE_PROMPT.get_langchain_prompt())],
+)
+REFINE_TEMPLATE = RefinePrompt(
+    langchain_prompt_selector=DEFAULT_REFINE_PROMPT_SEL_LC
+)
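The selector above pairs a default prompt with a list of (condition, prompt) tuples, so chat models get the message-based refine prompt and other models get the single-string one. The following is a simplified stand-in for that pattern, not langchain's implementation: `SimplePromptSelector`, `FakeChatModel`, `FakeCompletionModel`, and `looks_like_chat_model` are hypothetical names invented for this sketch, and the prompts are placeholder strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SimplePromptSelector:
    """Hypothetical stand-in: the first condition that matches picks the prompt."""

    default_prompt: str
    conditionals: List[Tuple[Callable[[object], bool], str]] = field(default_factory=list)

    def get_prompt(self, llm: object) -> str:
        # Check each (condition, prompt) pair in order; fall back to the default.
        for condition, prompt in self.conditionals:
            if condition(llm):
                return prompt
        return self.default_prompt


# Hypothetical model classes, used only to drive the example.
class FakeCompletionModel:
    pass


class FakeChatModel:
    pass


def looks_like_chat_model(llm: object) -> bool:
    """Hypothetical predicate playing the role of a chat-model check."""
    return isinstance(llm, FakeChatModel)


selector = SimplePromptSelector(
    default_prompt="single-string refine prompt",
    conditionals=[(looks_like_chat_model, "chat-message refine prompt")],
)

print(selector.get_prompt(FakeCompletionModel()))
print(selector.get_prompt(FakeChatModel()))
```

Keeping the choice inside a selector means calling code can pass one template object around without knowing which kind of model will eventually consume it.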
+
+DEFAULT_TERM_STR = (
+    "Make a list of terms and definitions that are defined in the context, "
+    "with one pair on each line. "
+    "If a term is missing its definition, use your best judgment. "
+    "Write each line as follows:\nTerm: <term> Definition: <definition>"
+)
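The extraction prompt asks for one `Term: <term> Definition: <definition>` pair per line, which makes the model's reply easy to turn into a dictionary. The parser below is a minimal sketch under that assumption; `TERM_LINE` and `parse_terms` are hypothetical names, the sample reply is invented, and the app's real parsing code is not part of this excerpt.

```python
import re
from typing import Dict

# Matches a single "Term: <term> Definition: <definition>" line.
TERM_LINE = re.compile(
    r"^\s*Term:\s*(?P<term>.+?)\s*Definition:\s*(?P<definition>.+?)\s*$"
)


def parse_terms(llm_output: str) -> Dict[str, str]:
    """Collect term/definition pairs, skipping lines that do not fit the format."""
    terms: Dict[str, str] = {}
    for line in llm_output.splitlines():
        match = TERM_LINE.match(line)
        if match:
            terms[match.group("term")] = match.group("definition")
    return terms


# Invented reply text, shaped the way the prompt requests.
sample = (
    "Term: estuary Definition: A partly enclosed coastal body of brackish water.\n"
    "This line does not follow the format and is skipped.\n"
    "Term: East River Definition: A tidal strait in New York City.\n"
)
print(parse_terms(sample))
```

Skipping malformed lines rather than raising keeps a single stray sentence in the reply from discarding the pairs that were formatted correctly.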
+
+DEFAULT_TERMS = {
+    "New York City": "The most populous city in the United States, located at the southern tip of New York State, and the largest metropolitan area in the U.S. by both population and urban area.",
+    "boroughs": "Five administrative divisions of New York City, each coextensive with a respective county of the state of New York: Brooklyn, Queens, Manhattan, The Bronx, and Staten Island.",
+    "metropolitan statistical area": "A geographical region with a relatively high population density at its core and close economic ties throughout the area.",
+    "combined statistical area": "A combination of adjacent metropolitan and micropolitan statistical areas in the United States and Puerto Rico that can demonstrate economic or social linkage.",
+    "megacities": "A city with a population of over 10 million people.",
+    "United Nations": "An intergovernmental organization that aims to maintain international peace and security, develop friendly relations among nations, achieve international cooperation, and be a center for harmonizing the actions of nations.",
+    "Pulitzer Prizes": "A series of annual awards for achievements in journalism, literature, and musical composition in the United States.",
+    "Times Square": "A major commercial and tourist destination in Manhattan, New York City.",
+    "New Netherland": "A Dutch colony in North America that existed from 1614 until 1664.",
+    "Dutch West India Company": "A Dutch trading company that operated as a monopoly in New Netherland from 1621 until 1639-1640.",
+    "patroon system": "A system instituted by the Dutch to attract settlers to New Netherland, whereby wealthy Dutchmen who brought 50 colonists would be awarded land and local political autonomy.",
+    "Peter Stuyvesant": "The last Director-General of New Netherland, who served from 1647 until 1664.",
+    "Treaty of Breda": "A treaty signed in 1667 between the Dutch and English that resulted in the Dutch keeping Suriname and the English keeping New Amsterdam (which was renamed New York).",
+    "African Burying Ground": "A cemetery discovered in Foley Square in the 1990s that included 10,000 to 20,000 graves of colonial-era Africans, some enslaved and some free.",
+    "Stamp Act Congress": "A meeting held in New York in 1765 in response to the Stamp Act, which imposed taxes on printed materials in the American colonies.",
+    "Battle of Long Island": "The largest battle of the American Revolutionary War, fought on August 27, 1776, in Brooklyn, New York City.",
+    "New York Police Department": "The police force of New York City.",
+    "Irish immigrants": "People who immigrated to the United States from Ireland.",
+    "lynched": "To kill someone, especially by hanging, without a legal trial.",
+    "civil unrest": "A situation in which people in a country are angry and likely to protest or fight.",
+    "megacity": "A very large city, typically one with a population of over ten million people.",
+    "World Trade Center": "A complex of buildings in Lower Manhattan, New York City, that were destroyed in the September 11 attacks.",
+    "COVID-19": "A highly infectious respiratory illness caused by the SARS-CoV-2 virus.",
+    "monkeypox outbreak": "An outbreak of a viral disease similar to smallpox, which occurred in the LGBT community in New York City in 2022.",
+    "Hudson River": "A river in the northeastern United States, flowing from the Adirondack Mountains in New York into the Atlantic Ocean.",
+    "estuary": "A partly enclosed coastal body of brackish water with one or more rivers or streams flowing into it, and with a free connection to the open sea.",
+    "East River": "A tidal strait in New York City.",
+    "Five Boroughs": "Refers to the five counties that make up New York City: Bronx, Brooklyn, Manhattan, Queens, and Staten Island.",
+    "Staten Island": "The most suburban of the five boroughs, located southwest of Manhattan and connected to it by the free Staten Island Ferry.",
+    "Todt Hill": "The highest point on the eastern seaboard south of Maine, located on Staten Island.",
+    "Manhattan": "The geographically smallest and most densely populated borough of New York City, known for its skyscrapers, Central Park, and cultural, administrative, and financial centers.",
+    "Brooklyn": "The most populous borough of New York City, located on the western tip of Long Island and known for its cultural diversity, independent art scene, and distinctive neighborhoods.",
+    "Queens": "The largest borough of New York City, located on Long Island north and east of Brooklyn, and known for its ethnic diversity, commercial and residential prominence, and hosting of the annual U.S. Open tennis tournament.",
+    "The Bronx": "The northernmost borough of New York",
+}
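A dictionary like `DEFAULT_TERMS` has to become text before it can be indexed and queried. One plausible way, shown here only as a sketch, is to flatten each entry into a single snippet that mirrors the `Term: ... Definition: ...` line format used by the extraction prompt. `terms_to_texts` is a hypothetical helper; this excerpt does not show how the app actually loads the defaults.

```python
from typing import Dict, List


def terms_to_texts(terms: Dict[str, str]) -> List[str]:
    """Render each entry as one 'Term: ... Definition: ...' string (hypothetical helper)."""
    return [
        f"Term: {term} Definition: {definition}"
        for term, definition in terms.items()
    ]


# Two entries copied from DEFAULT_TERMS in the diff.
sample_terms = {
    "East River": "A tidal strait in New York City.",
    "New York Police Department": "The police force of New York City.",
}

for text in terms_to_texts(sample_terms):
    print(text)
```

Each snippet is self-describing, so a retrieved snippet carries both the term and its definition into the QA prompt's context.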

streamlit_term_definition/index.json

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.
