This project is an AI Web Scraper built with Streamlit that leverages the LangChain OllamaLLM to scrape, process, and analyze web content. Users can input their queries, and the system fetches relevant content, processes it, and provides concise, context-aware answers.
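The query flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `build_prompt` and `answer`, the prompt wording, the `langchain_ollama` import path, and the `llama3.1` model tag are all assumptions.

```python
def build_prompt(question: str, context: str) -> str:
    """Combine scraped page text and the user's question into one prompt."""
    # Hypothetical prompt wording; the project's real prompt is not shown.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


def answer(question: str, context: str, model: str = "llama3.1") -> str:
    """Send the prompt to a local Ollama model through LangChain."""
    # Imported lazily so build_prompt works without LangChain installed.
    # Assumes the langchain-ollama package and a running Ollama server.
    from langchain_ollama import OllamaLLM

    llm = OllamaLLM(model=model)
    return llm.invoke(build_prompt(question, context))
```

Keeping prompt construction separate from the model call makes the prompt easy to inspect and test without a running Ollama instance.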
- Streamlit Interface: User-friendly interface for input and interaction.
- Web Scraping: Uses Selenium and BeautifulSoup to extract body content from web pages.
- Content Processing: Cleans, splits, and parses content for efficient analysis.
- Reflective Question Answering: Uses `OllamaLLM` for context-aware, relevant responses to user queries.
- Relevance Scoring: Evaluates responses based on relevance to user queries using normalized scores.
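The content processing and relevance scoring features could look something like the sketch below. It is an assumption-laden illustration using only the standard library: the helper names, the 6000-character chunk size, and the word-overlap score are hypothetical stand-ins, since the project's actual cleaning rules and scoring method are not described here.

```python
import re
from typing import List


def clean_text(raw: str) -> str:
    """Collapse whitespace runs and drop empty lines from extracted page text."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def split_text(text: str, max_chars: int = 6000) -> List[str]:
    """Split text into consecutive chunks of at most max_chars characters."""
    # The default chunk size is an arbitrary assumption, not a project setting.
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def relevance_score(query: str, response: str) -> float:
    """Fraction of distinct query words that also appear in the response.

    A simple stand-in for a normalized score: always between 0.0 and 1.0.
    """
    query_words = set(re.findall(r"\w+", query.lower()))
    if not query_words:
        return 0.0
    response_words = set(re.findall(r"\w+", response.lower()))
    return len(query_words & response_words) / len(query_words)
```

For example, `relevance_score("python web scraping", "Scraping the web is fun")` matches two of the three query words, so it returns 2/3.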
- Python 3.8+
- ChromeDriver (make sure its version matches your installed Chrome browser)
- Llama 3.1
- Additional dependencies in `requirements.txt`
- Clone the repository:

      git clone https://github.com/softsys4ai/RL-LLM.git
      cd RL-LLM
- Install the required dependencies:

      pip install -r requirements.txt
- Set Google Custom Search API credentials:
  - Get your "Custom Search JSON API" key from here.
  - Get your Google Custom Search Engine ID from here.
    - Click "Add" or "Create a new search engine".
    - Enter a name for your search engine.
    - Under "Sites to Search", either enter a specific website (e.g., wikipedia.org) or enter `*` to search the entire web.
    - Click "Create".
  - Place your API key and search engine ID into the `main.py` file.
- Install Ollama, then download the version of Llama 3.1 that suits your device.
- Run the Streamlit app:

      streamlit run main.py
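To illustrate how the API key and search engine ID from the credentials step are typically used, here is a minimal sketch that builds a request URL for the Custom Search JSON API. The function name `build_search_url` and its parameters are hypothetical; how `main.py` actually consumes the credentials is not shown in this README.

```python
from urllib.parse import urlencode

# Public endpoint of the Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, engine_id: str, query: str,
                     num_results: int = 5) -> str:
    """Build a Custom Search request URL from the credentials and a query."""
    # The API accepts between 1 and 10 results per request.
    if not 1 <= num_results <= 10:
        raise ValueError("num_results must be between 1 and 10")
    params = {
        "key": api_key,     # API key for the Custom Search JSON API
        "cx": engine_id,    # search engine ID
        "q": query,         # the user's search query
        "num": num_results,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client. The JSON response lists the matching pages, which the scraper could then visit.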