A configurable tool to analyze search results across different documentation sites. Currently supports analyzing search results from docs.upsun.com and docs.platform.sh, with the ability to add more sites through configuration.
Features:
- Automated search result analysis
- Configurable for multiple documentation sites
- Extracts search result titles, URLs, and sections
- Saves results to CSV files for analysis
- Headless browser operation
- Configurable delays and result limits
- Command-line interface for easy use
Requirements:
- Python 3.x
- Google Chrome browser
- pip (Python package installer)
- Clone this repository:
    git clone https://github.com/gregqualls/docs-search-analyzer.git
    cd [repository-directory]
- Create and activate a virtual environment:
    # Create virtual environment
    python3 -m venv venv
    # Activate on macOS/Linux
    source venv/bin/activate
    # Activate on Windows
    .\venv\Scripts\activate
- Install required packages:
    pip install -r requirements.txt

Create or modify search_phrases.txt with one search phrase per line:
deployment process overview
supported programming languages
environment configuration settings
...
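As a rough illustration of the one-phrase-per-line format, the sketch below reads such a file into a list. The function name `load_phrases` is hypothetical, and skipping blank lines and trimming whitespace are assumptions; the analyzer's own loading code may behave differently.

```python
from pathlib import Path


def load_phrases(path):
    """Return the search phrases in a file, one per line.

    Assumption: blank lines are ignored and surrounding whitespace is trimmed.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling `load_phrases("search_phrases.txt")` would then return the phrases in file order.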
Site-specific settings are managed in search_config.py. Each site configuration includes:
SITES = {
    "site_key": {
        "name": "Site Display Name",
        "base_url": "https://docs.example.com",
        "search_path": "/search.html",
        "search_query_param": "q",
        "selectors": {
            "result_container": "CSS selector for result container",
            "result_title": "CSS selector for result title",
            "result_url": "CSS selector for result URL"
        },
        "max_results": 5,
        "delay_between_searches": 1,
        "page_load_delay": 2
    }
}

- name: Display name for the documentation site
- base_url: Base URL of the documentation site
- search_path: Path to the search page
- search_query_param: URL parameter name for search queries
- selectors: CSS selectors for finding elements:
  - result_container: Container element for each search result
  - result_title: Element containing the result title
  - result_url: Element containing the result URL
- max_results: Maximum number of results to process per search
- delay_between_searches: Delay in seconds between searches
- page_load_delay: Additional delay after page load
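To show how base_url, search_path, and search_query_param could fit together, here is a minimal sketch that assembles a search URL from one site entry. The helper name `build_search_url` is hypothetical, and the exact way the analyzer composes its URLs is an assumption.

```python
from urllib.parse import urlencode


def build_search_url(site, phrase):
    """Combine a site's base URL, search path, and query parameter.

    Assumption: the search page accepts the phrase as a single URL parameter.
    """
    query = urlencode({site["search_query_param"]: phrase})
    return f'{site["base_url"].rstrip("/")}{site["search_path"]}?{query}'


# Example entry using the placeholder values from the configuration above
example_site = {
    "base_url": "https://docs.example.com",
    "search_path": "/search.html",
    "search_query_param": "q",
}
print(build_search_url(example_site, "deployment process overview"))
```

`urlencode` percent-encodes the phrase and writes spaces as `+`, so phrases with spaces or punctuation remain valid in the query string.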
Run the analyzer with default settings (Upsun docs):
    python3 analyze_search.py
Analyze the Platform.sh docs:
    python3 analyze_search.py --site platform
Use a custom search phrases file:
    python3 analyze_search.py --phrases custom_phrases.txt

Command-line options:
- --site: Which documentation site to analyze (default: upsun)
- --phrases: File containing search phrases (default: search_phrases.txt)
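The two options and their defaults could be declared with `argparse` roughly as follows. This is only a sketch of the documented interface: the function name `parse_args` and the help strings are assumptions, and analyze_search.py may define its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the documented --site and --phrases options."""
    parser = argparse.ArgumentParser(
        description="Analyze documentation search results"
    )
    parser.add_argument(
        "--site", default="upsun",
        help="Which documentation site to analyze",
    )
    parser.add_argument(
        "--phrases", default="search_phrases.txt",
        help="File containing search phrases",
    )
    # argv=None makes argparse read from sys.argv
    return parser.parse_args(argv)
```

Passing an explicit list, such as `parse_args(["--site", "platform"])`, makes the parser easy to exercise without touching the real command line.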
Results are saved to CSV files with the following naming pattern:
search_analysis_[site_name]_[timestamp].csv
The CSV files contain:
- date: Date of analysis
- site: Documentation site name
- search_phrase: Search phrase used
- result_url: URL of the result
- section: Section of the documentation
- page_title: Title of the page
- position: Position in search results (1-5 when max_results is 5)
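The naming pattern and column layout above can be sketched with the standard `csv` module. The function names and the timestamp format (`%Y%m%d_%H%M%S`) are assumptions, since the format the analyzer actually uses is not stated here.

```python
import csv
from datetime import datetime

# Column order as listed above
FIELDNAMES = [
    "date", "site", "search_phrase", "result_url",
    "section", "page_title", "position",
]


def output_filename(site_name, now=None):
    """Build search_analysis_[site_name]_[timestamp].csv (timestamp format assumed)."""
    now = now or datetime.now()
    return f"search_analysis_{site_name}_{now.strftime('%Y%m%d_%H%M%S')}.csv"


def write_results(path, rows):
    """Write result dictionaries to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

Opening the file with `newline=""` follows the `csv` module's recommendation and avoids blank rows on Windows.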
To add support for a new documentation site:
- Add a new configuration to search_config.py
- Verify the CSS selectors by inspecting the site's search results page
- Test the configuration with a few search phrases
Example configuration template:
"new_site": {
    "name": "New Site Docs",
    "base_url": "https://docs.newsite.com",
    "search_path": "/search",
    "search_query_param": "q",
    "selectors": {
        "result_container": "div.search-result",
        "result_title": "h3 a",
        "result_url": "h3 a"
    },
    "max_results": 5,
    "delay_between_searches": 1,
    "page_load_delay": 2
}
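Before testing a new entry against the live site, a quick structural check can catch missing fields. The sketch below is a hypothetical helper and not part of the tool; it treats every key shown in the template as required, which is an assumption.

```python
# Keys taken from the configuration template; treating all as required is an assumption
REQUIRED_KEYS = {
    "name", "base_url", "search_path", "search_query_param",
    "selectors", "max_results", "delay_between_searches", "page_load_delay",
}
REQUIRED_SELECTORS = {"result_container", "result_title", "result_url"}


def missing_keys(site):
    """Return a sorted list of keys absent from a site configuration."""
    missing = sorted(REQUIRED_KEYS - set(site))
    selectors = site.get("selectors", {})
    missing += sorted(
        f"selectors.{name}" for name in REQUIRED_SELECTORS - set(selectors)
    )
    return missing
```

An empty list means the entry has the same shape as the template; it does not confirm that the CSS selectors match the site's markup, which still needs a manual check.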
- ChromeDriver Issues:
  - Ensure Google Chrome is installed
  - Try running without headless mode for debugging
  - Check Chrome and ChromeDriver versions match
- No Results Found:
  - Verify site selectors in search_config.py
  - Check if site requires JavaScript
  - Increase page_load_delay if site is slow
- Permission Errors:
  - Ensure virtual environment is activated
  - Check file permissions for output directory
Contributions are welcome! Please feel free to submit pull requests with improvements or additional site configurations.