Node Scraper is a tool that performs automated data collection and analysis for system debugging.
- Installation
- CLI Usage
- Configs
- Extending Node Scraper (integration & external plugins) → See EXTENDING.md
- Full view of the plugins, their associated collectors and analyzers, and the commands invoked by collectors → See docs/PLUGIN_DOC.md
Node Scraper requires Python 3.9+. After cloning this repository, run the dev-setup.sh script with 'source'. This script creates an editable install of Node Scraper in a Python virtual environment and configures the project's pre-commit hooks.
```
source dev-setup.sh
```

Alternatively, follow these manual steps:

```
python3 -m venv venv
source venv/bin/activate
```

On Debian/Ubuntu, you may need: `sudo apt install python3-venv`

```
python3 -m pip install --editable .[dev] --upgrade
```

This installs Node Scraper in editable mode with development dependencies. To verify: `node-scraper --help`

```
pre-commit install
```

This sets up pre-commit hooks for code quality checks. On Debian/Ubuntu, you may need: `sudo apt install pre-commit`
The Node Scraper CLI can be used to run Node Scraper plugins on a target system. The following CLI options are available:
```
usage: node-scraper [-h] [--sys-name STRING] [--sys-location {LOCAL,REMOTE}] [--sys-interaction-level {PASSIVE,INTERACTIVE,DISRUPTIVE}] [--sys-sku STRING]
                    [--sys-platform STRING] [--plugin-configs [STRING ...]] [--system-config STRING] [--connection-config STRING] [--log-path STRING]
                    [--log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}] [--gen-reference-config] [--skip-sudo]
                    {summary,run-plugins,describe,gen-plugin-config} ...

node scraper CLI

positional arguments:
  {summary,run-plugins,describe,gen-plugin-config}
                        Subcommands
    summary             Generates summary csv file
    run-plugins         Run a series of plugins
    describe            Display details on a built-in config or plugin
    gen-plugin-config   Generate a config for a plugin or list of plugins

options:
  -h, --help            show this help message and exit
  --sys-name STRING     System name (default: <my_system_name>)
  --sys-location {LOCAL,REMOTE}
                        Location of target system (default: LOCAL)
  --sys-interaction-level {PASSIVE,INTERACTIVE,DISRUPTIVE}
                        Specify system interaction level, used to determine the type of actions that plugins can perform (default: INTERACTIVE)
  --sys-sku STRING      Manually specify SKU of system (default: None)
  --sys-platform STRING
                        Specify system platform (default: None)
  --plugin-configs [STRING ...]
                        built-in config names or paths to plugin config JSONs. Available built-in configs: AllPlugins, NodeStatus (default: None)
  --system-config STRING
                        Path to system config json (default: None)
  --connection-config STRING
                        Path to connection config json (default: None)
  --log-path STRING     Specifies local path for node scraper logs, use 'None' to disable logging (default: .)
  --log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
                        Change python log level (default: INFO)
  --gen-reference-config
                        Generate reference config from system. Writes to ./reference_config.json. (default: False)
  --skip-sudo           Skip plugins that require sudo permissions (default: False)
```
Node Scraper can operate in two modes: LOCAL and REMOTE, determined by the --sys-location argument.
- LOCAL (default): Node Scraper is installed and run directly on the target system. All data collection and plugin execution occur locally.
- REMOTE: Node Scraper runs on your local machine but targets a remote system over SSH. In this mode, Node Scraper does not need to be installed on the remote system; all commands are executed remotely via SSH.
To use remote execution, specify --sys-location REMOTE and provide a connection configuration file with --connection-config.
```
node-scraper --sys-name <remote_host> --sys-location REMOTE --connection-config ./connection_config.json run-plugins DmesgPlugin
```

In-band (SSH) connection:
```
{
    "InBandConnectionManager": {
        "hostname": "remote_host.example.com",
        "port": 22,
        "username": "myuser",
        "password": "mypassword",
        "key_filename": "/path/to/private/key"
    }
}
```

Redfish (BMC) connection for Redfish-only plugins:
```
{
    "RedfishConnectionManager": {
        "host": "bmc.example.com",
        "port": 443,
        "username": "admin",
        "password": "secret",
        "use_https": true,
        "verify_ssl": true,
        "api_root": "redfish/v1"
    }
}
```

`api_root` (optional): Redfish API path (e.g. `redfish/v1`). If omitted, the default `redfish/v1` is used. Override this when your BMC uses a different API version path.
Notes:
- If using SSH keys, specify `key_filename` instead of `password`.
- The remote user must have permissions to run the requested plugins and access required files. If needed, use the `--skip-sudo` argument to skip plugins requiring sudo.
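Before launching a remote run, it can save a round trip to sanity-check the connection config file locally. A minimal sketch (not part of node-scraper; the manager names are the two shown above, and the checks are illustrative only):

```python
import json

# The two connection manager names documented in this README.
KNOWN_MANAGERS = {"InBandConnectionManager", "RedfishConnectionManager"}


def check_connection_config(path):
    """Return a list of warning strings for a connection config JSON file."""
    with open(path) as f:
        config = json.load(f)
    warnings = []
    for manager, settings in config.items():
        if manager not in KNOWN_MANAGERS:
            warnings.append(f"unknown connection manager: {manager}")
        elif manager == "InBandConnectionManager":
            if "password" in settings and "key_filename" in settings:
                warnings.append("both password and key_filename set; prefer key_filename")
            if not settings.get("hostname"):
                warnings.append("InBandConnectionManager is missing hostname")
    return warnings
```

Run it on your `connection_config.json` before passing the file to `--connection-config`.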
Plugins to run can be specified in two ways: with a plugin JSON config file, or with the 'run-plugins' subcommand. The two options are not mutually exclusive and can be used together.
You can use the describe subcommand to display details about built-in configs or plugins.
List all built-in configs:

```
node-scraper describe config
```

Show details for a specific built-in config:

```
node-scraper describe config <config-name>
```

List all available plugins:

```
node-scraper describe plugin
```

Show details for a specific plugin:

```
node-scraper describe plugin <plugin-name>
```

The plugins to run and their associated arguments can also be specified directly on the CLI using the 'run-plugins' subcommand: give a plugin name followed by the arguments for that particular plugin. Multiple plugins can be specified at once.
You can view the available arguments for a particular plugin by running `node-scraper run-plugins <plugin-name> -h`:
```
usage: node-scraper run-plugins BiosPlugin [-h] [--collection {True,False}] [--analysis {True,False}] [--system-interaction-level STRING]
                                           [--data STRING] [--exp-bios-version [STRING ...]] [--regex-match {True,False}]

options:
  -h, --help            show this help message and exit
  --collection {True,False}
  --analysis {True,False}
  --system-interaction-level STRING
  --data STRING
  --exp-bios-version [STRING ...]
  --regex-match {True,False}
```
Examples

Run a single plugin:

```
node-scraper run-plugins BiosPlugin --exp-bios-version TestBios123
```

Run multiple plugins:

```
node-scraper run-plugins BiosPlugin --exp-bios-version TestBios123 RocmPlugin --exp-rocm TestRocm123
```

Run plugins without specifying args (plugin defaults will be used):

```
node-scraper run-plugins BiosPlugin RocmPlugin
```

Use plugin configs together with 'run-plugins':

```
node-scraper --plugin-configs plugin_config.json run-plugins BiosPlugin
```

The 'gen-plugin-config' subcommand can be used to generate a plugin config JSON file for a plugin or list of plugins that can then be customized. Plugin arguments that have default values are prepopulated in the JSON file; arguments without default values have a value of 'null'.
Examples

Generate a config for the DmesgPlugin:

```
node-scraper gen-plugin-config --plugins DmesgPlugin
```

This produces the following config:
```
{
    "global_args": {},
    "plugins": {
        "DmesgPlugin": {
            "collection": true,
            "analysis": true,
            "system_interaction_level": "INTERACTIVE",
            "data": null,
            "analysis_args": {
                "analysis_range_start": null,
                "analysis_range_end": null,
                "check_unknown_dmesg_errors": true,
                "exclude_category": null,
                "interval_to_collapse_event": 60,
                "num_timestamps": 3
            }
        }
    },
    "result_collators": {}
}
```

Running DmesgPlugin with a dmesg log file:
Instead of collecting dmesg from the system, you can analyze a pre-existing dmesg log file using the --data argument:
```
node-scraper run-plugins DmesgPlugin --data /path/to/dmesg.log --collection False
```

This skips the collection phase and directly analyzes the provided dmesg.log file.
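When iterating on dmesg patterns, it can help to prototype the matching against a saved log outside node-scraper. A minimal sketch (the patterns are hypothetical and this is not the plugin's implementation; entries are shaped like the plugin's error_regex arguments):

```python
import re

# Hypothetical patterns, shaped like error_regex entries in a plugin config.
ERROR_PATTERNS = [
    {"regex": r"MY_CUSTOM_ERROR.*", "message": "My Custom Error Detected"},
    {"regex": r"APPLICATION_CRASH: .*", "message": "Application Crash"},
]


def scan_dmesg(lines):
    """Return (message, line) pairs for every line matching a pattern."""
    hits = []
    for line in lines:
        for pattern in ERROR_PATTERNS:
            if re.search(pattern["regex"], line):
                hits.append((pattern["message"], line.rstrip()))
    return hits
```

Feed it `open("/path/to/dmesg.log")` to scan a saved log before committing the patterns to a config file.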
Custom Error Regex Example:
You can extend the built-in error detection with custom regex patterns. Create a config file with custom error patterns:
```
{
    "global_args": {},
    "plugins": {
        "DmesgPlugin": {
            "analysis_args": {
                "check_unknown_dmesg_errors": false,
                "interval_to_collapse_event": 60,
                "num_timestamps": 3,
                "error_regex": [
                    {
                        "regex": "MY_CUSTOM_ERROR.*",
                        "message": "My Custom Error Detected",
                        "event_category": "SW_DRIVER",
                        "event_priority": 3
                    },
                    {
                        "regex": "APPLICATION_CRASH: .*",
                        "message": "Application Crash",
                        "event_category": "SW_DRIVER",
                        "event_priority": 4
                    }
                ]
            }
        }
    },
    "result_collators": {}
}
```

Save this to dmesg_custom_config.json and run:
```
node-scraper --plugin-configs dmesg_custom_config.json run-plugins DmesgPlugin
```

The compare-runs subcommand compares datamodels from two run log directories (e.g. two nodescraper_log_* folders). By default, all plugins with data in both runs are compared.
Basic usage:
```
node-scraper compare-runs <path1> <path2>
```

Exclude specific plugins from the comparison with --skip-plugins:

```
node-scraper compare-runs path1 path2 --skip-plugins SomePlugin
```

Compare only certain plugins with --include-plugins:

```
node-scraper compare-runs path1 path2 --include-plugins DmesgPlugin
```

Show full diff output (no truncation of the Message column or limit on the number of errors) with --dont-truncate:

```
node-scraper compare-runs path1 path2 --include-plugins DmesgPlugin --dont-truncate
```

You can pass multiple plugin names to --skip-plugins or --include-plugins.
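Conceptually, the comparison boils down to a recursive diff over each plugin's datamodel from the two runs. A simplified sketch (illustrative only, not node-scraper's actual implementation; the key names in the usage note are invented):

```python
def diff_datamodels(old, new, prefix=""):
    """Yield (key path, old value, new value) for each differing leaf."""
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested sections of the datamodel.
            yield from diff_datamodels(a, b, prefix=path + ".")
        elif a != b:
            yield (path, a, b)
```

For example, diffing `{"bios": {"version": "M16"}}` against `{"bios": {"version": "M17"}}` yields one entry, `("bios.version", "M16", "M17")`.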
The show-redfish-oem-allowable subcommand fetches the list of OEM diagnostic types supported by your BMC (from the Redfish LogService OEMDiagnosticDataType@Redfish.AllowableValues). Use it to discover which types you can put in oem_diagnostic_types_allowable and oem_diagnostic_types in the Redfish OEM diag plugin config.
Requirements: A Redfish connection config (same as for RedfishOemDiagPlugin).
Command:
```
node-scraper --connection-config connection-config.json show-redfish-oem-allowable --log-service-path "redfish/v1/Systems/UBB/LogServices/DiagLogs"
```

Output is a JSON array of allowable type names (e.g. ["Dmesg", "JournalControl", "AllLogs", ...]). Copy that list into your plugin config's oem_diagnostic_types_allowable if you want to match your BMC.
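In the Redfish model, the allowable values are annotated on the LogService's CollectDiagnosticData action. As a sketch of where to look in an already-fetched LogService payload (action and annotation names follow the DMTF Redfish LogService schema; the sample payload below is invented):

```python
def oem_allowable_values(log_service):
    """Extract OEMDiagnosticDataType allowable values from a LogService payload."""
    action = log_service.get("Actions", {}).get("#LogService.CollectDiagnosticData", {})
    return action.get("OEMDiagnosticDataType@Redfish.AllowableValues", [])


# Invented sample payload for illustration.
sample = {
    "Actions": {
        "#LogService.CollectDiagnosticData": {
            "OEMDiagnosticDataType@Redfish.AllowableValues": ["Dmesg", "JournalControl", "AllLogs"]
        }
    }
}
print(oem_allowable_values(sample))  # ['Dmesg', 'JournalControl', 'AllLogs']
```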
Redfish OEM diag plugin config example
Use a plugin config that points at your LogService and lists the types to collect. Logs are written under the run log path (see --log-path).
```
{
    "name": "Redfish OEM diagnostic logs",
    "desc": "Collect OEM diagnostic logs from Redfish LogService. Requires Redfish connection config.",
    "global_args": {},
    "plugins": {
        "RedfishOemDiagPlugin": {
            "collection_args": {
                "log_service_path": "redfish/v1/Systems/UBB/LogServices/DiagLogs",
                "oem_diagnostic_types_allowable": [
                    "JournalControl",
                    "AllLogs",
                    ...
                ],
                "oem_diagnostic_types": ["JournalControl", "AllLogs"],
                "task_timeout_s": 600
            },
            "analysis_args": {
                "require_all_success": false
            }
        }
    },
    "result_collators": {}
}
```

- `log_service_path`: Redfish path to the LogService (e.g. DiagLogs). Must match your system (e.g. `UBB` vs. another system id).
- `oem_diagnostic_types_allowable`: Full list of types the BMC supports (from `show-redfish-oem-allowable` or vendor docs).
- `oem_diagnostic_types`: Subset of types to collect on each run (e.g. `["JournalControl", "AllLogs"]`).
- `task_timeout_s`: Max seconds to wait per collection task.
How to use
- Discover allowable types (optional): run `show-redfish-oem-allowable` and paste the output into `oem_diagnostic_types_allowable` in your plugin config.
- Set `oem_diagnostic_types` to the list you want to collect (e.g. `["JournalControl", "AllLogs"]`).
- Run the plugin with a Redfish connection config and your plugin config:

  ```
  node-scraper --connection-config connection-config.json --plugin-configs plugin_config_redfish_oem_diag.json run-plugins RedfishOemDiagPlugin
  ```

- Use `--log-path` to choose where run logs (and OEM diag archives) are written.
The 'summary' subcommand combines results from multiple runs of node-scraper into a single summary.csv file. Sample run:

```
node-scraper summary --search-path /<path_to_node-scraper_logs>
```

This generates a new file '/<path_to_node-scraper_logs>/summary.csv' containing the results from all 'nodescraper.csv' files under '/<path_to_node-scraper_logs>'.
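As a rough model of what this step does, the sketch below concatenates rows from every nodescraper.csv under a search path into one summary.csv (an assumption-laden illustration, not the actual subcommand: it assumes each run directory holds a 'nodescraper.csv' with a shared header):

```python
import csv
from pathlib import Path


def merge_csvs(search_path, out_name="summary.csv"):
    """Combine all nodescraper.csv files under search_path into one CSV."""
    rows, header = [], None
    for csv_file in sorted(Path(search_path).rglob("nodescraper.csv")):
        with open(csv_file, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if header is None:
                # Keep the header from the first file only.
                header = file_header
            rows.extend(reader)
    out = Path(search_path) / out_name
    with open(out, "w", newline="") as f:
        writer = csv.writer(f)
        if header:
            writer.writerow(header)
        writer.writerows(rows)
    return out
```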
A plugin JSON config should follow the structure of the plugin config model. The global_args field is a dictionary of global key-value pairs; values in global_args are passed to any plugin that supports the corresponding key. The plugins field should be a dictionary mapping plugin names to sub-dictionaries of plugin arguments. Lastly, the result_collators attribute defines result collator classes that will be run on the plugin results. By default, the CLI adds the TableSummary result collator, which prints a summary of each plugin's results in tabular format to the console.
```
{
    "global_args": {},
    "plugins": {
        "BiosPlugin": {
            "analysis_args": {
                "exp_bios_version": "TestBios123"
            }
        },
        "RocmPlugin": {
            "analysis_args": {
                "exp_rocm": "TestRocm123"
            }
        }
    }
}
```

Global args can be used to skip sudo plugins or to enable/disable either collection or analysis. Below is an example that skips sudo-requiring plugins and disables analysis.

```
"global_args": {
    "collection_args": {
        "skip_sudo": true
    },
    "collection": true,
    "analysis": false
},
```

A plugin config can be used to compare the system data against the config specifications. Built-in configs include NodeStatus (a subset of plugins) and AllPlugins (runs every registered plugin with default arguments; useful for generating a reference config from the full system).
Using a JSON file:

```
node-scraper --plugin-configs plugin_config.json
```

Here is an example of a comprehensive plugin config that specifies analyzer args for each plugin:
```
{
    "global_args": {},
    "plugins": {
        "BiosPlugin": {
            "analysis_args": {
                "exp_bios_version": "3.5"
            }
        },
        "CmdlinePlugin": {
            "analysis_args": {
                "cmdline": "imgurl=test NODE=nodename selinux=0 serial console=ttyS1,115200 console=tty0",
                "required_cmdline": "selinux=0"
            }
        },
        "DkmsPlugin": {
            "analysis_args": {
                "dkms_status": "amdgpu/6.11",
                "dkms_version": "dkms-3.1",
                "regex_match": true
            }
        },
        "KernelPlugin": {
            "analysis_args": {
                "exp_kernel": "5.11-generic"
            }
        },
        "OsPlugin": {
            "analysis_args": {
                "exp_os": "Ubuntu 22.04.2 LTS"
            }
        },
        "PackagePlugin": {
            "analysis_args": {
                "exp_package_ver": {
                    "gcc": "11.4.0"
                },
                "regex_match": false
            }
        },
        "RocmPlugin": {
            "analysis_args": {
                "exp_rocm": "6.5"
            }
        }
    },
    "result_collators": {},
    "name": "plugin_config",
    "desc": "My golden config"
}
```

The --gen-reference-config option generates a reference config populated with the current system configuration. Plugins that use analyzer args (where applicable) are populated with system data.
Run all registered plugins (AllPlugins config):

```
node-scraper --plugin-configs AllPlugins
```

Generate a reference config for specific plugins:

```
node-scraper --gen-reference-config run-plugins BiosPlugin OsPlugin
```
This will generate the following config:
```
{
    "global_args": {},
    "plugins": {
        "BiosPlugin": {
            "analysis_args": {
                "exp_bios_version": [
                    "M17"
                ],
                "regex_match": false
            }
        },
        "OsPlugin": {
            "analysis_args": {
                "exp_os": [
                    "8.10"
                ],
                "exact_match": true
            }
        }
    },
    "result_collators": {}
}
```

This config can later be used on a different platform for comparison, using the steps described above:
```
node-scraper --plugin-configs reference_config.json
```
An alternate way to generate a reference config is to use log files from a previous run. The example below uses log files from 'scraper_logs_<path>/':

```
node-scraper gen-plugin-config --gen-reference-config-from-logs scraper_logs_<path>/ --output-path custom_output_dir
```

This generates a reference config that includes plugins with logged results in 'scraper_logs_<path>/' and saves the new config to 'custom_output_dir/reference_config.json'.