Riaknostic is an escript and set of tools that diagnoses common problems that could affect a Riak node or cluster. When experiencing any problem with Riak, riaknostic should be the first thing run during troubleshooting. The tool is integrated with Riak via the riak-admin script.
To diagnose problems with Riak, Riaknostic uses a series of checks which are derived from the experience of the Basho Client Services Team as well as numerous public discussions on the mailing list, IRC room, and other online media.
Here is a basic example of using riaknostic, followed immediately by the command's output:
$ riak-admin diag
15:34:52.736 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.
15:34:52.736 [notice] Data directory /srv/riak/data/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
As the output above shows, Riaknostic reports two problems right away. First, an Erlang crash dump is present, indicating that Riak has experienced a crash. Second, a performance problem is reported (a disk mounted without the noatime option), along with a helpful tip to resolve the issue.
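To make the noatime finding concrete, here is a minimal sketch of how such a check could be reproduced by hand on Linux. It is not Riaknostic's own implementation; the helper names has_noatime and mount_options are hypothetical, and the sketch assumes the mount table is available at /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helpers, not part of riaknostic.

# Succeed if a comma-separated mount option string contains 'noatime'.
has_noatime() {
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the options of the filesystem mounted at mount point $1.
# $2 is the mounts table to read (assumed default: /proc/mounts, Linux only).
# Fields per line: device mountpoint fstype options dump pass
mount_options() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}
```

For example, `has_noatime "$(mount_options /srv/riak/data)" || echo "not mounted with noatime"`. Note that mount_options expects an actual mount point, which may be a parent of the Riak data directory rather than the directory itself.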
Important: If you are running Riak v1.3.0 or greater, you already have Riaknostic, so you can skip to the Usage section below.
Riaknostic depends on features introduced by Erlang version R14B04, so verify that you've installed this version of Erlang before proceeding with installation.
To install riaknostic, download the latest package version, and extract it within the directory shown for your operating system in the following table:
Platform | Directory |
---|---|
Linux (Redhat, CentOS, Debian, Ubuntu) | /usr/lib/riak/lib |
Linux (Fedora) | /usr/lib64/riak/lib |
Solaris, OpenSolaris | /opt/riak/lib |
SmartOS (Joyent) | /opt/local/lib/riak/lib |
Mac OS X or self-built | $RIAK/lib (where $RIAK=rel/riak for source installs, or the directory where you unpacked the package) |
An example Riaknostic installation for Linux looks like this:
wget https://github.com/basho/riaknostic/downloads/riaknostic-1.0.2.tar.gz -P /tmp
cd /usr/lib/riak/lib
sudo tar xzvf /tmp/riaknostic-1.0.2.tar.gz
The package will expand to a riaknostic/ directory which contains the riaknostic script, source code in the src/ directory, and documentation.
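If you script the installation across several platforms, the directory table above can be turned into a small lookup. This is only a sketch: the function name riak_lib_dir and the lowercase platform keys are invented for illustration, while the directories are the ones listed in the table.

```shell
#!/bin/sh
# Hypothetical helper: map a platform name to the Riak lib directory.
# The platform keys are illustrative; the paths come from the table above.
riak_lib_dir() {
    case "$1" in
        redhat|centos|debian|ubuntu) echo /usr/lib/riak/lib ;;
        fedora)                      echo /usr/lib64/riak/lib ;;
        solaris|opensolaris)         echo /opt/riak/lib ;;
        smartos)                     echo /opt/local/lib/riak/lib ;;
        *)
            # Mac OS X or self-built: the caller must set $RIAK.
            [ -n "${RIAK:-}" ] || return 1
            echo "$RIAK/lib"
            ;;
    esac
}
```

A script could then run `cd "$(riak_lib_dir ubuntu)"` before extracting the package.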
Now try it out!
In most cases, you can just run the riak-admin diag command as shown at the top of this README. Sometimes, however, you might want extra detail or want to run only specific checks. For that, there are command-line options. Run riak-admin diag --help to learn more about these options:
riak-admin diag --help
Usage: riak-admin diag [-d <level>] [-l] [-h] [check_name ...]
-d, --level Minimum message severity level (default: notice)
-l, --list Describe available diagnostic tasks
-h, --help Display help/usage
check_name A specific check to run
To get an idea of what checks will be run, use the --list option:
riak-admin diag --list
Available diagnostic checks:
disk Data directory permissions and atime
dumps Find crash dumps
memory_use Measure memory usage
nodes_connected Cluster node liveness
ring_membership Cluster membership validity
ring_size Ring size valid
If you want all the gory details about what Riaknostic is doing, you can run the checks at a more verbose logging level with the --level option:
riak-admin diag --level debug
18:34:19.708 [debug] Lager installed handler lager_console_backend into lager_event
18:34:19.720 [debug] Lager installed handler error_logger_lager_h into error_logger
18:34:19.720 [info] Application lager started on node nonode@nohost
18:34:20.736 [debug] Not connected to the local Riak node, trying to connect. alive:false connect_failed:undefined
18:34:20.737 [debug] Starting distributed Erlang.
18:34:20.740 [debug] Supervisor net_sup started erl_epmd:start_link() at pid <0.42.0>
18:34:20.742 [debug] Supervisor net_sup started auth:start_link() at pid <0.43.0>
18:34:20.771 [debug] Supervisor net_sup started net_kernel:start_link(['[email protected]',longnames]) at pid <0.44.0>
18:34:20.771 [debug] Supervisor kernel_sup started erl_distribution:start_link(['[email protected]',longnames]) at pid <0.41.0>
18:34:20.781 [debug] Supervisor inet_gethost_native_sup started undefined at pid <0.49.0>
18:34:20.782 [debug] Supervisor kernel_safe_sup started inet_gethost_native:start_link() at pid <0.48.0>
18:34:20.834 [debug] Connected to local Riak node '[email protected]'.
18:34:20.939 [debug] Local RPC: os:getpid([]) [5000]
18:34:20.939 [debug] Running shell command: ps -o pmem,rss,command -p 83144
18:34:20.946 [debug] Shell command output:
%MEM RSS COMMAND
0.4 31004 /srv/riak/erts-5.8.4/bin/beam.smp -K true -A 64 -W w -- -root /srv/riak/rel/riak -progname riak -- -home /Users/sean -- -boot /srv/riak/releases/1.0.2/riak -embedded -config /srv/riak/etc/app.config -name [email protected] -setcookie riak -- console
18:34:20.960 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.
18:34:20.961 [notice] Data directory /srv/riak/data/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
18:34:20.961 [info] Riak process is using 0.4% of available RAM, totalling 31004 KB of real memory.
Most of the time you'll want to use the defaults, but any syslog severity name will do (from most to least verbose): debug, info, notice, warning, error, critical, alert, emergency.
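The --level option applies this ordering when the checks run. If you have already saved verbose output and want to narrow it down afterwards, the same ordering can be applied in a small filter. The sketch below is an illustration only; severity_rank and filter_by_level are hypothetical helpers, and the sketch assumes each message carries its level in square brackets, as in the output shown above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of riaknostic.

# Map a severity name to a number (higher = more severe),
# in the order listed above. Unknown names map to -1.
severity_rank() {
    case "$1" in
        debug) echo 0 ;;     info) echo 1 ;;
        notice) echo 2 ;;    warning) echo 3 ;;
        error) echo 4 ;;     critical) echo 5 ;;
        alert) echo 6 ;;     emergency) echo 7 ;;
        *) echo -1 ;;
    esac
}

# Read diag output on stdin; print only lines at or above level $1.
# Lines without a bracketed lowercase level name are dropped.
filter_by_level() {
    min=$(severity_rank "$1")
    while IFS= read -r line; do
        # Take the name inside the first pair of square brackets.
        level=$(printf '%s\n' "$line" | sed -n 's/^[^[]*\[\([a-z]*\)].*/\1/p')
        [ -n "$level" ] || continue
        if [ "$(severity_rank "$level")" -ge "$min" ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

For example, `filter_by_level warning < diag.log` would keep only warnings and more severe messages from a saved run, where diag.log is a placeholder file name.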
Finally, if you want to run just a single diagnostic or a list of specific ones, you can pass their name(s):
riak-admin diag dumps
18:41:24.083 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.
- Read DEVELOPMENT.md
- Fork the project on GitHub.
- Make your changes or additions on a "topic" branch, test and document them. If you are making a new diagnostic, make sure you give some module-level information about the checks it performs. Note: diagnostics should not make modifications to Riak, only inspect things.
- Push to your fork and send a pull request.
- A Basho Developer Advocate or Engineer will review your pull request and get back to you.