New Relic recently released a new built-in log parser rule, syslog-rfc5424, which organizes unstructured Syslog RFC-5424 messages into attribute/value pairs that can then drive dashboards and alerts based on subsets of log data.
This tutorial will guide you through creating a Syslog RFC-5424 dashboard in the New Relic Platform using the new built-in parser rule and observability as code with Terraform.
But wait! Why should we think about dashboards as code instead of just creating them in the UI? Dashboards are important tools that help you understand the state of your applications: they provide visual insights and help you act on historical data and trends. They also answer questions like "When did the problem start?" and "What's the impact of this issue?". That said, we should treat dashboards like any other important resource. Maintaining them manually is error-prone and suboptimal in terms of efficiency and security, and it gives you no modification history, rollback mechanism, peer review, or any of the other benefits we normally get from CI/CD pipelines.
To build this Syslog dashboard, first, we need to understand how the log severities are defined in the Syslog RFC-5424 format.
The PRI part represents both the Facility and the Severity. Its value is calculated by multiplying the Facility number by 8 and then adding the numerical value of the Severity. For example, a security/authorization message (facility = 4) with a critical severity (severity = 2) has a PRI value of 34 ((4 * 8) + 2). Conversely, we can extract the log severity from the pri log attribute with the following formula: (pri - (floor(pri / 8) * 8)). Here floor(pri / 8) recovers the Facility number, and subtracting that number times 8 from the PRI leaves the Severity.
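As a quick check of that arithmetic, the example PRI value of 34 can be decoded with Terraform's own floor function. This is only an illustrative sketch: the local and output names are hypothetical, and the snippet is meant for a scratch directory rather than for the dashboard project.

```hcl
# Hypothetical names, used only to illustrate the PRI arithmetic.
locals {
  example_pri = 34 # security/authorization (4) with critical severity (2)

  # 34 / 8 = 4.25, and floor(4.25) = 4, which is the facility.
  example_facility = floor(local.example_pri / 8)

  # 34 - (4 * 8) = 2, which is the severity.
  example_severity = local.example_pri - (local.example_facility * 8)
}

output "example_facility" {
  value = local.example_facility
}

output "example_severity" {
  value = local.example_severity
}
```

The NRQL queries later in this tutorial apply the same formula to the pri attribute of each log record.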
Terraform uses its own configuration language, known as HCL. The main purpose of this language is to describe resources, which represent infrastructure objects. These resources are processed by specific plugins called providers.
New Relic has an official Terraform provider. It allows users to manage different types of resources such as dashboards, alert channels, and alert policies. More information about the Terraform provider can be found in the provider documentation and in the Terraform Provider for New Relic - Getting Started Guide quick tip video.
The first step is to create your project folder. There are different ways of organizing a Terraform workspace; the layout is quite flexible and can be adapted to the requirements of your project, such as multiple environments or multiple accounts. In this example, we will adopt a flat structure for simplicity's sake.
- First, create a directory for your project:

    mkdir newrelic-syslog-monitoring
- Create a file named versions.tf in your working directory. Terraform uses this file to configure the Terraform client and to declare all providers required by the current module, in this case the newrelic provider:

    terraform {
      required_version = ">= 0.13"
      required_providers {
        newrelic = {
          source  = "newrelic/newrelic"
          version = ">= 2.12.0"
        }
      }
    }
- The New Relic Terraform provider requires an Account ID, a Personal API Key, and a Region (US or EU) to integrate with your account and manage resources. It supports two methods of configuration: environment variables or a provider block. To keep this example simple, the provider block is used and all required information is received via input variables. For that, create a file named variables.tf:

    variable "NEWRELIC_ACCOUNT_ID" {
      type = number
    }

    variable "NEWRELIC_API_KEY" {
      type = string
    }

    variable "NEWRELIC_REGION" {
      type = string
    }
- Create a file named main.tf to be the primary entrypoint for Terraform. The New Relic provider is also configured in this file, using the input variables previously declared in the variables.tf file:

    provider "newrelic" {
      account_id = var.NEWRELIC_ACCOUNT_ID
      api_key    = var.NEWRELIC_API_KEY
      region     = var.NEWRELIC_REGION
    }
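Because the region only accepts US or EU, the NEWRELIC_REGION declaration could optionally be tightened with a custom validation rule, a feature available from Terraform 0.13 onward. The sketch below is one possible variant and not part of the original project; it would replace the existing NEWRELIC_REGION declaration in variables.tf rather than sit next to it, and the description and error message wording are illustrative.

```hcl
# Optional, stricter variant of the NEWRELIC_REGION input variable.
variable "NEWRELIC_REGION" {
  type        = string
  description = "Region of the New Relic account: US or EU."

  validation {
    # Reject anything other than the two regions mentioned above.
    condition     = contains(["US", "EU"], var.NEWRELIC_REGION)
    error_message = "The NEWRELIC_REGION value must be either US or EU."
  }
}
```

With this in place, a mistyped region is reported when the variables are evaluated instead of surfacing later as a provider error.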
All the dashboard data will be retrieved from the Log data type using the NRQL query language. To simplify the queries and to avoid repetition, Terraform locals can be defined to represent the Syslog severity formula and the logType filter value.
- Create a file named dashboards.tf. This file describes the dashboard resource and its widgets/visualizations:

    locals {
      syslog   = "syslog-rfc5424"
      severity = "(numeric(pri) - (floor(numeric(pri)/8) * 8))"
    }

    resource "newrelic_dashboard" "syslog_dashboard" {
      title             = "Syslog Dashboard"
      grid_column_count = 12
    }
The first step in designing your dashboard is to define what you’re trying to achieve and which visualizations would help with that. The main goal of this example dashboard is to give you an overview of the health of all your applications without having to dig through tons of log lines looking for problematic severities.
Note: All widget code goes inside the "syslog_dashboard" {...} block.
The log severity is one of the most important fields available in the Syslog format and is used widely in this dashboard's visualizations. These billboard charts show the log counts by severity, coloring them yellow or red depending on the threshold_yellow and threshold_red values. They should make it easy to see what's happening with your applications and catch your attention when a problematic log arrives.
Because these billboard charts share almost the same code, you can take advantage of Terraform's dynamic blocks and generate every widget by iterating over a severity_billboards map. To do that, add a severity_billboards map inside the locals {...} block with the following content:
locals {
  syslog   = "syslog-rfc5424"
  severity = "(numeric(pri) - (floor(numeric(pri)/8) * 8))"

  severity_billboards = tomap({
    "emergency"     = { severity = 0, row = 1, column = 1, threshold_red = 1 },
    "alert"         = { severity = 1, row = 2, column = 1, threshold_red = 1 },
    "critical"      = { severity = 2, row = 1, column = 2, threshold_red = 1 },
    "error"         = { severity = 3, row = 2, column = 2, threshold_yellow = 1 },
    "warning"       = { severity = 4, row = 1, column = 3 },
    "notice"        = { severity = 5, row = 2, column = 3 },
    "informational" = { severity = 6, row = 1, column = 4 },
    "debug"         = { severity = 7, row = 2, column = 4 }
  })
}
Then add the generic widget code inside the "syslog_dashboard" {...} block:
dynamic "widget" {
  # One billboard per entry of the severity_billboards map declared in locals.
  for_each = local.severity_billboards
  content {
    title = ""
    nrql  = <<-EOF
      SELECT
        count(*) as '${title(widget.key)} (${widget.value.severity})'
      FROM Log
      WHERE logType = '${local.syslog}' AND ${local.severity} = ${widget.value.severity}
    EOF
    visualization = "billboard"
    width         = 1
    height        = 1
    row           = widget.value.row
    column        = widget.value.column
    # try() falls back to null when an entry defines no threshold.
    threshold_yellow = try(widget.value.threshold_yellow, null)
    threshold_red    = try(widget.value.threshold_red, null)
  }
}
This chart shows the total number of logs your applications are sending and their rate per minute.
widget {
  title = "Throughput"
  nrql  = <<-EOF
    SELECT
      rate(count(*), 1 minute) as 'Logs /min',
      count(*) as 'Total'
    FROM Log
    WHERE logType = '${local.syslog}' SINCE 1 hour ago
  EOF
  visualization = "attribute_sheet"
  width         = 2
  height        = 2
  row           = 1
  column        = 5
}
This chart counts all logs with a severity equal to Error(3), Critical(2), Alert(1) or Emergency(0) and displays the count over time. Spikes on this graph mean you may have problems with your applications and that some action should be taken to resolve them.
widget {
  title = "Logs (Emergency + Alert + Critical + Error)"
  nrql  = <<-EOF
    SELECT
      count(*)
    FROM Log
    WHERE logType = '${local.syslog}' AND ${local.severity} < 4
    TIMESERIES AUTO
  EOF
  visualization = "line_chart"
  width         = 6
  height        = 3
  row           = 3
  column        = 1
}
These charts show the number of logs by application and by hostname. They can also be configured to filter the current dashboard when you click on an application or hostname bar.
widget {
  title = "Top Applications"
  nrql  = <<-EOF
    SELECT
      count(*)
    FROM Log
    WHERE logType = '${local.syslog}'
    FACET app.name
  EOF
  visualization = "facet_bar_chart"
  width         = 2
  height        = 8
  row           = 1
  column        = 7
}

widget {
  title = "Top Nodes"
  nrql  = <<-EOF
    SELECT
      count(*) as 'Logs'
    FROM Log
    WHERE logType = '${local.syslog}'
    FACET hostname
  EOF
  visualization = "facet_bar_chart"
  width         = 2
  height        = 8
  row           = 1
  column        = 9
}
The idea behind these charts is to display how many logs your applications are sending over time, broken down by severity and by facility. This way you can easily detect spikes of any severity or facility and see when they started and stopped.
widget {
  title = "Logs by Severity"
  nrql  = <<-EOF
    SELECT
      count(*)
    FROM Log
    WHERE logType = '${local.syslog}'
    FACET string(${local.severity}) as 'Severity'
    TIMESERIES AUTO
  EOF
  visualization = "faceted_line_chart"
  width         = 3
  height        = 3
  row           = 6
  column        = 1
}

widget {
  title = "Logs by Facility"
  nrql  = <<-EOF
    SELECT
      count(*)
    FROM Log
    WHERE logType = '${local.syslog}'
    FACET floor(numeric(pri)/8) as 'Facility'
    TIMESERIES AUTO
  EOF
  visualization = "faceted_line_chart"
  width         = 3
  height        = 3
  row           = 6
  column        = 4
}
widget {
  title = "Top 100 Logs"
  nrql  = <<-EOF
    SELECT
      ${local.severity} as 'Severity',
      app.name as 'Application',
      message
    FROM Log
    WHERE logType = '${local.syslog}' LIMIT 100
  EOF
  column        = 7
  row           = 6
  visualization = "event_table"
  width         = 6
  height        = 3
}
widget {
  title  = ""
  width  = 2
  height = 5
  row    = 1
  column = 11
  source = <<-EOF
    ### Facilities
    0. kernel messages
    1. user-level messages
    2. mail system
    3. system daemons
    4. security/authorization messages (note 1)
    5. messages generated internally by syslogd
    6. line printer subsystem
    7. network news subsystem
    8. UUCP subsystem
    9. clock daemon (note 2)
    10. security/authorization messages (note 1)
    11. FTP daemon
    12. NTP subsystem
    13. log audit (note 1)
    14. log alert (note 1)
    15. clock daemon (note 2)
    16. to 23. local uses 0 to 7 (local n)
  EOF
  visualization = "markdown"
}
The Terraform client can be installed either by downloading the binary from https://www.terraform.io/downloads.html or by using your operating system's package manager. More instructions on how to install Terraform in different environments can be found here.
Once the Terraform client is installed, run terraform init in your working directory so that Terraform downloads the newrelic provider, and then run the following command:
terraform plan -var NEWRELIC_ACCOUNT_ID=<YOUR-ACCOUNT-ID> -var NEWRELIC_API_KEY=<YOUR-API-KEY> -var NEWRELIC_REGION=<US or EU>
The terraform plan command creates an execution plan: it determines which actions are necessary to reach the desired state specified in the configuration files. In this case, your dashboard resource will be added.
Finally, run the following command to apply all pending actions and create the resources in the New Relic platform:
terraform apply -var NEWRELIC_ACCOUNT_ID=<YOUR-ACCOUNT-ID> -var NEWRELIC_API_KEY=<YOUR-API-KEY> -var NEWRELIC_REGION=<US or EU>
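Repeating the -var flags on every command gets tedious. Terraform automatically loads a file named terraform.tfvars from the working directory, so the same inputs could be supplied as sketched below. The values shown are placeholders, and because the file would contain your API key it should be kept out of version control.

```hcl
# terraform.tfvars: placeholder values, replace them with your own.
NEWRELIC_ACCOUNT_ID = 1234567
NEWRELIC_API_KEY    = "<YOUR-API-KEY>"
NEWRELIC_REGION     = "US"
```

With this file present, terraform plan and terraform apply can be run without any -var arguments.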
Terraform uses state to map the resources declared in your files to real-world objects. When you declare a resource like resource "newrelic_dashboard" "syslog_dashboard" in your files, Terraform uses this map to know that, for example, the New Relic dashboard with ID 1234 is represented by that resource. That said, if you apply this project on different machines without sharing the state, Terraform will create all resources again instead of updating them. Setting up a remote state solves this issue.
HashiCorp offers a Terraform Cloud solution that helps teams use Terraform together out of the box. It's also possible to use Atlantis, a tool that automates Terraform via pull requests, taking your observability as code to the next level.
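As a rough sketch of what a remote state setup could look like with Terraform Cloud, the remote backend can be declared inside a terraform block. The organization and workspace names below are hypothetical placeholders, and the snippet assumes you have already authenticated the CLI against Terraform Cloud (for example with terraform login).

```hcl
terraform {
  # Store the state in Terraform Cloud instead of on the local disk.
  backend "remote" {
    hostname     = "app.terraform.io"
    organization = "example-org" # hypothetical organization name

    workspaces {
      name = "newrelic-syslog-monitoring" # hypothetical workspace name
    }
  }
}
```

After adding or changing a backend, terraform init has to be run again so that Terraform can configure it and, if needed, migrate the existing state.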
Although dashboards are wonderful tools for problem detection and troubleshooting, you probably can't watch them 24 hours a day: it is not efficient and you can easily miss important logs. To make monitoring easier, New Relic provides a set of alerts that help you solve your application issues faster and with less noise, before they turn into critical incidents. It also supports third-party integrations such as PagerDuty and Slack, making the notification process efficient and adaptable to your team's needs.
The New Relic Terraform provider supports all required alerting resources to monitor your Syslog applications. It is possible, for example, to create different alert channels per team, responsibility, node, or application notifying different people in different ways when applications are reporting errors.
For this specific example, we could reuse the dashboard queries and define the following NRQL alert conditions:
- A static threshold alarm for the critical severities Error(3), Critical(2), Alert(1) and Emergency(0).
- A baseline alarm in the upper direction for log counts with severity < 4, to detect abnormal spikes.
- What else? The possibilities are vast and depend on your environment and system characteristics. Maybe a static alert for logs with severity < 4 and a facility equal to security/authorization messages(4), sending all notifications to the #security-team Slack channel, would suit you better than notifying everyone in the office.

All in all, the New Relic alerting system is quite flexible and supports the complex scenarios you might have.
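To make the first idea more concrete, here is a minimal sketch of a static NRQL alert condition that reuses the dashboard locals. It is an illustration under assumptions, not a tested configuration: the attribute names follow the 2.x line of the New Relic provider as best recalled and may differ in the version you pin, and the policy name, condition name, and threshold values are hypothetical. Notification channels are left out.

```hcl
# Sketch only: verify every attribute against the documentation of the
# provider version you actually use before applying.
resource "newrelic_alert_policy" "syslog" {
  name = "Syslog" # hypothetical policy name
}

resource "newrelic_nrql_alert_condition" "syslog_high_severity" {
  policy_id      = newrelic_alert_policy.syslog.id
  type           = "static"
  name           = "Syslog logs with severity < 4" # hypothetical name
  enabled        = true
  value_function = "single_value" # assumption: expected for static conditions in 2.x

  nrql {
    # Same filter as the "Logs (Emergency + Alert + Critical + Error)" widget.
    query             = "SELECT count(*) FROM Log WHERE logType = '${local.syslog}' AND ${local.severity} < 4"
    evaluation_offset = 3
  }

  critical {
    operator              = "above"
    threshold             = 0   # assumption: any such log is worth an alert
    threshold_duration    = 300 # seconds
    threshold_occurrences = "at_least_once"
  }
}
```

Because the query is built from local.syslog and local.severity, the alert and the dashboard stay in sync if the severity formula or the logType filter ever changes.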
If you are interested in learning more about New Relic alerts please visit our alert documentation. For more information about New Relic alerts with Terraform, please check this blog post out.
All the code used in this example can be found in this GitHub repository. If you don't want to use Terraform but would like to try the dashboard out, you can copy the content of this JSON file, replace the <YOUR_ACCOUNT_ID> placeholder with your Account ID, and then import it into New Relic using the UI (Dashboard > Import dashboard option).