Download logs from DataDog via the API and save them to your PC.
This tool originated from an existing GitHub project, datadog-downloader by WeGift: a script that lets you download a large number of logs matching a particular query instead of being bound by the 5000-log limit imposed by the export button in the UI.
From that base, a complete tool has been developed to automate the process, giving the user as much flexibility as possible in both input and output.
Logs are written to a file whose format is chosen through a parameter (the choice is between `JSON` and `CSV`).
The fields used to build the result file are fully customizable: they are passed as a parameter as well (a mandatory one); see below for details.
The downloader is a container that exposes two endpoints:
`POST http://[host]:[port]/export`
to submit a download request; parameters are passed via the JSON payload, and if nothing goes wrong it returns the URL to call in order to download the file.
The input JSON must follow these rules:
| Parameter | Mandatory | Description | Allowed values | Default value |
|---|---|---|---|---|
| `query` | YES | The DataDog filter query. Take care when quoting on the command line; single-quote the entire query for best results. | DataDog filter string | none |
| `columns` | YES | A list of DataDog log attributes to include in the export, each with the label to assign in the export file and an optional default value | JSON object | none |
| `from` | NO | Start date/time | string date | 1 day ago |
| `to` | NO | End date/time | string date | current timestamp |
| `pageSize` | NO | How many results to download at a time (maximum 5000) | [1..5000] | 1000 |
| `outputFormat` | NO | Format of the file to write results to | 'JSON', 'CSV' | JSON |
| `outputFile` | NO | Name of the file to write results to (the extension depends on `outputFormat`) | file name (without extension) | results |
| `verbose` | NO | Flag to log the tool's activity to the console | true, false | false |
Note: date/times are parsed by the JS `Date` constructor, e.g. `2022-01-01`.
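For example, a request body that follows these rules might look like the following sketch (the query, dates, and column paths are illustrative; the `columns` format is described in detail further below):

```json
{
  "query": "@http.status_code:500",
  "from": "2023-03-27T12:00:00.000Z",
  "to": "2023-03-27T13:00:00.000Z",
  "pageSize": 1000,
  "outputFormat": "CSV",
  "outputFile": "errors",
  "verbose": true,
  "columns": [
    { "label": "URI", "path": "attributes#context#request#uri" },
    { "label": "Status", "path": "attributes#http#status_code" }
  ]
}
```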
The result is an HTTP 202 status code; the body contains a link to download the file. The link is valid for 36 hours, after which the file is deleted.
`GET http://[host]:[port]/{result_file}`
to retrieve the exported log file.
If the export is still in progress, you will get an HTTP 202 status; if an error occurred during the export phase, an HTTP 400 is returned. Otherwise, if `result_file` exists and was created within the previous 36 hours, it is downloaded; if not, you will get an HTTP 404 error.
The user requests the export, the tool launches a separate process that performs the export and creates the file, and it responds to the user immediately with the download link and an HTTP 202 status. A sentinel file is created in the bucket right away to mark that the export has started and is in progress; if any error occurs, the sentinel file is deleted and an error file is created.
Both the sentinel and the error files have the same name as the export file, with the suffixes `.sem` and `.ERR` respectively.
The endpoint `GET /{filename}` checks for the presence of these files, in this exact order:

1. `{filename}.sem`: the sentinel is present, so the export is still in progress; an HTTP 202 is returned, wait and retry later.
2. `{filename}.ERR`: an error occurred; an HTTP 400 is returned.
3. `{filename}`: the sentinel has been deleted and no error file is present, so the export file should exist. If all goes well, an HTTP 200 is returned and the file is sent to the client.
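A minimal way to consume this contract from a shell is to poll until the 202 responses stop; this is only a sketch, and the URL (as returned by `POST /export`) and the retry interval are illustrative:

```bash
URL="http://localhost:8080/1680087684088_results.csv"   # URL returned by POST /export (illustrative)

# While the sentinel file exists, the server answers 202: wait and retry.
while [ "$(curl -s -o /dev/null -w '%{http_code}' "$URL")" = "202" ]; do
  sleep 10
done

# 200: the export file is downloaded; 400: the export failed; 404: the file expired or never existed.
curl -f "$URL" -o results.csv
```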
Once created, the result file is stored for 36 hours in a cloud storage bucket, according to the container's environment configuration.
The cloud storage providers currently supported are Google and AWS.
Which log data ends up in the export file?
The list of attributes of a DataDog log is very large, and it depends on which log channel you want to analyze and what you want to analyze, so the best answer to the question is: it depends!
You have the power: you can choose exactly what you want to have in the file.
The log's timestamp is the only attribute present in every result, and it is always the first field.
You are free to add whatever you want, using the `columns` attribute of the command's JSON payload; for each column you define a `label` (the name the information will have in the file) and its `path` inside the DataDog log, and you can optionally set a default value to use when the information is not present in a given log entry (an empty string `""` is the system default).
The root of the log attributes is `attributes`, so (for example) the timestamp is referred to as `attributes.timestamp`.
The `columns` input attribute is an array of tuples containing the label, the attribute's path, and an optional default value used when the attribute isn't found.
The `path` must be written as a string, with a hash sign (`#`) as separator (going back to the timestamp example, it would be written as `attributes#timestamp`).
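To illustrate how a `path` maps onto the log structure, suppose a log entry looks like the following (the field names under `context` are illustrative):

```json
{
  "attributes": {
    "timestamp": "2023-03-27T12:34:56.000Z",
    "context": {
      "request": {
        "uri": "/api/orders"
      }
    }
  }
}
```

The value of `attributes.context.request.uri` would be addressed as `attributes#context#request#uri`, and the corresponding `columns` entry would be `{ "label": "URI", "path": "attributes#context#request#uri" }`.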
You will need an API key and an app key to access the DataDog API. These must be provided as environment variables, as described below.
API keys are global to a DataDog account and can be found in the organization settings. App keys are personal to your profile and can be generated in your personal settings.
| Variable name | Mandatory | Description | Allowed values | Example value |
|---|---|---|---|---|
| `PORT` | YES | TCP port the server listens on | 1..60000 | 8080 |
| `DD_SITE` | YES | DataDog site from which logs are downloaded | URL / hostname | datadoghq.com |
| `DD_API_KEY` | YES | DataDog API key | API key string | aaabbbcccdddeeefff |
| `DD_APP_KEY` | YES | DataDog application key | APP key string | xxxxyyyyzzzzwwww |
| `STORAGE_TYPE` | YES | Cloud storage provider used to store result files | 'GOOGLE', 'AWS' | AWS |
| `GOOGLE_APPLICATION_CREDENTIALS` | if STORAGE_TYPE is GOOGLE | Path of Google's JSON credentials file | path to JSON file | /path/to/google_credentials.json |
| `GOOGLE_BUCKET_NAME` | if STORAGE_TYPE is GOOGLE | Bucket name used to store data | string | datadog_storage |
| `AWS_ACCESS_KEY_ID` | if STORAGE_TYPE is AWS | Access key for the AWS account | access key string | xxxxaaaazzzzbbbbyyy |
| `AWS_SECRET_ACCESS_KEY` | if STORAGE_TYPE is AWS | Secret for the AWS account | secret string | xxxx-aaaaz$zzzbbbbyyy |
| `AWS_BUCKET_NAME` | if STORAGE_TYPE is AWS | Bucket name used to store data | string | datadog_storage |
| `AWS_REGION` | if STORAGE_TYPE is AWS | AWS region where the bucket is located | AWS region | eu-north-1 |
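As a sketch, a configuration using AWS storage could look like the following (all values are the placeholder examples from the table above):

```
PORT=8080
DD_SITE=datadoghq.com
DD_API_KEY=aaabbbcccdddeeefff
DD_APP_KEY=xxxxyyyyzzzzwwww
STORAGE_TYPE=AWS
AWS_ACCESS_KEY_ID=xxxxaaaazzzzbbbbyyy
AWS_SECRET_ACCESS_KEY=xxxx-aaaaz$zzzbbbbyyy
AWS_BUCKET_NAME=datadog_storage
AWS_REGION=eu-north-1
```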
Run `npm install`.
Copy `.env.example` to `.env.production` and fill in valid values.
Run `node index.js` to start the server (`npm start` also does the job); see the commands sketched below.
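Putting the setup steps together (a sketch; fill in `.env.production` with your own values before starting):

```bash
npm install
cp .env.example .env.production   # then fill in the variables described above
node index.js                     # or: npm start
```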
Do you want to ask for all HTTP 500 errors on `/api/*` calls on 27 March 2023 between 12:00 and 13:00 GMT, saving a CSV file with the date, `context.request.uri`, and `context.request.headers.x-myown-header` as output, with verbose logging to the console? Type:
curl -H "Content-type: application/json" -X POST "http://localhost:8080/export" \
-d'{"query":"@context.request.uri:\/api\/* @http.status_code:500","verbose":true,"from":"2023-03-27T12:00:00.000Z","to":"2023-03-27T13:00:00.000Z","outputFormat":"csv","columns":[{"label":"URI","path":"attributes#context#request#uri"},{"label":"My Header","path":"attributes#context#request#headers#x-myown-header"}]}'
You will get `http://localhost:8080/1680087684088_results.csv`, or something similar, as the response; the prefix is the timestamp of the request, used to clean up archives after the 36-hour limit.
Calling the URL from a browser forces the file download; alternatively, you can use cURL (following on from the previous request):
curl http://localhost:8080/1680087684088_results.csv -o "1680087684088_results.csv"
to store the file on your computer.
The tool uses vitest as its test suite. Run `vitest` to launch the tests, or use `npm run start-dev`.