Commit d668a14

TwiN and Copilot authored
feat(suite): Implement Suites (#1239)
* feat(suite): Implement Suites
  Fixes #1230
* Update docs
* Fix variable alignment
* Prevent always-run endpoint from running if a context placeholder fails to resolve in the URL
* Return errors when a context placeholder path fails to resolve
* Add a couple of unit tests
* Add a couple of unit tests
* fix(ui): Update group count properly
  Fixes #1233
* refactor: Pass down entire config instead of several sub-configs
* fix: Change default suite interval and timeout
* fix: Deprecate disable-monitoring-lock in favor of concurrency
* fix: Make sure there are no duplicate keys
* Refactor some code
* Update watchdog/watchdog.go
* Update web/app/src/components/StepDetailsModal.vue
  Co-authored-by: Copilot <[email protected]>
* chore: Remove useless log
* fix: Set default concurrency to 3 instead of 5

---------

Co-authored-by: Copilot <[email protected]>
1 parent 10cabb9 commit d668a14

74 files changed: +7510 -649 lines changed

.github/codecov.yml

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,9 @@
 ignore:
-  - "watchdog/watchdog.go"
   - "storage/store/sql/specific_postgres.go" # Can't test for postgres
+  - "watchdog/endpoint.go"
+  - "watchdog/external_endpoint.go"
+  - "watchdog/suite.go"
+  - "watchdog/watchdog.go"
 comment: false
 coverage:
   status:

README.md

Lines changed: 125 additions & 11 deletions
@@ -45,6 +45,7 @@ Have any feedback or questions? [Create a discussion](https://github.com/TwiN/ga
 - [Configuration](#configuration)
 - [Endpoints](#endpoints)
 - [External Endpoints](#external-endpoints)
+- [Suites (ALPHA)](#suites-alpha)
 - [Conditions](#conditions)
 - [Placeholders](#placeholders)
 - [Functions](#functions)
@@ -122,7 +123,7 @@ Have any feedback or questions? [Create a discussion](https://github.com/TwiN/ga
 - [Monitoring an endpoint using STARTTLS](#monitoring-an-endpoint-using-starttls)
 - [Monitoring an endpoint using TLS](#monitoring-an-endpoint-using-tls)
 - [Monitoring domain expiration](#monitoring-domain-expiration)
-- [disable-monitoring-lock](#disable-monitoring-lock)
+- [Concurrency](#concurrency)
 - [Reloading configuration on the fly](#reloading-configuration-on-the-fly)
 - [Endpoint groups](#endpoint-groups)
 - [How do I sort by group by default?](#how-do-i-sort-by-group-by-default)
@@ -247,7 +248,8 @@ If you want to test it locally, see [Docker](#docker).
 | `endpoints` | [Endpoints configuration](#endpoints). | Required `[]` |
 | `external-endpoints` | [External Endpoints configuration](#external-endpoints). | `[]` |
 | `security` | [Security configuration](#security). | `{}` |
-| `disable-monitoring-lock` | Whether to [disable the monitoring lock](#disable-monitoring-lock). | `false` |
+| `concurrency` | Maximum number of endpoints/suites to monitor concurrently. Set to `0` for unlimited. See [Concurrency](#concurrency). | `3` |
+| `disable-monitoring-lock` | Whether to [disable the monitoring lock](#disable-monitoring-lock). **Deprecated**: Use `concurrency: 0` instead. | `false` |
 | `skip-invalid-config-update` | Whether to ignore invalid configuration update. <br />See [Reloading configuration on the fly](#reloading-configuration-on-the-fly). | `false` |
 | `web` | Web configuration. | `{}` |
 | `web.address` | Address to listen on. | `0.0.0.0` |
@@ -309,6 +311,8 @@ You can then configure alerts to be triggered when an endpoint is unhealthy once
 | `endpoints[].ui.dont-resolve-failed-conditions` | Whether to resolve failed conditions for the UI. | `false` |
 | `endpoints[].ui.badge.response-time` | List of response time thresholds. Each time a threshold is reached, the badge has a different color. | `[50, 200, 300, 500, 750]` |
 | `endpoints[].extra-labels` | Extra labels to add to the metrics. Useful for grouping endpoints together. | `{}` |
+| `endpoints[].always-run` | (SUITES ONLY) Whether to execute this endpoint even if previous endpoints in the suite failed. | `false` |
+| `endpoints[].store` | (SUITES ONLY) Map of values to extract from the response and store in the suite context (stored even on failure). | `{}` |

 You may use the following placeholders in the body (`endpoints[].body`):
 - `[ENDPOINT_NAME]` (resolved from `endpoints[].name`)
@@ -366,6 +370,99 @@ Where:
 You must also pass the token as a `Bearer` token in the `Authorization` header.


+### Suites (ALPHA)
+Suites are collections of endpoints that are executed sequentially with a shared context.
+This allows you to create complex monitoring scenarios where the result from one endpoint can be used in subsequent endpoints, enabling workflow-style monitoring.
+
+Here are a few cases in which suites could be useful:
+- Testing multi-step authentication flows (login -> access protected resource -> logout)
+- API workflows where you need to chain requests (create resource -> update -> verify -> delete)
+- Monitoring business processes that span multiple services
+- Validating data consistency across multiple endpoints
+
+| Parameter | Description | Default |
+|:----------------------------------|:----------------------------------------------------------------------------------------------------|:--------------|
+| `suites` | List of suites to monitor. | `[]` |
+| `suites[].enabled` | Whether to monitor the suite. | `true` |
+| `suites[].name` | Name of the suite. Must be unique. | Required `""` |
+| `suites[].group` | Group name. Used to group multiple suites together on the dashboard. | `""` |
+| `suites[].interval` | Duration to wait between suite executions. | `10m` |
+| `suites[].timeout` | Maximum duration for the entire suite execution. | `5m` |
+| `suites[].context` | Initial context values that can be referenced by endpoints. | `{}` |
+| `suites[].endpoints` | List of endpoints to execute sequentially. | Required `[]` |
+| `suites[].endpoints[].store` | Map of values to extract from the response and store in the suite context (stored even on failure). | `{}` |
+| `suites[].endpoints[].always-run` | Whether to execute this endpoint even if previous endpoints in the suite failed. | `false` |
+
+**Note**: Suite-level alerts are not supported yet. Configure alerts on individual endpoints within the suite instead.
+
+#### Using Context in Endpoints
+Once values are stored in the context, they can be referenced in subsequent endpoints:
+- In the URL: `https://api.example.com/users/[CONTEXT].userId`
+- In headers: `Authorization: Bearer [CONTEXT].authToken`
+- In the body: `{"user_id": "[CONTEXT].userId"}`
+- In conditions: `[BODY].server_ip == [CONTEXT].serverIp`
+
+#### Example Suite Configuration
+```yaml
+suites:
+  - name: item-crud-workflow
+    group: api-tests
+    interval: 5m
+    context:
+      price: "19.99" # Initial static value in context
+    endpoints:
+      # Step 1: Create an item and store the item ID
+      - name: create-item
+        url: https://api.example.com/items
+        method: POST
+        body: '{"name": "Test Item", "price": "[CONTEXT].price"}'
+        conditions:
+          - "[STATUS] == 201"
+          - "len([BODY].id) > 0"
+          - "[BODY].price == [CONTEXT].price"
+        store:
+          itemId: "[BODY].id"
+        alerts:
+          - type: slack
+            description: "Failed to create item"
+
+      # Step 2: Update the item using the stored item ID
+      - name: update-item
+        url: https://api.example.com/items/[CONTEXT].itemId
+        method: PUT
+        body: '{"price": "24.99"}'
+        conditions:
+          - "[STATUS] == 200"
+        alerts:
+          - type: slack
+            description: "Failed to update item"
+
+      # Step 3: Fetch the item and validate the price
+      - name: get-item
+        url: https://api.example.com/items/[CONTEXT].itemId
+        method: GET
+        conditions:
+          - "[STATUS] == 200"
+          - "[BODY].price == 24.99"
+        alerts:
+          - type: slack
+            description: "Item price did not update correctly"
+
+      # Step 4: Delete the item (always-run: true to ensure cleanup even if step 2 or 3 fails)
+      - name: delete-item
+        url: https://api.example.com/items/[CONTEXT].itemId
+        method: DELETE
+        always-run: true
+        conditions:
+          - "[STATUS] == 204"
+        alerts:
+          - type: slack
+            description: "Failed to delete item"
+```
+
+The suite will be considered successful only if all required endpoints pass their conditions.
+
+
 ### Conditions
 Here are some examples of conditions you can use:
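
The suite semantics described in this README hunk and in the commit message (sequential steps, a shared context, `store` applied even on failure, `always-run`, and a step being blocked when a `[CONTEXT]` placeholder in its URL cannot be resolved) can be sketched in Go roughly as follows. This is an illustration only and not the code added by this commit: `step`, `resolve`, and `runSuite` are hypothetical names, the `Run` callback stands in for the real HTTP call and condition evaluation, and only flat `[CONTEXT].key` lookups are handled.

```go
package main

import (
	"fmt"
	"regexp"
)

// step is a hypothetical, simplified stand-in for one endpoint in a suite.
type step struct {
	Name      string
	URL       string
	AlwaysRun bool
	// Run receives the resolved URL and reports whether the step passed,
	// plus any values to store in the shared context.
	Run func(url string) (ok bool, store map[string]string)
}

// contextPlaceholder matches flat placeholders such as [CONTEXT].itemId.
var contextPlaceholder = regexp.MustCompile(`\[CONTEXT\]\.([A-Za-z0-9_]+)`)

// resolve substitutes every [CONTEXT].key in s with its value from ctx and
// returns an error if any referenced key is missing.
func resolve(s string, ctx map[string]string) (string, error) {
	var missing []string
	out := contextPlaceholder.ReplaceAllStringFunc(s, func(m string) string {
		key := contextPlaceholder.FindStringSubmatch(m)[1]
		value, found := ctx[key]
		if !found {
			missing = append(missing, key)
			return m
		}
		return value
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unresolved context keys: %v", missing)
	}
	return out, nil
}

// runSuite executes steps in order. After the first failure only steps marked
// AlwaysRun are executed, and a step whose URL cannot be resolved never runs.
func runSuite(steps []step, ctx map[string]string) (success bool, executed []string) {
	success = true
	for _, s := range steps {
		if !success && !s.AlwaysRun {
			continue // skipped because an earlier step failed
		}
		url, err := resolve(s.URL, ctx)
		if err != nil {
			success = false // unresolved placeholder: fail without running
			continue
		}
		ok, stored := s.Run(url)
		executed = append(executed, s.Name)
		for k, v := range stored { // stored even when the step failed
			ctx[k] = v
		}
		if !ok {
			success = false
		}
	}
	return success, executed
}
```

With the four-step CRUD example above, a failing `update-item` would cause `get-item` to be skipped while `delete-item` still runs, because its `[CONTEXT].itemId` was stored by `create-item`.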

@@ -2921,17 +3018,34 @@ endpoints:
 > using the `[DOMAIN_EXPIRATION]` placeholder on an endpoint with an interval of less than `5m`.


-### disable-monitoring-lock
-Setting `disable-monitoring-lock` to `true` means that multiple endpoints could be monitored at the same time (i.e. parallel execution).
+### Concurrency
+By default, Gatus allows up to 3 endpoints/suites to be monitored concurrently. This provides a balance between performance and resource usage while maintaining accurate response time measurements.
+
+You can configure the concurrency level using the `concurrency` parameter:
+
+```yaml
+# Allow 10 endpoints/suites to be monitored concurrently
+concurrency: 10
+
+# Allow unlimited concurrent monitoring
+concurrency: 0
+
+# Use default concurrency (3)
+# concurrency: 3
+```
+
+**Important considerations:**
+- Higher concurrency can improve monitoring performance when you have many endpoints
+- Conditions using the `[RESPONSE_TIME]` placeholder may be less accurate with very high concurrency due to system resource contention
+- Set to `0` for unlimited concurrency (equivalent to the deprecated `disable-monitoring-lock: true`)

-While this behavior wouldn't generally be harmful, conditions using the `[RESPONSE_TIME]` placeholder could be impacted
-by the evaluation of multiple endpoints at the same time, therefore, the default value for this parameter is `false`.
+**Use cases for higher concurrency:**
+- You have a large number of endpoints to monitor
+- You want to monitor endpoints at very short intervals (< 5s)
+- You're using Gatus for load testing scenarios

-There are three main reasons why you might want to disable the monitoring lock:
-- You're using Gatus for load testing (each endpoint are periodically evaluated on a different goroutine, so
-technically, if you create 100 endpoints with a 1 seconds interval, Gatus will send 100 requests per second)
-- You have a _lot_ of endpoints to monitor
-- You want to test multiple endpoints at very short intervals (< 5s)
+**Legacy configuration:**
+The `disable-monitoring-lock` parameter is deprecated but still supported for backward compatibility. It's equivalent to setting `concurrency: 0`.


 ### Reloading configuration on the fly

api/api.go

Lines changed: 4 additions & 1 deletion
@@ -87,7 +87,8 @@ func (a *API) createRouter(cfg *config.Config) *fiber.App {
 	unprotectedAPIRouter.Post("/v1/endpoints/:key/external", CreateExternalEndpointResult(cfg))
 	// SPA
 	app.Get("/", SinglePageApplication(cfg.UI))
-	app.Get("/endpoints/:name", SinglePageApplication(cfg.UI))
+	app.Get("/endpoints/:key", SinglePageApplication(cfg.UI))
+	app.Get("/suites/:key", SinglePageApplication(cfg.UI))
 	// Health endpoint
 	healthHandler := health.Handler().WithJSON(true)
 	app.Get("/health", func(c *fiber.Ctx) error {
@@ -127,5 +128,7 @@ func (a *API) createRouter(cfg *config.Config) *fiber.App {
 	}
 	protectedAPIRouter.Get("/v1/endpoints/statuses", EndpointStatuses(cfg))
 	protectedAPIRouter.Get("/v1/endpoints/:key/statuses", EndpointStatus(cfg))
+	protectedAPIRouter.Get("/v1/suites/statuses", SuiteStatuses(cfg))
+	protectedAPIRouter.Get("/v1/suites/:key/statuses", SuiteStatus(cfg))
 	return app
 }

api/badge_test.go

Lines changed: 4 additions & 4 deletions
@@ -34,8 +34,8 @@ func TestBadge(t *testing.T) {
 	cfg.Endpoints[0].UIConfig = ui.GetDefaultConfig()
 	cfg.Endpoints[1].UIConfig = ui.GetDefaultConfig()

-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[0], &endpoint.Result{Success: true, Connected: true, Duration: time.Millisecond, Timestamp: time.Now()})
-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[1], &endpoint.Result{Success: false, Connected: false, Duration: time.Second, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[0], &endpoint.Result{Success: true, Connected: true, Duration: time.Millisecond, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[1], &endpoint.Result{Success: false, Connected: false, Duration: time.Second, Timestamp: time.Now()})
 	api := New(cfg)
 	router := api.Router()
 	type Scenario struct {
@@ -284,8 +284,8 @@ func TestGetBadgeColorFromResponseTime(t *testing.T) {
 		},
 	}

-	store.Get().Insert(&firstTestEndpoint, &testSuccessfulResult)
-	store.Get().Insert(&secondTestEndpoint, &testSuccessfulResult)
+	store.Get().InsertEndpointResult(&firstTestEndpoint, &testSuccessfulResult)
+	store.Get().InsertEndpointResult(&secondTestEndpoint, &testSuccessfulResult)

 	scenarios := []struct {
 		Key string

api/chart_test.go

Lines changed: 2 additions & 2 deletions
@@ -28,8 +28,8 @@ func TestResponseTimeChart(t *testing.T) {
 			},
 		},
 	}
-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[0], &endpoint.Result{Success: true, Duration: time.Millisecond, Timestamp: time.Now()})
-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[1], &endpoint.Result{Success: false, Duration: time.Second, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[0], &endpoint.Result{Success: true, Duration: time.Millisecond, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[1], &endpoint.Result{Success: false, Duration: time.Second, Timestamp: time.Now()})
 	api := New(cfg)
 	router := api.Router()
 	type Scenario struct {

api/endpoint_status_test.go

Lines changed: 4 additions & 4 deletions
@@ -101,8 +101,8 @@ func TestEndpointStatus(t *testing.T) {
 			MaximumNumberOfEvents: storage.DefaultMaximumNumberOfEvents,
 		},
 	}
-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[0], &endpoint.Result{Success: true, Duration: time.Millisecond, Timestamp: time.Now()})
-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[1], &endpoint.Result{Success: false, Duration: time.Second, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[0], &endpoint.Result{Success: true, Duration: time.Millisecond, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[1], &endpoint.Result{Success: false, Duration: time.Second, Timestamp: time.Now()})
 	api := New(cfg)
 	router := api.Router()
 	type Scenario struct {
@@ -156,8 +156,8 @@ func TestEndpointStatuses(t *testing.T) {
 	defer cache.Clear()
 	firstResult := &testSuccessfulResult
 	secondResult := &testUnsuccessfulResult
-	store.Get().Insert(&testEndpoint, firstResult)
-	store.Get().Insert(&testEndpoint, secondResult)
+	store.Get().InsertEndpointResult(&testEndpoint, firstResult)
+	store.Get().InsertEndpointResult(&testEndpoint, secondResult)
 	// Can't be bothered dealing with timezone issues on the worker that runs the automated tests
 	firstResult.Timestamp = time.Time{}
 	secondResult.Timestamp = time.Time{}

api/external_endpoint.go

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ func CreateExternalEndpointResult(cfg *config.Config) fiber.Handler {
 			result.Errors = append(result.Errors, c.Query("error"))
 		}
 		convertedEndpoint := externalEndpoint.ToEndpoint()
-		if err := store.Get().Insert(convertedEndpoint, result); err != nil {
+		if err := store.Get().InsertEndpointResult(convertedEndpoint, result); err != nil {
 			if errors.Is(err, common.ErrEndpointNotFound) {
 				return c.Status(404).SendString(err.Error())
 			}

api/raw_test.go

Lines changed: 2 additions & 2 deletions
@@ -33,8 +33,8 @@ func TestRawDataEndpoint(t *testing.T) {
 	cfg.Endpoints[0].UIConfig = ui.GetDefaultConfig()
 	cfg.Endpoints[1].UIConfig = ui.GetDefaultConfig()

-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[0], &endpoint.Result{Success: true, Connected: true, Duration: time.Millisecond, Timestamp: time.Now()})
-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[1], &endpoint.Result{Success: false, Connected: false, Duration: time.Second, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[0], &endpoint.Result{Success: true, Connected: true, Duration: time.Millisecond, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[1], &endpoint.Result{Success: false, Connected: false, Duration: time.Second, Timestamp: time.Now()})
 	api := New(cfg)
 	router := api.Router()
 	type Scenario struct {

api/spa_test.go

Lines changed: 2 additions & 2 deletions
@@ -34,8 +34,8 @@ func TestSinglePageApplication(t *testing.T) {
 			Title: "example-title",
 		},
 	}
-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[0], &endpoint.Result{Success: true, Duration: time.Millisecond, Timestamp: time.Now()})
-	watchdog.UpdateEndpointStatuses(cfg.Endpoints[1], &endpoint.Result{Success: false, Duration: time.Second, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[0], &endpoint.Result{Success: true, Duration: time.Millisecond, Timestamp: time.Now()})
+	watchdog.UpdateEndpointStatus(cfg.Endpoints[1], &endpoint.Result{Success: false, Duration: time.Second, Timestamp: time.Now()})
 	api := New(cfg)
 	router := api.Router()
 	type Scenario struct {

api/suite_status.go

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+package api
+
+import (
+	"fmt"
+
+	"github.com/TwiN/gatus/v5/config"
+	"github.com/TwiN/gatus/v5/config/suite"
+	"github.com/TwiN/gatus/v5/storage/store"
+	"github.com/TwiN/gatus/v5/storage/store/common/paging"
+	"github.com/gofiber/fiber/v2"
+)
+
+// SuiteStatuses handles requests to retrieve all suite statuses
+func SuiteStatuses(cfg *config.Config) fiber.Handler {
+	return func(c *fiber.Ctx) error {
+		page, pageSize := extractPageAndPageSizeFromRequest(c, 100)
+		params := paging.NewSuiteStatusParams().WithPagination(page, pageSize)
+		suiteStatuses, err := store.Get().GetAllSuiteStatuses(params)
+		if err != nil {
+			return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
+				"error": fmt.Sprintf("Failed to retrieve suite statuses: %v", err),
+			})
+		}
+		// If no statuses exist yet, create empty ones from config
+		if len(suiteStatuses) == 0 {
+			for _, s := range cfg.Suites {
+				if s.IsEnabled() {
+					suiteStatuses = append(suiteStatuses, suite.NewStatus(s))
+				}
+			}
+		}
+		return c.Status(fiber.StatusOK).JSON(suiteStatuses)
+	}
+}
+
+// SuiteStatus handles requests to retrieve a single suite's status
+func SuiteStatus(cfg *config.Config) fiber.Handler {
+	return func(c *fiber.Ctx) error {
+		page, pageSize := extractPageAndPageSizeFromRequest(c, 100)
+		key := c.Params("key")
+		params := paging.NewSuiteStatusParams().WithPagination(page, pageSize)
+		status, err := store.Get().GetSuiteStatusByKey(key, params)
+		if err != nil || status == nil {
+			// Try to find the suite in config
+			for _, s := range cfg.Suites {
+				if s.Key() == key {
+					status = suite.NewStatus(s)
+					break
+				}
+			}
+			if status == nil {
+				return c.Status(404).JSON(fiber.Map{
+					"error": fmt.Sprintf("Suite with key '%s' not found", key),
+				})
+			}
+		}
+		return c.Status(fiber.StatusOK).JSON(status)
+	}
+}
