Terraform Version
v1.7.3
Provider Version
v3.30.4
Description
We encounter the following error fairly regularly when running Terraform from GitHub Actions. Re-running the GitHub Action usually resolves the context canceled error.
Error: Get "https://api.pagerduty.com/abilities": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
We also encounter another, seemingly related error, where re-running the action likewise resolves the problem:
Error: Get "https://api.pagerduty.com/abilities": write tcp 10.1.0.132:41842->44.237.102.140:443: write: connection reset by peer
As far as we can tell, this only happens on the /abilities endpoint in the PagerDuty API.
Terraform Configuration
provider "pagerduty" {
  alias = "primary"

  use_app_oauth_scoped_token {
    pd_subdomain = "<our-domain>"
  }
}

data "pagerduty_escalation_policy" "policy" {
  provider = pagerduty.primary
  name     = local.team
}

resource "pagerduty_service" "service" {
  provider                = pagerduty.primary
  name                    = local.service
  description             = "The platform ${local.service_name} technical service."
  auto_resolve_timeout    = "null"
  acknowledgement_timeout = "null"
  escalation_policy       = data.pagerduty_escalation_policy.policy.id

  auto_pause_notifications_parameters {
    enabled = true
    timeout = 300
  }

  incident_urgency_rule {
    type = "use_support_hours"

    during_support_hours {
      type    = "constant"
      urgency = "high"
    }

    outside_support_hours {
      type    = "constant"
      urgency = "low"
    }
  }

  support_hours {
    type         = "fixed_time_per_day"
    time_zone    = "America/Chicago"
    start_time   = "07:00:00"
    end_time     = "19:00:00"
    days_of_week = [1, 2, 3, 4, 5]
  }

  scheduled_actions {
    type       = "urgency_change"
    to_urgency = "high"

    at {
      type = "named_time"
      name = "support_hours_start"
    }
  }
}
Additional Information
We have noticed this mainly happens in GitHub Actions builds that use a matrix, so we are hitting the /abilities API concurrently from upwards of 10 terraform apply jobs.
We have also noticed that the error occurs within the first two minutes of the build, so it is unclear whether the provider's retry logic is ever triggered.
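As a stopgap on our side, capping matrix concurrency reduces how many jobs hit /abilities at once. A sketch of what that might look like, using GitHub Actions' standard `strategy.max-parallel` setting (job name, matrix values, and the cap of 3 are all illustrative, not our actual workflow):

```yaml
jobs:
  terraform-apply:
    runs-on: ubuntu-latest
    strategy:
      # Illustrative cap: limit simultaneous terraform apply jobs,
      # and therefore concurrent calls to the /abilities endpoint.
      max-parallel: 3
      matrix:
        environment: [dev, staging, prod] # placeholder matrix values
```

This only works around the symptom, though; a lower cap also slows down the overall build.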
Suggestion
I would expect retry logic in the provider to resolve this for us, since simply re-running the build resolves it as well.