Skip to content

Modernize godir: Go 1.25, CI/CD, health endpoint#232

Merged
rspier merged 2 commits intomasterfrom
godir-modernization
Jan 10, 2026
Merged

Modernize godir: Go 1.25, CI/CD, health endpoint#232
rspier merged 2 commits intomasterfrom
godir-modernization

Conversation

@abh
Copy link
Member

@abh abh commented Jan 7, 2026

Summary

  • Upgrade godir from Go 1.22.5 to Go 1.25
  • Replace slog-http with go.ntppool.org/common/logger for structured logging
  • Add /health endpoint returning 200 OK for Kubernetes liveness/readiness probes
  • Fix race condition in PerlMongersReloading with proper mutex locking
  • Add 30-second cooldown for XML file modification checks to reduce stat() calls
  • Serve stale data on reload errors instead of crashing the server
  • Add GitHub Actions workflow for automated testing and container builds to ghcr.io/perlorg/godir
  • Update Dockerfile: Go 1.25, OCI labels, ENTRYPOINT, enforce go.sum
  • Add smoke tests covering health, redirects, 404, 410, 421, and 400 responses

Test plan

  • All tests pass locally (go test -v ./...)
  • go vet and golangci-lint pass with no issues
  • Docker image builds successfully
  • Container starts and /health returns 200
  • Redirect functionality verified with curl
  • GitHub Actions workflow runs successfully on PR

- Upgrade to Go 1.25 and switch to ntppool/common logger
- Add /health endpoint for Kubernetes probes
- Fix race condition with proper mutex locking
- Add 30-second cooldown for XML file modification checks
- Serve stale data on reload errors instead of crashing
- Add smoke tests covering all HTTP status codes
- Add GitHub Actions workflow for CI/CD with ghcr.io
- Update Dockerfile with OCI labels and ENTRYPOINT
- Add test entry to perl_mongers.xml for 421 status test
Copilot AI review requested due to automatic review settings January 7, 2026 00:56
@abh abh requested a review from rspier January 7, 2026 00:57
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR modernizes the godir service by upgrading Go to 1.25, adding health checks for Kubernetes, implementing mutex-based concurrency protection, and establishing a comprehensive CI/CD pipeline with GitHub Actions.

Key Changes:

  • Upgraded Go from 1.22.5 to 1.25 with updated dependencies (replaced slog-http with go.ntppool.org/common/logger)
  • Added /health endpoint and implemented mutex-protected file reloading with 30-second cooldown
  • Added comprehensive CI/CD pipeline with automated testing, multi-platform container builds, and deployment to ghcr.io

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
godir/go.mod Updated Go version to 1.25 and replaced slog-http dependency with go.ntppool.org/common/logger
godir/go.sum Updated dependency checksums for new and upgraded packages
godir/main.go Added health endpoint, mutex-based concurrency protection, cooldown mechanism for file checks, improved error handling, and structured logging
godir/main_test.go Added comprehensive smoke tests for health, redirects, 404, 410, 421, and 400 responses
godir/Dockerfile Updated to Go 1.25, added OCI labels, enforced go.sum, changed CMD to ENTRYPOINT
godir/Makefile Updated container registry to ghcr.io/perlorg/godir and uncommented go.sum dependency
godir/README.md Added build status badge, container image info, environment variables table, endpoints documentation, and usage examples
.github/workflows/godir.yml Added comprehensive CI/CD workflow with test, build, container-test, preview build, and cleanup jobs
perl_mongers.xml Added test entry for smoke testing misdirected request handling

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

perl_mongers.xml Outdated
Comment on lines +19168 to +19178
<!-- Test entry for godir smoke tests - DO NOT USE FOR REAL -->
<group id="9999" status="active">
<name>zztestloop.pm</name>
<location>
<city>Test</city>
<region></region>
<country>Test</country>
<continent>Test</continent>
</location>
<web>https://zztestloop.pm.org/</web>
</group>
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test entry has a web URL pointing to https://zztestloop.pm.org/ which matches the isLocalRegexp pattern (pm.org URL). This means the TestHandlerMisdirectedRequest test will correctly trigger the 421 status code. However, the comment says "DO NOT USE FOR REAL" but doesn't clarify that this entry is specifically designed to test the misdirected request case, which could confuse future maintainers.

Copilot uses AI. Check for mistakes.
return nil
}

func (pmr *PerlMongersReloading) Group(id string) (*Group, error) {
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a full write lock (Lock) for read operations will serialize all HTTP requests, creating a performance bottleneck. Since the Group method primarily performs read operations on pmr.pm, it should use RLock for the read path and only acquire a full Lock when actually modifying data during reload. Consider using RLock initially, then upgrading to Lock only when maybeReload determines a reload is necessary.

Suggested change
func (pmr *PerlMongersReloading) Group(id string) (*Group, error) {
func (pmr *PerlMongersReloading) Group(id string) (*Group, error) {
// Fast path: use read lock for concurrent reads.
pmr.mu.RLock()
pm := pmr.pm
lastUpdate := pmr.lastUpdate
lastCheck := pmr.lastCheck
path := pmr.path
pmr.mu.RUnlock()
now := time.Now()
needReload := false
// If we have never loaded data, we must reload.
if pm == nil {
needReload = true
} else if now.Sub(lastCheck) >= reloadCooldown {
// Cooldown elapsed: check if the file on disk is newer.
fi, err := os.Stat(path)
if err != nil {
slog.Warn("failed to stat perl_mongers.xml, serving cached data", "error", err)
} else if fi.ModTime().After(lastUpdate) {
needReload = true
}
}
if !needReload {
if pm == nil {
return nil, fmt.Errorf("perl_mongers.xml not loaded")
}
return pm.Group(id)
}
// Reload path: take write lock and call maybeReload, which expects pmr.mu to be held.

Copilot uses AI. Check for mistakes.
godir/main.go Outdated
}

// reload updates the perl mongers data from disk.
// Must be called with pmr.mu held.
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "Must be called with pmr.mu held" but this function doesn't verify that the lock is held, and it's called from Group which acquires the lock. However, the reload function doesn't need any lock protection since it only modifies local variables and the final assignment to pmr.pm is done after all operations complete. The lock should be held by the caller (maybeReload), not required here. This comment is misleading about the locking contract.

Suggested change
// Must be called with pmr.mu held.
// Callers are responsible for holding pmr.mu if needed; in practice this is invoked by maybeReload, which is called with pmr.mu held.

Copilot uses AI. Check for mistakes.
Comment on lines +38 to 42
// Initial load
if err := pmr.reload(); err != nil {
log.Error("failed to load perl_mongers.xml", "error", err)
os.Exit(1)
}
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial reload is called without holding the mutex, creating a potential race condition. If the server starts handling requests before this initial load completes, or if there's concurrent access during initialization, it could lead to data races. The reload should be called with proper mutex protection, or pmr.mu.Lock() should be acquired before calling reload.

Suggested change
// Initial load
if err := pmr.reload(); err != nil {
log.Error("failed to load perl_mongers.xml", "error", err)
os.Exit(1)
}
// Initial load (protected by mutex to avoid data races)
pmr.mu.Lock()
if err := pmr.reload(); err != nil {
pmr.mu.Unlock()
log.Error("failed to load perl_mongers.xml", "error", err)
os.Exit(1)
}
pmr.mu.Unlock()

Copilot uses AI. Check for mistakes.
Comment on lines +44 to +46
mux := http.NewServeMux()
mux.Handle("/", &handler{pm: pmr, log: log})
mux.HandleFunc("/health", healthHandler)
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code had panic recovery middleware (sloghttp.Recovery) which has been removed. Without panic recovery middleware, any panic in the handler will crash the entire server. Consider adding panic recovery middleware to prevent server crashes from unexpected panics.

Copilot uses AI. Check for mistakes.
Comment on lines 235 to 248
func (pmr *PerlMongersReloading) Group(id string) (*Group, error) {
if err := pmr.maybeReload(); err != nil {
log.Fatal(err)
pmr.mu.Lock()
defer pmr.mu.Unlock()

err := pmr.maybeReload()
if err != nil {
slog.Warn("failed to reload perl_mongers.xml, serving stale data", "error", err)
}

if pmr.pm == nil {
return nil, fmt.Errorf("perl_mongers.xml not loaded")
}
return pmr.pm.Group(id)
}
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PerlMongersReloading.Group method and the maybeReload functionality (including the cooldown mechanism, reload error handling, and mutex locking) are not covered by any tests. This is critical functionality involving concurrency, file I/O, and error handling that should have dedicated tests to ensure it works correctly under various conditions.

Copilot uses AI. Check for mistakes.
Comment on lines +113 to +121
- name: Test health endpoint
run: |
response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health)
if [ "$response" != "200" ]; then
echo "Health check failed with status $response"
docker logs godir-test
exit 1
fi
echo "Health check passed"
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The container test only verifies the health endpoint but doesn't test the actual redirect functionality or error cases. After starting the container, the test should also verify that redirects work correctly using the test entry (zztestloop.pm.org) to ensure the container is functioning properly end-to-end.

Copilot uses AI. Check for mistakes.
- Fix isLocalRegexp to match URLs without trailing slash
- Use RLock for concurrent reads, Lock only for reload
- Add panic recovery middleware with stack traces
- Use structured logger in PerlMongersReloading
- Add redirect test to container-test workflow
- Add workflow file to path triggers
@rspier rspier merged commit 75fc573 into master Jan 10, 2026
5 checks passed
@rspier rspier deleted the godir-modernization branch January 10, 2026 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants