Modernize godir: Go 1.25, CI/CD, health endpoint#232
Conversation
- Upgrade to Go 1.25 and switch to ntppool/common logger - Add /health endpoint for Kubernetes probes - Fix race condition with proper mutex locking - Add 30-second cooldown for XML file modification checks - Serve stale data on reload errors instead of crashing - Add smoke tests covering all HTTP status codes - Add GitHub Actions workflow for CI/CD with ghcr.io - Update Dockerfile with OCI labels and ENTRYPOINT - Add test entry to perl_mongers.xml for 421 status test
There was a problem hiding this comment.
Pull request overview
This PR modernizes the godir service by upgrading Go to 1.25, adding health checks for Kubernetes, implementing mutex-based concurrency protection, and establishing a comprehensive CI/CD pipeline with GitHub Actions.
Key Changes:
- Upgraded Go from 1.22.5 to 1.25 with updated dependencies (replaced slog-http with go.ntppool.org/common/logger)
- Added /health endpoint and implemented mutex-protected file reloading with 30-second cooldown
- Added comprehensive CI/CD pipeline with automated testing, multi-platform container builds, and deployment to ghcr.io
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| godir/go.mod | Updated Go version to 1.25 and replaced slog-http dependency with go.ntppool.org/common/logger |
| godir/go.sum | Updated dependency checksums for new and upgraded packages |
| godir/main.go | Added health endpoint, mutex-based concurrency protection, cooldown mechanism for file checks, improved error handling, and structured logging |
| godir/main_test.go | Added comprehensive smoke tests for health, redirects, 404, 410, 421, and 400 responses |
| godir/Dockerfile | Updated to Go 1.25, added OCI labels, enforced go.sum, changed CMD to ENTRYPOINT |
| godir/Makefile | Updated container registry to ghcr.io/perlorg/godir and uncommented go.sum dependency |
| godir/README.md | Added build status badge, container image info, environment variables table, endpoints documentation, and usage examples |
| .github/workflows/godir.yml | Added comprehensive CI/CD workflow with test, build, container-test, preview build, and cleanup jobs |
| perl_mongers.xml | Added test entry for smoke testing misdirected request handling |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
perl_mongers.xml
Outdated
| <!-- Test entry for godir smoke tests - DO NOT USE FOR REAL --> | ||
| <group id="9999" status="active"> | ||
| <name>zztestloop.pm</name> | ||
| <location> | ||
| <city>Test</city> | ||
| <region></region> | ||
| <country>Test</country> | ||
| <continent>Test</continent> | ||
| </location> | ||
| <web>https://zztestloop.pm.org/</web> | ||
| </group> |
There was a problem hiding this comment.
The test entry has a web URL pointing to https://zztestloop.pm.org/ which matches the isLocalRegexp pattern (pm.org URL). This means the TestHandlerMisdirectedRequest test will correctly trigger the 421 status code. However, the comment says "DO NOT USE FOR REAL" but doesn't clarify that this entry is specifically designed to test the misdirected request case, which could confuse future maintainers.
| return nil | ||
| } | ||
|
|
||
| func (pmr *PerlMongersReloading) Group(id string) (*Group, error) { |
There was a problem hiding this comment.
Using a full write lock (Lock) for read operations will serialize all HTTP requests, creating a performance bottleneck. Since the Group method primarily performs read operations on pmr.pm, it should use RLock for the read path and only acquire a full Lock when actually modifying data during reload. Consider using RLock initially, then upgrading to Lock only when maybeReload determines a reload is necessary.
| func (pmr *PerlMongersReloading) Group(id string) (*Group, error) { | |
| func (pmr *PerlMongersReloading) Group(id string) (*Group, error) { | |
| // Fast path: use read lock for concurrent reads. | |
| pmr.mu.RLock() | |
| pm := pmr.pm | |
| lastUpdate := pmr.lastUpdate | |
| lastCheck := pmr.lastCheck | |
| path := pmr.path | |
| pmr.mu.RUnlock() | |
| now := time.Now() | |
| needReload := false | |
| // If we have never loaded data, we must reload. | |
| if pm == nil { | |
| needReload = true | |
| } else if now.Sub(lastCheck) >= reloadCooldown { | |
| // Cooldown elapsed: check if the file on disk is newer. | |
| fi, err := os.Stat(path) | |
| if err != nil { | |
| slog.Warn("failed to stat perl_mongers.xml, serving cached data", "error", err) | |
| } else if fi.ModTime().After(lastUpdate) { | |
| needReload = true | |
| } | |
| } | |
| if !needReload { | |
| if pm == nil { | |
| return nil, fmt.Errorf("perl_mongers.xml not loaded") | |
| } | |
| return pm.Group(id) | |
| } | |
| // Reload path: take write lock and call maybeReload, which expects pmr.mu to be held. |
godir/main.go
Outdated
| } | ||
|
|
||
| // reload updates the perl mongers data from disk. | ||
| // Must be called with pmr.mu held. |
There was a problem hiding this comment.
The comment states "Must be called with pmr.mu held" but this function doesn't verify that the lock is held, and it's called from Group which acquires the lock. However, the reload function doesn't need any lock protection since it only modifies local variables and the final assignment to pmr.pm is done after all operations complete. The lock should be held by the caller (maybeReload), not required here. This comment is misleading about the locking contract.
| // Must be called with pmr.mu held. | |
| // Callers are responsible for holding pmr.mu if needed; in practice this is invoked by maybeReload, which is called with pmr.mu held. |
| // Initial load | ||
| if err := pmr.reload(); err != nil { | ||
| log.Error("failed to load perl_mongers.xml", "error", err) | ||
| os.Exit(1) | ||
| } |
There was a problem hiding this comment.
The initial reload is called without holding the mutex, creating a potential race condition. If the server starts handling requests before this initial load completes, or if there's concurrent access during initialization, it could lead to data races. The reload should be called with proper mutex protection, or pmr.mu.Lock() should be acquired before calling reload.
| // Initial load | |
| if err := pmr.reload(); err != nil { | |
| log.Error("failed to load perl_mongers.xml", "error", err) | |
| os.Exit(1) | |
| } | |
| // Initial load (protected by mutex to avoid data races) | |
| pmr.mu.Lock() | |
| if err := pmr.reload(); err != nil { | |
| pmr.mu.Unlock() | |
| log.Error("failed to load perl_mongers.xml", "error", err) | |
| os.Exit(1) | |
| } | |
| pmr.mu.Unlock() |
| mux := http.NewServeMux() | ||
| mux.Handle("/", &handler{pm: pmr, log: log}) | ||
| mux.HandleFunc("/health", healthHandler) |
There was a problem hiding this comment.
The previous code had panic recovery middleware (sloghttp.Recovery) which has been removed. Without panic recovery middleware, any panic in the handler will crash the entire server. Consider adding panic recovery middleware to prevent server crashes from unexpected panics.
| func (pmr *PerlMongersReloading) Group(id string) (*Group, error) { | ||
| if err := pmr.maybeReload(); err != nil { | ||
| log.Fatal(err) | ||
| pmr.mu.Lock() | ||
| defer pmr.mu.Unlock() | ||
|
|
||
| err := pmr.maybeReload() | ||
| if err != nil { | ||
| slog.Warn("failed to reload perl_mongers.xml, serving stale data", "error", err) | ||
| } | ||
|
|
||
| if pmr.pm == nil { | ||
| return nil, fmt.Errorf("perl_mongers.xml not loaded") | ||
| } | ||
| return pmr.pm.Group(id) | ||
| } |
There was a problem hiding this comment.
The PerlMongersReloading.Group method and the maybeReload functionality (including the cooldown mechanism, reload error handling, and mutex locking) are not covered by any tests. This is critical functionality involving concurrency, file I/O, and error handling that should have dedicated tests to ensure it works correctly under various conditions.
| - name: Test health endpoint | ||
| run: | | ||
| response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health) | ||
| if [ "$response" != "200" ]; then | ||
| echo "Health check failed with status $response" | ||
| docker logs godir-test | ||
| exit 1 | ||
| fi | ||
| echo "Health check passed" |
There was a problem hiding this comment.
The container test only verifies the health endpoint but doesn't test the actual redirect functionality or error cases. After starting the container, the test should also verify that redirects work correctly using the test entry (zztestloop.pm.org) to ensure the container is functioning properly end-to-end.
- Fix isLocalRegexp to match URLs without trailing slash - Use RLock for concurrent reads, Lock only for reload - Add panic recovery middleware with stack traces - Use structured logger in PerlMongersReloading - Add redirect test to container-test workflow - Add workflow file to path triggers
Summary
Test plan
go test -v ./...)go vetandgolangci-lintpass with no issues