[TT-10496] extra docs for gRPC high availability #6462

Open · wants to merge 20 commits into master

Conversation

@andrei-tyk (Contributor) commented May 26, 2025

User description

Added extra docs around gRPC high availability and DNS protocol for gRPC.

Preview Link

Checklist

  • Added a preview link
  • Reviewed AI PR Agent suggestions
  • For Tyk Members - Added Jira DX PR ticket to the subject
  • For Tyk Members - Added the appropriate release labels (for fixes add the latest release)


PR Type

Documentation


Description

  • Added detailed documentation on highly available gRPC servers in Kubernetes

  • Provided Go example for health checks and graceful shutdown

  • Included Kubernetes deployment YAML with probes for gRPC middleware

  • Documented DNS-based load balancing for gRPC in Tyk 5.8.2+


Changes walkthrough 📝

Relevant files — Documentation

advance-config.md — Add gRPC high availability and DNS protocol documentation
(tyk-docs/content/api-management/plugins/advance-config.md, +232/-0)

  • Added a comprehensive section on deploying highly available gRPC
    servers in Kubernetes
  • Provided Go code example for readiness/liveness probes and graceful
    shutdown
  • Included example Kubernetes deployment and service YAML with health
    probes
  • Explained DNS-based load balancing for gRPC servers in Tyk 5.8.2+


    ⚠️ Deploy preview for PR #6462 did not become live after 3 attempts.
    Please check Netlify or try manually: Preview URL


    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Documentation Accuracy

    Ensure that the provided Go code example for health checks and graceful shutdown is accurate, idiomatic, and aligns with best practices for production gRPC servers in Kubernetes environments.

    ```go
    package main
    
    import (
    	"context"
    	"log"
    	"net"
    	"net/http"
    	"os"
    	"os/signal"
    	"sync/atomic"
    	"syscall"
    	"time"
    
    	"github.com/TykTechnologies/tyk/coprocess"
    	"google.golang.org/grpc"
    )
    
    const (
    	ListenAddress = ":50051"
    	HealthAddress = ":8080"
    )
    
    func main() {
    	// Track server readiness
    	var isReady int32
    
    	// Start HTTP server for health checks
    	go func() {
    		mux := http.NewServeMux()
    
    		// Readiness probe endpoint
    		mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
    			if atomic.LoadInt32(&isReady) == 1 {
    				w.WriteHeader(http.StatusOK)
    				w.Write([]byte("Ready"))
    			} else {
    				w.WriteHeader(http.StatusServiceUnavailable)
    				w.Write([]byte("Not ready"))
    			}
    		})
    
    		// Liveness probe endpoint
    		mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
    			w.WriteHeader(http.StatusOK)
    			w.Write([]byte("Healthy"))
    		})
    
    		if err := http.ListenAndServe(HealthAddress, mux); err != nil {
    			log.Fatalf("Failed to start health server: %v", err)
    		}
    	}()
    
    	lis, err := net.Listen("tcp", ListenAddress)
    	if err != nil {
    		log.Fatalf("Failed to listen: %v", err)
    	}
    
    	log.Printf("starting grpc server on %v", ListenAddress)
    	s := grpc.NewServer()
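    	// Dispatcher is assumed to be your implementation of the
    	// coprocess.DispatcherServer interface generated from Tyk's coprocess
    	// protobuf definitions (its implementation is not shown in this snippet).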
    	coprocess.RegisterDispatcherServer(s, &Dispatcher{})
    
    	// Channel to listen for errors coming from the listener.
    	serverErrors := make(chan error, 1)
    
    	// Start the service listening for requests.
    	go func() {
    		// Mark as ready once server is initialized
    		atomic.StoreInt32(&isReady, 1)
    		log.Printf("gRPC server is ready to accept connections")
    		serverErrors <- s.Serve(lis)
    	}()
    
    	// Channel to listen for an interrupt or terminate signal from the OS.
    	shutdown := make(chan os.Signal, 1)
    	signal.Notify(shutdown, os.Interrupt, syscall.SIGTERM)
    
    	// Blocking main and waiting for shutdown.
    	select {
    	case err := <-serverErrors:
    		atomic.StoreInt32(&isReady, 0)
    		log.Fatalf("Server error: %v", err)
    	case sig := <-shutdown:
    		atomic.StoreInt32(&isReady, 0)
    		log.Printf("Received signal: %v", sig)
    		// Give outstanding requests 5 seconds to complete.
    		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    		defer cancel()
    
    		stopped := make(chan struct{})
    		go func() {
    			s.GracefulStop()
    			close(stopped)
    		}()
    
    		select {
    		case <-ctx.Done():
    			log.Printf("Graceful shutdown timed out")
    			s.Stop()
    		case <-stopped:
    			log.Printf("Graceful shutdown completed")
    		}
    	}
    }
    ```

    Kubernetes YAML Validity

    Validate that the Kubernetes deployment and service YAML manifests are correct, follow best practices, and are suitable for real-world use, including proper probe configuration and resource naming.

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: tyk-grpc-coprocess
      labels:
        app: tyk-grpc-coprocess
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1
          maxSurge: 1
      selector:
        matchLabels:
          app: tyk-grpc-coprocess
      template:
        metadata:
          labels:
            app: tyk-grpc-coprocess
        spec:
          containers:
          - name: grpc-server
            image: your-registry/tyk-grpc-coprocess:latest
            ports:
            - containerPort: 50051
              name: grpc
            - containerPort: 8080
              name: http-health
    
            # Readiness probe - determines when pod is ready for traffic
            readinessProbe:
              httpGet:
                path: /ready
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 10
              timeoutSeconds: 5
              failureThreshold: 3
              successThreshold: 1
    
            # Liveness probe - determines when to restart pod
            livenessProbe:
              httpGet:
                path: /health
                port: 8080
              initialDelaySeconds: 15
              periodSeconds: 20
              timeoutSeconds: 5
              failureThreshold: 3
    
            # Startup probe - gives more time for initial startup
            startupProbe:
              httpGet:
                path: /health
                port: 8080
              initialDelaySeconds: 10
              periodSeconds: 5
              timeoutSeconds: 5
              failureThreshold: 30
    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: tyk-grpc-coprocess-service
    spec:
      selector:
        app: tyk-grpc-coprocess
      ports:
      - name: grpc
        port: 50051
        targetPort: 50051
        protocol: TCP
      - name: http-health
        port: 8080
        targetPort: 8080
        protocol: TCP
      type: ClusterIP
    
    ```

    DNS Protocol Usage Clarity

    Confirm that the documentation for using the dns:/// protocol with gRPC and the associated environment variable is clear, accurate, and does not introduce ambiguity for users configuring Tyk 5.8.2+.
    
    ```markdown
    Starting with Tyk 5.8.2, you can also put your gRPC servers behind a load balancer and use the dns:/// (yes, triple slash)
    protocol in the ENV VAR "TYK_GW_COPROCESSOPTIONS_COPROCESSGRPCSERVER" in order to evenly distribute traffic.
    For more info please visit https://github.com/grpc/grpc/blob/master/doc/naming.md.
    ```
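    To make the behavior concrete, here is a minimal grpc-go client sketch showing what a dns:/// target with round-robin balancing looks like; the hostname is hypothetical, and this is only an illustration of the naming scheme, not Tyk's internal implementation:

    ```go
    package main

    import (
    	"log"

    	"google.golang.org/grpc"
    	"google.golang.org/grpc/credentials/insecure"
    )

    func main() {
    	// dns:/// (scheme plus an empty authority, hence the triple slash) tells
    	// gRPC to resolve the hostname via DNS and treat each returned A/AAAA
    	// record as a separate backend.
    	conn, err := grpc.Dial(
    		"dns:///tyk-grpc-coprocess-service.default.svc.cluster.local:50051", // hypothetical target
    		grpc.WithTransportCredentials(insecure.NewCredentials()),
    		// Without round_robin, gRPC defaults to pick_first and pins all
    		// traffic to the first resolved address.
    		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
    	)
    	if err != nil {
    		log.Fatalf("failed to dial: %v", err)
    	}
    	defer conn.Close()
    }
    ```

    Note that in Kubernetes this typically requires a headless service (clusterIP: None) so that DNS returns the individual pod IPs; a regular ClusterIP service resolves to a single virtual IP, leaving gRPC only one address to balance across.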


    PR Code Suggestions ✨

    No code suggestions found for the PR.

    netlify bot commented May 26, 2025

    PS. Add /docs/nightly to the end of the URL.

    🔨 Latest commit: 3dd1378
    🔍 Latest deploy log: https://app.netlify.com/projects/tyk-docs/deploys/68429a26ee42a70008fbcbd0
    😎 Deploy Preview: https://deploy-preview-6462--tyk-docs.netlify.app

    @sharadregoti (Contributor) left a comment

    Questions:

    1. Does our gRPC client enable health checking, as shown in this example?

    Comment on lines +426 to +450:

    ```go
    // Start HTTP server for health checks
    go func() {
    	mux := http.NewServeMux()

    	// Readiness probe endpoint
    	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
    		if atomic.LoadInt32(&isReady) == 1 {
    			w.WriteHeader(http.StatusOK)
    			w.Write([]byte("Ready"))
    		} else {
    			w.WriteHeader(http.StatusServiceUnavailable)
    			w.Write([]byte("Not ready"))
    		}
    	})

    	// Liveness probe endpoint
    	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
    		w.WriteHeader(http.StatusOK)
    		w.Write([]byte("Healthy"))
    	})

    	if err := http.ListenAndServe(HealthAddress, mux); err != nil {
    		log.Fatalf("Failed to start health server: %v", err)
    	}
    }()
    ```
    @andrei-tyk

    There is a standard way of doing a health check in gRPC, here is a golang example. Do you think we can use this instead?

    Also, Kubernetes supports gRPC probes, so we can also modify the below Kubernetes manifest.
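    For context, a minimal server-side sketch of the standard health checking protocol (grpc.health.v1), assuming the stock grpc-go health package is used, might look like this:

    ```go
    package main

    import (
    	"log"
    	"net"

    	"google.golang.org/grpc"
    	"google.golang.org/grpc/health"
    	healthpb "google.golang.org/grpc/health/grpc_health_v1"
    )

    func main() {
    	lis, err := net.Listen("tcp", ":50051")
    	if err != nil {
    		log.Fatalf("Failed to listen: %v", err)
    	}

    	s := grpc.NewServer()

    	// Register the standard health service alongside the application services.
    	healthServer := health.NewServer()
    	healthpb.RegisterHealthServer(s, healthServer)

    	// An empty service name ("") reports overall server status; set it to
    	// NOT_SERVING during shutdown so probes fail before the listener closes.
    	healthServer.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)

    	if err := s.Serve(lis); err != nil {
    		log.Fatalf("Server error: %v", err)
    	}
    }
    ```

    With this in place the HTTP mux becomes unnecessary: since Kubernetes 1.24 the kubelet can probe gRPC natively, so the manifest's httpGet probes could be replaced with a grpc probe pointing at port 50051.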

    Contributor (Author):

    I agree that would be the most performant way, but HTTP is easier to debug. Maybe I could add another paragraph saying that for maximum performance we recommend gRPC probes?

    Contributor:
    No, let's remove HTTP and stick to the native gRPC example

    @andrei-tyk (Contributor, Author)

    Questions:

    1. Does our gRPC client enable health checking, as shown in this example?

    No, currently we don't support that but I guess we could in the future.
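    For reference, enabling this on a grpc-go client is done through the service config; a minimal illustrative sketch (hypothetical target address, not current Tyk gateway behavior) would be:

    ```go
    package main

    import (
    	"log"

    	"google.golang.org/grpc"
    	"google.golang.org/grpc/credentials/insecure"

    	// Blank import registers grpc-go's client-side health checking function.
    	_ "google.golang.org/grpc/health"
    )

    func main() {
    	// healthCheckConfig makes the client watch grpc.health.v1.Health on each
    	// backend and route RPCs only to those reporting SERVING. It requires a
    	// balancing policy other than pick_first, hence round_robin.
    	conn, err := grpc.Dial(
    		"dns:///grpc-server.example:50051", // hypothetical target
    		grpc.WithTransportCredentials(insecure.NewCredentials()),
    		grpc.WithDefaultServiceConfig(`{
    			"healthCheckConfig": {"serviceName": ""},
    			"loadBalancingConfig": [{"round_robin":{}}]
    		}`),
    	)
    	if err != nil {
    		log.Fatalf("failed to dial: %v", err)
    	}
    	defer conn.Close()
    }
    ```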
