
GitLab CD: Blue/Green, Canary & Rolling Deployment Patterns Prompt

Design GitLab CD pipelines implementing blue/green, canary, and rolling deployment strategies for Kubernetes, VM, and serverless targets.

Target user
DevOps engineers designing CD workflows
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior DevOps engineer who has built blue/green, canary, and rolling deployment pipelines in GitLab CI/CD for production workloads on Kubernetes, VMs, and serverless platforms. You know the trade-offs and the pipeline shapes for each strategy.

I will provide:
- The deployment target (Kubernetes / EC2 ASG / serverless / VM fleet)
- The current rollout strategy (or "none" if direct-replace)
- The risk tolerance / rollback requirement
- The application architecture (stateless / stateful, has DB migrations, uses cache)
- The goal: design a new strategy / debug an existing canary / pick between strategies

Your job:

1. **Match strategy to need**:
   - **Rolling** — replaces pods/instances gradually; cheapest; built into K8s/ASG; default for most cases
   - **Blue/Green** — keeps old version (blue) while deploying new (green); instant rollback; doubles infra cost during switch
   - **Canary** — routes small % of traffic to new version; observe metrics; promote or rollback; requires traffic split mechanism
2. **Design the pipeline stages** per strategy:
   - **Rolling**: build → deploy (kubectl set image / helm upgrade) → smoke test → done
   - **Blue/Green**: build → deploy-green (alongside blue) → smoke test green → switch-traffic → keep-blue-for-rollback → cleanup-blue
   - **Canary**: build → deploy-canary (10% traffic) → observe metrics 10 min → if pass: promote-to-100% → if fail: rollback
3. **For Kubernetes targets**:
   - Rolling is native to Deployment (`maxSurge`/`maxUnavailable`)
   - Blue/green: two Deployments + Service selector switch (Istio VirtualService, Argo Rollouts)
   - Canary: Argo Rollouts, Flagger, or Istio VirtualService with weighted routing
4. **For VM / ASG**:
   - Rolling: ASG instance refresh with `MinHealthyPercentage`
   - Blue/green: two target groups; flip LB
   - Canary: weighted target groups (AWS ALB), or DNS-based (Route 53 weighted records)
5. **For serverless**:
   - AWS Lambda: alias with traffic shifting (canary/linear/all-at-once)
   - GitLab CI deploys alias with new version + traffic config
6. **Critical considerations**:
   - **DB migrations** — never deploy a new schema in a strategy that keeps the old version running unless migration is backward-compatible (additive only). Otherwise: deploy migration first → deploy new code → drop old fields LATER.
   - **Stateful workloads** — blue/green is hard; data syncing during switch
   - **Cache invalidation** — new version with stale cache may misbehave
   - **Long connections (WebSocket, gRPC streams)** — drain time during blue/green switch
7. **Rollback strategy per type**:
   - Rolling: `kubectl rollout undo` / `helm rollback`
   - Blue/Green: switch traffic back to blue (fast)
   - Canary: revert traffic split to 100% old (fast)
8. **For monitoring + automated rollback** (canary):
   - Define SLO thresholds: error rate < 0.5%, p99 latency < 500ms
   - Use Prometheus query in pipeline to gate promote
   - Tools: Flagger (K8s), Spinnaker, GitLab's auto-rollout (limited)

Mark DESTRUCTIVE: traffic switch without smoke test (production exposure), removing blue infra immediately after green deploy (no rollback), DB migration that breaks old version while old code is still serving.

---

Target platform: [K8s / ASG / serverless / VM fleet]
Current strategy: [direct-replace / rolling / blue-green / canary / none]
Risk tolerance: [low / medium / high]
Schema migration?: [yes / no]
Stateful workload?: [yes / no]
Goal: [design new / debug existing / choose between]

Why this prompt works

Choosing a deployment strategy is half design, half pipeline implementation. Each strategy has a specific pipeline shape and rollback mechanism. This prompt forces a strategy-first design rather than copying YAML from elsewhere.

How to use it

  1. Match strategy to actual requirements — not all workloads need canary.
  2. Account for DB migrations separately from code deploys.
  3. For canary, require metric gating; don’t rely on a timer alone.
  4. Test rollback in non-prod; it’s the path you’ll need under pressure.
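
For step 3, a metric gate might look like the job below. It is only a sketch under assumptions: the Prometheus address, the `http_requests_total` metric with its `app` and `status` labels, and the `alpine` image are placeholders that do not come from the original pipelines. Only the 0.5% error-rate threshold is taken from the prompt.

```yaml
observe:
  stage: observe
  image: alpine:3.20                # assumption: any image with curl, jq, and awk works
  variables:
    PROM_URL: "http://prometheus.example.internal:9090"   # hypothetical address
    # Hypothetical metric and labels; substitute your own error-rate query
    ERROR_RATE_QUERY: 'sum(rate(http_requests_total{app="web",status=~"5.."}[5m])) / sum(rate(http_requests_total{app="web"}[5m]))'
    MAX_ERROR_RATE: "0.005"         # 0.5%, the threshold named in the prompt
  before_script:
    - apk add --no-cache curl jq
  script:
    - |
      # Instant query against the Prometheus HTTP API; take the first sample's value
      value=$(curl -fsS --get "$PROM_URL/api/v1/query" \
        --data-urlencode "query=$ERROR_RATE_QUERY" | jq -r '.data.result[0].value[1]')
      # Missing or undefined data counts as a failure, not a pass
      case "$value" in
        ""|null|NaN) echo "no usable samples returned"; exit 1 ;;
      esac
      echo "error rate: $value (limit $MAX_ERROR_RATE)"
      # awk does the floating-point comparison; a non-zero exit fails the job
      awk -v v="$value" -v max="$MAX_ERROR_RATE" 'BEGIN { exit !(v + 0 < max + 0) }'
```

Treating an empty result as a failure is a deliberate choice here: a canary that receives no traffic would otherwise pass the gate without being observed.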

Pipeline shapes

Rolling (Kubernetes Deployment, default)

stages: [build, deploy, verify]

build:
  stage: build
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

deploy:
  stage: deploy
  script:
    - kubectl set image deploy/web web="$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
    - kubectl rollout status deploy/web --timeout=10m
  environment:
    name: production
    deployment_tier: production
  rules:
    - if: $CI_COMMIT_TAG

verify:
  stage: verify
  needs: [deploy]
  script:
    - ./smoke-tests.sh
  rules:
    - if: $CI_COMMIT_TAG
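
The pace of the rolling job above is set in the Deployment manifest rather than in the pipeline. A minimal sketch follows, assuming a Deployment named `web` with an `app: web` label; the replica count, image, probe path, and port are placeholders that do not come from the original.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod above `replicas` during the rollout
      maxUnavailable: 0    # never drop below `replicas` ready pods
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web        # must match the container name used by `kubectl set image`
          image: registry.example.com/group/web:initial   # placeholder; the pipeline overwrites it
          readinessProbe:  # the rollout only advances once new pods report ready
            httpGet:
              path: /healthz
              port: 8080   # assumed port
```

With `maxUnavailable: 0`, `maxSurge` has to be at least 1; Kubernetes rejects a strategy where both are zero.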

Blue/Green (Kubernetes via Argo Rollouts)

stages: [build, deploy-green, switch-traffic, cleanup]

build:
  stage: build
  script: ./build.sh

deploy-green:
  stage: deploy-green
  script:
    - kubectl argo rollouts set image web web="$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
    - kubectl argo rollouts status web --timeout 10m  # wait for green pods; assumes `status` returns once the Rollout pauses for promotion (verify on your plugin version)
  environment: { name: production-green }

smoke-test-green:
  stage: deploy-green
  needs: [deploy-green]
  script:
    - curl -v https://green.example.com/healthz
    - ./smoke-tests.sh https://green.example.com

switch-traffic:
  stage: switch-traffic
  needs: [smoke-test-green]
  script:
    - kubectl argo rollouts promote web    # switches traffic from blue to green
  environment: { name: production }
  when: manual    # or after smoke-test passes

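cleanup-blue:
  stage: cleanup
  needs: [switch-traffic]
  script:
    # Argo Rollouts scales the old (blue) ReplicaSet down on its own after
    # spec.strategy.blueGreen.scaleDownDelaySeconds (default 30s); set that to
    # 1800 in the Rollout spec for a 30 min rollback window instead of sleeping here.
    - kubectl argo rollouts get rollout web    # confirm only the green ReplicaSet still has pods
  when: manual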
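
The jobs above assume a Rollout named `web` that already uses the blue/green strategy. A sketch of such a spec follows; the Service names, replica count, and image are hypothetical, and `scaleDownDelaySeconds: 1800` is one way to get the 30 minute rollback window.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/group/web:initial   # placeholder
  strategy:
    blueGreen:
      activeService: web-active        # hypothetical Service receiving production traffic
      previewService: web-preview      # hypothetical Service behind green.example.com
      autoPromotionEnabled: false      # pause once green is ready, until `promote` runs
      scaleDownDelaySeconds: 1800      # keep blue pods for 30 min after the switch
```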

Canary (Kubernetes via Argo Rollouts + Prometheus)

stages: [build, canary, observe, promote-or-rollback]

deploy-canary:
  stage: canary
  script:
    - kubectl argo rollouts set image web web="$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
    # Rollout spec defines steps: 10% → wait → 25% → wait → 50% → wait → 100%
  environment: { name: production }

observe:
  stage: observe
  needs: [deploy-canary]
  script:
    - sleep 600  # 10 min observation
    - ./check-slo.sh
    # check-slo.sh queries Prometheus for error_rate < 0.5% and p99_latency < 500ms
    # exits non-zero on threshold breach

promote:
  stage: promote-or-rollback
  needs: [observe]
  script:
    - kubectl argo rollouts promote web --full
  when: on_success
  environment: { name: production }

rollback:
  stage: promote-or-rollback
  needs: [observe]
  script:
    - kubectl argo rollouts abort web
  when: on_failure
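
The step list mentioned in the `deploy-canary` comment lives in the Rollout's `strategy` block. A sketch of just that fragment follows, with assumed pause durations. Without a traffic router (for example a service mesh or ingress integration), Argo Rollouts approximates each weight by the ratio of canary to stable pods.

```yaml
# Fragment of a Rollout spec (apiVersion: argoproj.io/v1alpha1, kind: Rollout)
strategy:
  canary:
    steps:
      - setWeight: 10
      - pause: {}                # wait indefinitely; the pipeline's promote job resumes it
      - setWeight: 25
      - pause: { duration: 5m }  # assumed duration
      - setWeight: 50
      - pause: { duration: 5m }  # assumed duration
      # after the last step the Rollout moves to 100% on its own
```

`kubectl argo rollouts promote web --full` skips any remaining steps, while `abort` returns traffic to the stable version.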

Canary with AWS Lambda

deploy-lambda-canary:
  stage: deploy
  script:
    # Publish a new version and send it 10% of traffic. The alias's primary
    # version (the old one) is left unchanged and keeps the remaining 90%.
    - |
      VERSION=$(aws lambda publish-version --function-name myfunc --query Version --output text)
      aws lambda update-alias --function-name myfunc --name prod \
        --routing-config "AdditionalVersionWeights={$VERSION=0.1}"
      # Shell variables do not survive across jobs; pass the version on as a dotenv report
      echo "LATEST_VERSION=$VERSION" > deploy.env
  artifacts:
    reports:
      dotenv: deploy.env
  environment: { name: production }

promote-lambda:
  stage: promote
  needs: [deploy-lambda-canary]
  script:
    # After observation, make the new version primary and clear the weighted split
    - |
      aws lambda update-alias --function-name myfunc --name prod \
        --function-version "$LATEST_VERSION" \
        --routing-config "AdditionalVersionWeights={}"
  when: manual
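
The Lambda pipeline has no rollback job. One possible shape is sketched below. The job name is hypothetical, and it assumes the alias's primary version is still the old one, so clearing the routing config sends all traffic back to it.

```yaml
rollback-lambda:
  stage: promote
  needs: [deploy-lambda-canary]
  script:
    # Remove the weighted split; the alias serves only its primary (old) version again
    - aws lambda update-alias --function-name myfunc --name prod --routing-config "AdditionalVersionWeights={}"
  environment: { name: production }
  when: manual
```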

DB migration pattern (safe for any strategy)

stages: [migrate, deploy, cleanup-migrations]

# Phase 1: Additive migration BEFORE code deploy (backward-compatible)
migrate:
  stage: migrate
  script:
    - alembic upgrade head    # adds new columns, keeps old
  rules:
    - if: $CI_COMMIT_TAG

# Phase 2: Deploy new code (reads/writes both old and new schema)
deploy:
  stage: deploy
  needs: [migrate]
  script: ./deploy.sh

# Phase 3: Cleanup (drop old columns, after all old code is gone) — SEPARATE MR
# This runs in a future pipeline, not the same one
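
Phase 3 might look like the job below in a later release. This is a sketch: the job name and the manual gate are assumptions, and it presumes the column-dropping Alembic revision is added to the repository only in that later MR.

```yaml
# Lives in a later release's pipeline, once no old-version code is serving
drop-old-columns:
  stage: cleanup-migrations
  script:
    - alembic upgrade head    # now includes the revision that drops the old columns
  rules:
    - if: $CI_COMMIT_TAG
      when: manual            # a human confirms old code is fully retired first
```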

Comparison

| Aspect | Rolling | Blue/Green | Canary |
|---|---|---|---|
| Rollback speed | Slow (re-deploy) | Fast (flip back) | Fast (flip back) |
| Infra cost | ≈1× (cheapest) | 2× during switch | 1.1× during canary |
| Risk | Medium (some users hit new immediately) | Low (atomic switch) | Lowest (small % first) |
| Complexity | Low | Medium | High |
| Best for | Most stateless workloads | Database-heavy, stateful | High-risk changes |
| Requires | Pod replacement support | Two-target infra | Traffic split (LB/mesh) |

Common findings this catches

  • Canary rolled back on a metric blip → threshold too tight or SLO query too noisy; tune the query.
  • Blue/green with shared DB and breaking migration → green crashes; flipping back doesn’t help.
  • Rolling deploy cannot progress with maxUnavailable: 0 + maxSurge: 0 → impossible math (the Kubernetes API rejects this pair); at least one must be non-zero.
  • Blue/green flip leaves blue running indefinitely → cleanup not running; verify the cleanup job ran.
  • Canary observation manual-only → engineer-dependent; automate with metric gate.
  • Lambda canary on alias used by sync clients without retries → 10% see errors; rollback fast.

When to escalate

  • Strategy choice doesn’t fit infra capabilities — coordinate with platform team; may need LB / mesh changes.
  • DB migration ordering issues — engage DBA team; backward-compat may require multi-deploy plan.
  • Metric-gated rollback false-positives — SRE team for SLO tuning.
