Complete guide for deploying the AI Agents Orchestrator on Microsoft Azure with enterprise-grade infrastructure.
This deployment guide covers a complete production-ready Azure infrastructure stack featuring:
                      Internet
                          |
                          v
               ┌──────────────────────┐
               │   Azure Front Door   │ ← Global Load Balancer + CDN
               │    (Premium Tier)    │
               └──────────────────────┘
                          |
                          v
               ┌──────────────────────┐
               │ Application Gateway  │ ← Regional Load Balancer + WAF
               │       (WAF_v2)       │
               └──────────────────────┘
                          |
           ┌──────────────┴──────────────┐
           v                             v
┌─────────────────────┐       ┌─────────────────────┐
│   Primary Region    │       │  Secondary Region   │
│     (East US)       │       │    (West US 2)      │
│                     │       │                     │
│  ┌──────────────┐   │       │  ┌──────────────┐   │
│  │     AKS      │   │       │  │     AKS      │   │
│  │   (v1.28)    │   │       │  │  (Standby)   │   │
│  │              │   │       │  │              │   │
│  │  ┌────────┐  │   │       │  │  ┌────────┐  │   │
│  │  │  Blue  │  │   │       │  │  │  Blue  │  │   │
│  │  │  Pods  │  │   │       │  │  │  Pods  │  │   │
│  │  └────────┘  │   │       │  │  └────────┘  │   │
│  │  ┌────────┐  │   │       │  │              │   │
│  │  │ Green  │  │   │       │  └──────────────┘   │
│  │  │  Pods  │  │   │       │                     │
│  │  └────────┘  │   │       │  ┌──────────────┐   │
│  └──────────────┘   │       │  │     ACR      │   │
│                     │       │  │  (Replica)   │   │
│  ┌──────────────┐   │       │  └──────────────┘   │
│  │     ACR      │◄──┼───────┴─────────────────────┘
│  │  (Premium)   │   │         Geo-Replication
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │  Key Vault   │   │
│  │  (Premium)   │   │
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │ Redis Cache  │   │
│  │  (Premium)   │   │
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │ Azure Files  │   │
│  │  (Premium)   │   │
│  └──────────────┘   │
│                     │
│  ┌──────────────┐   │
│  │   Monitor    │   │
│  │+ App Insights│   │
│  └──────────────┘   │
└─────────────────────┘
Azure Virtual Network (10.0.0.0/16)
│
├── AKS Subnet (10.0.1.0/24)
│   ├── System Node Pool (3-20 nodes)
│   └── User Node Pool (3-20 nodes)
│
├── Application Gateway Subnet (10.0.2.0/24)
│   └── Application Gateway v2
│
└── Private Endpoints Subnet (10.0.3.0/24)
    ├── ACR Private Endpoint
    ├── Key Vault Private Endpoint
    └── Storage Private Endpoint
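As a rough capacity check for the subnet layout above: Azure reserves five addresses in every subnet, so a /24 leaves 251 assignable IPs. The helper below is a minimal sketch of that arithmetic; the function name `usable_ips` is illustrative and not part of the repository.

```shell
#!/bin/sh
# usable_ips PREFIX_LENGTH
# Prints the number of assignable addresses in an Azure subnet.
# Azure reserves 5 addresses per subnet, so they are subtracted from the total.
usable_ips() {
  prefix="$1"
  host_bits=$((32 - prefix))
  total=$((1 << host_bits))
  echo $((total - 5))
}

usable_ips 24   # each /24 subnet in the layout above
usable_ips 16   # the 10.0.0.0/16 range, if it were a single subnet
```

Two node pools scaling to 20 nodes each need at most 40 node addresses, which fits comfortably in a /24. If pods also draw addresses from the subnet (this depends on the network plugin, which the guide does not state), the requirement grows accordingly.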
Configuration:
Features:
Configuration:
Benefits:
Configuration:
Security:
Configuration:
Benefits:
Configuration:
Integration:
Configuration:
Use Cases:
Configuration:
Alerting:
Configuration:
workspace: 100 GB (AI agent workspaces)
sessions: 50 GB (user sessions)
# Azure CLI (v2.54.0+)
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Terraform (v1.5.0+)
wget https://releases.hashicorp.com/terraform/1.6.0/terraform_1.6.0_linux_amd64.zip
unzip terraform_1.6.0_linux_amd64.zip
sudo mv terraform /usr/local/bin/
# kubectl (v1.28+)
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Helm (v3.12+)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
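Before running the deployment it can help to confirm that the four tools are on `PATH` and meet the minimum versions listed above. The following is a minimal sketch; `require_tool` and `version_ge` are hypothetical helper names, and `version_ge` assumes a `sort` that supports `-V` (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Sorts the two versions and checks that B is the smaller (or equal) one.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# require_tool NAME: succeeds when NAME is on PATH, otherwise reports it on stderr.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

for tool in az terraform kubectl helm; do
  require_tool "$tool" || true   # keep going so every missing tool is listed
done

# Example comparison against the Terraform minimum from this guide.
version_ge "1.6.0" "1.5.0" && echo "terraform 1.6.0 satisfies >= 1.5.0"
```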
az vm list-usage):
Monthly cost breakdown (USD):
| Service | Configuration | Est. Cost |
|---|---|---|
| AKS (compute) | 6x D4s_v3 nodes | ~$750 |
| AKS (management) | Free | $0 |
| ACR Premium | With geo-replication | $250 |
| Application Gateway | WAF_v2, 2 instances | $250 |
| Azure Front Door | Premium tier | $300 |
| Key Vault Premium | + operations | $50 |
| Redis Cache P1 | 6 GB Premium | $300 |
| Azure Files Premium | 150 GB | $200 |
| Log Analytics | 30-day retention | $150 |
| Bandwidth | Outbound transfer | $100 |
| Total | | ~$2,350/month |
Costs vary based on usage. Use Azure Pricing Calculator for accurate estimates.
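The total in the table can be reproduced by summing the line items. This is a small sketch using the estimates above as plain integers (USD per month); the figures are the guide's own estimates, not live pricing.

```shell
#!/bin/sh
# Monthly estimates in USD, in table order:
# AKS compute, AKS management, ACR, App Gateway, Front Door,
# Key Vault, Redis, Azure Files, Log Analytics, Bandwidth.
total=0
for cost in 750 0 250 250 300 50 300 200 150 100; do
  total=$((total + cost))
done
echo "Estimated monthly total: \$${total}"
```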
# Clone repository
git clone https://github.com/hoangsonww/AI-Agents-Orchestrator.git
cd AI-Agents-Orchestrator
# Run deployment script
chmod +x deployment/azure/scripts/deploy.sh
./deployment/azure/scripts/deploy.sh
The script will:
Duration: ~30 minutes
# Login to Azure
az login
# Set subscription
az account set --subscription "YOUR_SUBSCRIPTION_ID"
# Verify
az account show
# Create service principal
az ad sp create-for-rbac \
--name "ai-orchestrator-sp" \
--role Contributor \
--scopes /subscriptions/YOUR_SUBSCRIPTION_ID \
--sdk-auth
# Save output (contains clientId, clientSecret, tenantId, subscriptionId)
cd deployment/azure/terraform
# Initialize Terraform
terraform init
# Review plan
terraform plan \
-var="environment=production" \
-var="location=eastus" \
-out=tfplan
# Apply
terraform apply tfplan
# Save outputs
terraform output -json > ../outputs.json
# Get AKS credentials
az aks get-credentials \
--resource-group ai-orchestrator-production-rg \
--name ai-orchestrator-production-aks \
--overwrite-existing
# Verify
kubectl cluster-info
kubectl get nodes
# Install NGINX Ingress
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz
# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm upgrade --install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set installCRDs=true
# Install Prometheus stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.prometheusSpec.retention=30d \
--set grafana.enabled=true
# Create namespace
kubectl create namespace ai-orchestrator
# Apply manifests
kubectl apply -f deployment/kubernetes/ -n ai-orchestrator
# Wait for deployment
kubectl rollout status deployment/ai-orchestrator-blue -n ai-orchestrator
# Login to ACR
az acr login --name aiorchestratorproductionacr
# Build and push
az acr build \
--registry aiorchestratorproductionacr \
--image ai-orchestrator:latest \
--image ai-orchestrator:$(git rev-parse --short HEAD) \
.
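Azure Container Registry names must be 5-50 alphanumeric characters, with no spaces or hyphens, so a quick check before calling `az acr login` can catch a malformed name early. The sketch below assumes nothing beyond that naming rule; `valid_acr_name` is an illustrative helper, not part of the repository.

```shell
#!/bin/sh
# valid_acr_name NAME: succeeds when NAME is 5-50 characters long and
# contains only letters and digits.
valid_acr_name() {
  name="$1"
  len=${#name}
  [ "$len" -ge 5 ] && [ "$len" -le 50 ] || return 1
  case "$name" in
    *[!A-Za-z0-9]*) return 1 ;;   # any non-alphanumeric character is rejected
  esac
  return 0
}

if valid_acr_name "aiorchestratorproductionacr"; then
  echo "registry name looks valid"
fi
```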
# Get Application Gateway public IP
az network public-ip show \
--resource-group ai-orchestrator-production-rg \
--name ai-orchestrator-production-appgw-pip \
--query ipAddress -o tsv
# Create A record in your DNS provider pointing to this IP
# Example: ai-orchestrator.yourdomain.com -> 20.85.123.45
# Create ClusterIssuer for Let's Encrypt
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
# Certificate will be automatically issued when Ingress is created
Process:
kubectl set image deployment/ai-orchestrator-green \
  ai-orchestrator=aiorchestratorproductionacr.azurecr.io/ai-orchestrator:v2.0.0 \
  -n ai-orchestrator
kubectl scale deployment/ai-orchestrator-green --replicas=3 -n ai-orchestrator
kubectl rollout status deployment/ai-orchestrator-green -n ai-orchestrator
2. **Run smoke tests**:
```bash
# Test green pods directly
GREEN_POD=$(kubectl get pod -n ai-orchestrator -l version=green -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n ai-orchestrator "$GREEN_POD" -- curl http://localhost:5001/health
```
./deployment/scripts/blue-green-switch.sh blue green
kubectl patch service ai-orchestrator-service -n ai-orchestrator \
  -p '{"spec":{"selector":{"version":"green"}}}'
4. **Monitor and verify**:
```bash
# Watch metrics in Azure Monitor
# Check logs
kubectl logs -n ai-orchestrator -l version=green --tail=100 -f
# If issues, instant rollback:
kubectl patch service ai-orchestrator-service -n ai-orchestrator \
  -p '{"spec":{"selector":{"version":"blue"}}}'
```
# After 30 minutes of stable operation
kubectl scale deployment/ai-orchestrator-blue --replicas=0 -n ai-orchestrator
Rollback Time: < 10 seconds
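Both the switch and the rollback are the same Service selector patch with a different colour. As a minimal sketch, the patch body can be generated from one helper so a typo in the colour cannot silently point the Service at nothing; `selector_patch` is a hypothetical name, and the command is only printed here rather than executed.

```shell
#!/bin/sh
# selector_patch COLOR: prints the JSON merge patch that points the Service
# selector at the given colour; rejects anything other than blue or green.
selector_patch() {
  case "$1" in
    blue|green) printf '{"spec":{"selector":{"version":"%s"}}}\n' "$1" ;;
    *) echo "usage: selector_patch blue|green" >&2; return 1 ;;
  esac
}

# Print the command instead of running it, so the sketch is safe to try.
echo kubectl patch service ai-orchestrator-service -n ai-orchestrator \
  -p "'$(selector_patch green)'"
```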
Progressive Rollout Stages:
kubectl scale deployment/ai-orchestrator-stable --replicas=5 -n ai-orchestrator
kubectl scale deployment/ai-orchestrator-canary --replicas=1 -n ai-orchestrator
kubectl scale deployment/ai-orchestrator-stable --replicas=3 -n ai-orchestrator
kubectl scale deployment/ai-orchestrator-canary --replicas=1 -n ai-orchestrator
kubectl scale deployment/ai-orchestrator-stable --replicas=1 -n ai-orchestrator
kubectl scale deployment/ai-orchestrator-canary --replicas=1 -n ai-orchestrator
kubectl scale deployment/ai-orchestrator-stable --replicas=0 -n ai-orchestrator
kubectl scale deployment/ai-orchestrator-canary --replicas=3 -n ai-orchestrator
Automated with script:
./deployment/scripts/canary-rollout.sh
Rollback Time: < 30 seconds (automatic on metrics degradation)
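When the canary is driven purely by replica counts, its traffic share is approximately canary / (stable + canary). The sketch below works that out for the four replica pairs used in the stages above. It assumes the Service spreads requests evenly across ready pods; `canary_share` is an illustrative helper.

```shell
#!/bin/sh
# canary_share STABLE CANARY: prints the canary's approximate share of traffic
# as a whole percentage (integer division, so the result is rounded down).
canary_share() {
  stable_count="$1"
  canary_count="$2"
  all=$((stable_count + canary_count))
  if [ "$all" -eq 0 ]; then
    echo 0
    return 0
  fi
  echo $((canary_count * 100 / all))
}

# Replica pairs (stable:canary) from the four stages above.
for pair in 5:1 3:1 1:1 0:3; do
  s=${pair%:*}
  c=${pair#*:}
  echo "stable=$s canary=$c -> ~$(canary_share "$s" "$c")% canary traffic"
done
```

Replica ratios only give coarse steps (roughly 16%, 25%, 50%, 100% here); finer-grained splits would need weighted routing at the ingress layer.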
Standard Kubernetes rolling update:
# Update image
kubectl set image deployment/ai-orchestrator \
ai-orchestrator=aiorchestratorproductionacr.azurecr.io/ai-orchestrator:v2.0.0 \
-n ai-orchestrator
# Monitor
kubectl rollout status deployment/ai-orchestrator -n ai-orchestrator
# Rollback if needed
kubectl rollout undo deployment/ai-orchestrator -n ai-orchestrator
Access Azure Monitor:
# List node CPU usage for the AKS cluster
az monitor metrics list \
--resource $(az aks show -g ai-orchestrator-production-rg -n ai-orchestrator-production-aks --query id -o tsv) \
--metric-names "node_cpu_usage_percentage"
Key Metrics:
View in Azure Portal:
Query with KQL:
// Top 10 slowest requests
requests
| where timestamp > ago(1h)
| summarize avg(duration) by operation_Name
| top 10 by avg_duration desc
// Error rate by hour
requests
| where timestamp > ago(24h)
| summarize
total = count(),
errors = countif(success == false)
by bin(timestamp, 1h)
| project timestamp, error_rate = (errors * 100.0) / total
Access Grafana:
# Port-forward to Grafana
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Open http://localhost:3000
# Username: admin, Password: admin123
Import Dashboards:
Query Logs:
// Container logs
ContainerLog
| where Namespace == "ai-orchestrator"
| where TimeGenerated > ago(1h)
| project TimeGenerated, Computer, ContainerID, LogEntry
| order by TimeGenerated desc
// Performance metrics
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "K8SContainer"
| summarize avg(CounterValue) by CounterName, bin(TimeGenerated, 5m)
Configure Alerts:
az monitor metrics alert create \
--name "aks-high-cpu" \
--resource-group ai-orchestrator-production-rg \
--scopes $(az aks show -g ai-orchestrator-production-rg -n ai-orchestrator-production-aks --query id -o tsv) \
--condition "avg node_cpu_usage_percentage > 80" \
--window-size 5m \
--evaluation-frequency 1m \
--action-group-id /subscriptions/.../actionGroups/...
az monitor metrics alert create \
--name "pod-crash-loop" \
--resource-group ai-orchestrator-production-rg \
--scopes $(az aks show -g ai-orchestrator-production-rg -n ai-orchestrator-production-aks --query id -o tsv) \
--condition "avg kube_pod_container_status_restarts_total > 5" \
--window-size 15m \
--evaluation-frequency 5m
Network Security Groups (NSGs):
# AKS subnet - restrict inbound
az network nsg rule create \
--resource-group ai-orchestrator-production-rg \
--nsg-name aks-nsg \
--name allow-https \
--priority 100 \
--source-address-prefixes Internet \
--destination-port-ranges 443 \
--access Allow \
--protocol Tcp
Azure Firewall (optional for advanced scenarios):
# Create Azure Firewall for egress filtering
az network firewall create \
--name ai-orchestrator-firewall \
--resource-group ai-orchestrator-production-rg \
--location eastus
Azure AD Integration:
# Enable Azure AD integration for AKS
az aks update \
--resource-group ai-orchestrator-production-rg \
--name ai-orchestrator-production-aks \
--enable-azure-rbac \
--enable-aad
Managed Identity:
# AKS uses system-assigned managed identity by default
# Grant permissions to managed identity
az role assignment create \
--assignee $(az aks show -g ai-orchestrator-production-rg -n ai-orchestrator-production-aks --query identityProfile.kubeletidentity.objectId -o tsv) \
--role "Key Vault Secrets User" \
--scope $(az keyvault show -n aiorchestratorproductionkv --query id -o tsv)
Key Vault Integration:
# Create secret in Key Vault
az keyvault secret set \
--vault-name aiorchestratorproductionkv \
--name openai-api-key \
--value "sk-..."
# Use secret in pod via CSI driver
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: secrets
      mountPath: "/mnt/secrets"
      readOnly: true
  volumes:
  - name: secrets
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "azure-kv-provider"
EOF
Image Scanning with Microsoft Defender:
# Enable Defender for Container Registry
az security pricing create \
--name ContainerRegistry \
--tier Standard
# View scan results
az security assessment list \
--resource-id $(az acr show -n aiorchestratorproductionacr --query id -o tsv)
Pod Security Standards:
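# PodSecurityPolicy was removed in Kubernetes v1.25, so on AKS v1.28 the
# restricted profile is enforced with Pod Security Admission namespace labels.
# The restricted profile disallows privileged containers and privilege
# escalation, requires dropping ALL capabilities and running as non-root,
# and limits volume types to a safe set (configMap, emptyDir, projected,
# secret, downwardAPI, persistentVolumeClaim, csi, ephemeral).
apiVersion: v1
kind: Namespace
metadata:
  name: ai-orchestrator
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted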
Azure Policy for AKS:
# Assign built-in policy initiative
az policy assignment create \
--name "aks-baseline" \
--display-name "AKS Baseline Security" \
--scope $(az aks show -g ai-orchestrator-production-rg -n ai-orchestrator-production-aks --query id -o tsv) \
--policy-set-definition /providers/Microsoft.Authorization/policySetDefinitions/a8640138-9b0a-4a28-b8cb-1666c838647d
Supported Compliance Frameworks:
AKS Backup with Velero:
# Install Velero
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm upgrade --install velero vmware-tanzu/velero \
--namespace velero \
--create-namespace \
--set-file credentials.secretContents.cloud=./azure-credentials \
--set configuration.provider=azure \
--set configuration.backupStorageLocation.bucket=velero-backups \
--set configuration.backupStorageLocation.config.resourceGroup=ai-orchestrator-production-rg \
--set configuration.backupStorageLocation.config.storageAccount=aiorchestratorproductionbackups \
--set snapshotsEnabled=true
# Create backup schedule
velero schedule create daily-backup --schedule="0 2 * * *"
# Manual backup
velero backup create manual-backup-$(date +%Y%m%d)
Database Backup (if using Azure Database):
# Automated backups enabled by default
# Restore from backup
az postgres flexible-server restore \
--resource-group ai-orchestrator-production-rg \
--name ai-orchestrator-db-restored \
--source-server ai-orchestrator-db \
--restore-time "2024-01-15T10:00:00Z"
Secondary Region Deployment:
# Deploy to secondary region (West US 2)
terraform apply \
-var="location=westus2" \
-var="environment=production-dr" \
-target=azurerm_kubernetes_cluster.secondary
# Configure Traffic Manager for failover
az network traffic-manager profile create \
--name ai-orchestrator-tm \
--resource-group ai-orchestrator-production-rg \
--routing-method Priority \
--unique-dns-name ai-orchestrator-global
# Add endpoints
az network traffic-manager endpoint create \
--name primary \
--profile-name ai-orchestrator-tm \
--resource-group ai-orchestrator-production-rg \
--type azureEndpoints \
--target-resource-id $(az network public-ip show -g ai-orchestrator-production-rg -n ai-orchestrator-production-appgw-pip --query id -o tsv) \
--priority 1
az network traffic-manager endpoint create \
--name secondary \
--profile-name ai-orchestrator-tm \
--resource-group ai-orchestrator-production-rg \
--type azureEndpoints \
--target-resource-id $(az network public-ip show -g ai-orchestrator-production-rg-dr -n ai-orchestrator-production-dr-appgw-pip --query id -o tsv) \
--priority 2
Manual Failover:
# 1. Verify secondary region health
kubectl --context=secondary get nodes
kubectl --context=secondary get pods -n ai-orchestrator
# 2. Update DNS to point to secondary
az network traffic-manager endpoint update \
--name primary \
--profile-name ai-orchestrator-tm \
--resource-group ai-orchestrator-production-rg \
--type azureEndpoints \
--endpoint-status Disabled
# 3. Verify traffic flow
curl https://ai-orchestrator-global.trafficmanager.net/health
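A single `curl` right after disabling the primary endpoint may still hit cached DNS, so polling the health URL for a while is often more informative. Below is a minimal sketch of a generic retry helper; `retry` is a hypothetical function, and the attempt count and delay in the commented example are arbitrary assumptions rather than values from this guide.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND...: re-runs COMMAND until it succeeds
# or ATTEMPTS runs have failed; returns 0 on success and 1 otherwise.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (not run here): poll the Traffic Manager health URL after failover.
# retry 10 30 curl -fsS https://ai-orchestrator-global.trafficmanager.net/health
```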
Automated Failover:
View Costs:
# Cost analysis
az consumption usage list \
--start-date 2024-01-01 \
--end-date 2024-01-31 \
--query "[?contains(instanceName, 'ai-orchestrator')]"
# Set budget alerts
az consumption budget create \
--budget-name ai-orchestrator-monthly \
--amount 3000 \
--time-grain Monthly \
--start-date 2024-01-01 \
--end-date 2024-12-31 \
--resource-group ai-orchestrator-production-rg \
--notifications threshold=80 contactEmails="admin@example.com"
# Purchase 1-year or 3-year reservations for VMs
az reservations reservation-order purchase \
--reservation-order-id ... \
--sku Standard_D4s_v3 \
--location eastus \
--quantity 6 \
--term P1Y
# Add spot node pool
az aks nodepool add \
--resource-group ai-orchestrator-production-rg \
--cluster-name ai-orchestrator-production-aks \
--name spot \
--priority Spot \
--eviction-policy Delete \
--spot-max-price -1 \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 10 \
--node-vm-size Standard_D4s_v3
kubectl top nodes
kubectl top pods -n ai-orchestrator
### Cost Breakdown Optimized
| Service | Configuration | Original | Optimized | Savings |
|---------|--------------|----------|-----------|---------|
| AKS (compute) | 6x D4s_v3 nodes | $750 | $450 (w/ RI) | 40% |
| ACR Premium | Geo-replication | $250 | $250 | 0% |
| App Gateway | WAF_v2 | $250 | $250 | 0% |
| Azure Front Door | Premium | $300 | $300 | 0% |
| Key Vault | Premium | $50 | $50 | 0% |
| Redis Cache | P1 | $300 | $300 | 0% |
| Azure Files | 150 GB Premium | $200 | $150 (cool tier) | 25% |
| Monitoring | 30-day retention | $150 | $100 (optimized) | 33% |
| Bandwidth | Outbound | $100 | $80 (CDN) | 20% |
| **Total** | | **$2,350** | **$1,930** | **18%** |
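The savings column follows from (original - optimized) / original. A short sketch that reproduces the percentages in the table; `savings_pct` is an illustrative helper and the inputs are the table's own estimates.

```shell
#!/bin/sh
# savings_pct ORIGINAL OPTIMIZED: prints the saving as a percentage,
# rounded to the nearest whole number.
savings_pct() {
  awk -v o="$1" -v n="$2" 'BEGIN { printf "%.0f\n", (o - n) * 100 / o }'
}

savings_pct 2350 1930   # total row
savings_pct 750 450     # AKS compute with reserved instances
savings_pct 200 150     # Azure Files
```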
## Troubleshooting
### Common Issues
#### 1. Pods Not Starting
**Symptoms**:
- Pods stuck in `Pending` or `CrashLoopBackOff`
**Diagnosis**:
```bash
# Check pod status
kubectl describe pod <pod-name> -n ai-orchestrator
# Check events
kubectl get events -n ai-orchestrator --sort-by='.lastTimestamp'
# Check logs
kubectl logs <pod-name> -n ai-orchestrator --previous
```
Common Causes:
Diagnosis:
# Check Application Insights
az monitor app-insights query \
--app ai-orchestrator-production-ai \
--analytics-query "requests | summarize avg(duration) by bin(timestamp, 5m)"
# Check HPA status
kubectl get hpa -n ai-orchestrator
# Check resource usage
kubectl top pods -n ai-orchestrator
Solutions:
kubectl scale deployment/ai-orchestrator-blue --replicas=10 -n ai-orchestrator
Symptoms:
Diagnosis:
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
# Check certificate status
kubectl describe certificate -n ai-orchestrator
# Check certificate-request
kubectl get certificaterequest -n ai-orchestrator
Solutions:
# Delete and recreate certificate
kubectl delete certificate ai-orchestrator-tls -n ai-orchestrator
kubectl apply -f ingress.yaml
# Check Let's Encrypt rate limits
# Wait 1 hour if rate limited
Diagnosis:
# Check node status
kubectl get nodes
# Check node conditions
kubectl describe node <node-name>
# Check cluster autoscaler status (on AKS the autoscaler runs in the managed control plane)
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
Solutions:
# Manually scale node pool
az aks nodepool scale \
--resource-group ai-orchestrator-production-rg \
--cluster-name ai-orchestrator-production-aks \
--name user \
--node-count 5
# Restart node (drain + delete)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>
# Get all resources
kubectl get all -n ai-orchestrator
# Describe deployment
kubectl describe deployment ai-orchestrator-blue -n ai-orchestrator
# Check rollout history
kubectl rollout history deployment/ai-orchestrator-blue -n ai-orchestrator
# Get pod logs
kubectl logs -f deployment/ai-orchestrator-blue -n ai-orchestrator
# Execute command in pod
kubectl exec -it <pod-name> -n ai-orchestrator -- /bin/bash
# Port-forward for debugging
kubectl port-forward svc/ai-orchestrator-service 5001:5001 -n ai-orchestrator
# Check network policies
kubectl get networkpolicies -n ai-orchestrator
# View events
kubectl get events --sort-by='.lastTimestamp' -n ai-orchestrator
Create Support Ticket:
az support tickets create \
--ticket-name "AKS-Issue-$(date +%Y%m%d)" \
--title "AKS cluster issues" \
--description "Pods not starting in AKS cluster" \
--severity moderate \
--problem-classification-id "/subscriptions/.../providers/Microsoft.Support/services/..."
Create .azure-pipelines.yml:
trigger:
  branches:
    include:
    - main
    - develop
variables:
  azureSubscription: 'your-service-connection'
  resourceGroup: 'ai-orchestrator-production-rg'
  aksCluster: 'ai-orchestrator-production-aks'
  acrName: 'aiorchestratorproductionacr'
  imageRepository: 'ai-orchestrator'
  imageTag: '$(Build.BuildId)'
stages:
- stage: Build
  jobs:
  - job: BuildAndPush
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - task: Docker@2
      displayName: Build and push image
      inputs:
        containerRegistry: '$(acrName)'
        repository: '$(imageRepository)'
        command: 'buildAndPush'
        Dockerfile: '**/Dockerfile'
        tags: |
          $(imageTag)
          latest
- stage: DeployStaging
  condition: eq(variables['Build.SourceBranch'], 'refs/heads/develop')
  jobs:
  - deployment: DeployToStaging
    environment: 'staging'
    pool:
      vmImage: 'ubuntu-latest'
    strategy:
      runOnce:
        deploy:
          steps:
          - task: AzureCLI@2
            inputs:
              azureSubscription: '$(azureSubscription)'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az aks get-credentials -g $(resourceGroup) -n $(aksCluster)
                kubectl set image deployment/ai-orchestrator-blue \
                  ai-orchestrator=$(acrName).azurecr.io/$(imageRepository):$(imageTag) \
                  -n ai-orchestrator
- stage: DeployProduction
  condition: eq(variables['Build.SourceBranch'], 'refs/heads/main')
  jobs:
  - deployment: DeployToProduction
    environment: 'production'
    pool:
      vmImage: 'ubuntu-latest'
    strategy:
      runOnce:
        deploy:
          steps:
          - task: AzureCLI@2
            displayName: Deploy to Green
            inputs:
              azureSubscription: '$(azureSubscription)'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az aks get-credentials -g $(resourceGroup) -n $(aksCluster)
                kubectl set image deployment/ai-orchestrator-green \
                  ai-orchestrator=$(acrName).azurecr.io/$(imageRepository):$(imageTag) \
                  -n ai-orchestrator
                kubectl scale deployment/ai-orchestrator-green --replicas=3 -n ai-orchestrator
  # ManualValidation@0 only runs in an agentless (server) job
  - job: ApproveTrafficSwitch
    dependsOn: DeployToProduction
    pool: server
    steps:
    - task: ManualValidation@0
      displayName: 'Approve traffic switch'
      inputs:
        notifyUsers: 'admin@example.com'
        instructions: 'Verify green environment and approve traffic switch'
  - job: SwitchTraffic
    dependsOn: ApproveTrafficSwitch
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - task: AzureCLI@2
      displayName: Switch Traffic
      inputs:
        azureSubscription: '$(azureSubscription)'
        scriptType: 'bash'
        scriptLocation: 'inlineScript'
        inlineScript: |
          # A new agent is used for this job, so fetch cluster credentials again
          az aks get-credentials -g $(resourceGroup) -n $(aksCluster)
          kubectl patch service ai-orchestrator-service \
            -n ai-orchestrator \
            -p '{"spec":{"selector":{"version":"green"}}}'
          sleep 30
          kubectl scale deployment/ai-orchestrator-blue --replicas=0 -n ai-orchestrator
Already configured in Jenkinsfile, .gitlab-ci.yml, and .circleci/config.yml.
For Azure-specific GitHub Actions, add .github/workflows/azure-deploy.yml:
name: Azure Deploy
on:
  push:
    branches: [ main, develop ]
env:
  AZURE_RESOURCE_GROUP: ai-orchestrator-production-rg
  AKS_CLUSTER: ai-orchestrator-production-aks
  ACR_NAME: aiorchestratorproductionacr
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Azure Login
      uses: azure/login@v1
      with:
        # Repository secret holding the service principal JSON (secret name assumed)
        creds: ${{ secrets.AZURE_CREDENTIALS }}
    - name: ACR Login
      run: az acr login --name ${{ env.ACR_NAME }}
    - name: Build and Push
      run: |
        az acr build \
          --registry ${{ env.ACR_NAME }} \
          --image ai-orchestrator:${{ github.sha }} \
          --image ai-orchestrator:latest \
          .
    - name: Get AKS Credentials
      run: |
        az aks get-credentials \
          --resource-group ${{ env.AZURE_RESOURCE_GROUP }} \
          --name ${{ env.AKS_CLUSTER }}
    - name: Deploy to AKS
      run: |
        kubectl set image deployment/ai-orchestrator-blue \
          ai-orchestrator=${{ env.ACR_NAME }}.azurecr.io/ai-orchestrator:${{ github.sha }} \
          -n ai-orchestrator
        kubectl rollout status deployment/ai-orchestrator-blue -n ai-orchestrator
For Azure-specific issues:
az support tickets create
For application issues:
kubectl logs -n ai-orchestrator -l app=ai-orchestrator