Azure Autoscaling: optimize resources for the summer
Summary
July and August bring reduced traffic for many companies (vacations, less B2B activity). With Azure Autoscaling you can cut costs automatically when demand drops and scale back up when it rises. In this post you'll see how to configure autoscaling for VMs, AKS, Container Apps, and App Service.
Why autoscaling in summer?
A typical scenario:
- January-June: normal traffic, 100 requests/second
- July-August: traffic drops 40-60% (vacations)
- September-December: traffic recovers, plus Black Friday peaks
Without autoscaling:
Jan-Dec: €5,000 x 12 = €60,000/year
With autoscaling:
Jan-Jun: €5,000 x 6 = €30,000
Jul-Aug: €2,000 x 2 = €4,000 (60% scale-down)
Sep-Dec: €5,000 x 4 = €20,000
Total: €54,000/year
Savings: €6,000 (10%)
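If you want to sanity-check the figures, a quick shell calculation (using the assumed €5,000 and €2,000 monthly costs above) reproduces the 10%:
# Illustrative numbers only, taken from the assumptions above
WITHOUT_AUTOSCALE=$((5000 * 12))                    # €60,000 at a flat €5,000/month
WITH_AUTOSCALE=$((5000 * 6 + 2000 * 2 + 5000 * 4))  # €54,000 with the summer scale-down
echo "Annual savings: €$((WITHOUT_AUTOSCALE - WITH_AUTOSCALE))"   # €6,000 (~10%)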
Autoscaling in Virtual Machine Scale Sets (VMSS)
Configure CPU-based autoscaling
# Create the VMSS
az vmss create \
  --resource-group rg-prod \
  --name vmss-web \
  --image Ubuntu2204 \
  --instance-count 2 \
  --vm-sku Standard_B2s \
  --load-balancer lb-web \
  --vnet-name vnet-prod \
  --subnet subnet-web \
  --upgrade-policy-mode Automatic \
  --admin-username azureuser \
  --generate-ssh-keys   # SSH key auth so the command runs non-interactively
# Configure the autoscale setting (default profile: 1-10 instances, start at 2)
az monitor autoscale create \
--resource-group rg-prod \
--name autoscale-vmss-web \
--resource /subscriptions/$SUB_ID/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachineScaleSets/vmss-web \
--min-count 1 \
--max-count 10 \
--count 2
# Rule: scale out when CPU > 70%
az monitor autoscale rule create \
--resource-group rg-prod \
--autoscale-name autoscale-vmss-web \
--condition "Percentage CPU > 70 avg 5m" \
--scale out 2
# Rule: scale in when CPU < 30%
az monitor autoscale rule create \
--resource-group rg-prod \
--autoscale-name autoscale-vmss-web \
--condition "Percentage CPU < 30 avg 10m" \
--scale in 1
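To double-check what was actually registered, you can list the rules attached to the setting (a quick verification sketch using the names created above):
# List the scale-out / scale-in rules of the default profile
az monitor autoscale rule list \
  --resource-group rg-prod \
  --autoscale-name autoscale-vmss-web \
  --output table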
Schedule-based autoscaling (summer schedule)
# Summer profile: 1-3 instances (Jul-Aug)
az monitor autoscale profile create \
--resource-group rg-prod \
--autoscale-name autoscale-vmss-web \
--name summer-profile \
--min-count 1 \
--max-count 3 \
--count 1 \
--start 2025-07-01T00:00:00Z \
--end 2025-08-31T23:59:59Z \
--timezone "W. Europe Standard Time"
# Profile for the rest of the year: 2-10 instances
az monitor autoscale profile create \
--resource-group rg-prod \
--autoscale-name autoscale-vmss-web \
--name normal-profile \
--min-count 2 \
--max-count 10 \
--count 2 \
--start 2025-09-01T00:00:00Z \
--end 2026-06-30T23:59:59Z \
--timezone "W. Europe Standard Time"
Visualization:
gantt
title Autoscaling profile by season
dateFormat YYYY-MM-DD
section Capacity
Normal (2-10 VMs) :active, 2025-01-01, 2025-06-30
Summer (1-3 VMs)  :crit, 2025-07-01, 2025-08-31
Normal (2-10 VMs) :active, 2025-09-01, 2025-12-31
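A quick way to confirm both profiles exist before July arrives (a verification sketch, same names as above):
# List all profiles of the setting (default, summer-profile, normal-profile)
az monitor autoscale profile list \
  --resource-group rg-prod \
  --autoscale-name autoscale-vmss-web \
  --output table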
Autoscaling in Azure Kubernetes Service (AKS)
Cluster Autoscaler (node-level)
# Enable the cluster autoscaler on the node pool
az aks nodepool update \
--resource-group rg-aks \
--cluster-name aks-prod \
--name nodepool1 \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 5
# Tune scale-down behavior for the summer
az aks update \
--resource-group rg-aks \
--name aks-prod \
--cluster-autoscaler-profile \
scale-down-delay-after-add=10m \
scale-down-unneeded-time=5m \
scale-down-utilization-threshold=0.3 # consider a node for removal when its requested resources fall below 30% of capacity
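To verify the node pool picked up the new bounds, a quick query (assuming the resource names above):
# Show the autoscaler configuration of the node pool
az aks nodepool show \
  --resource-group rg-aks \
  --cluster-name aks-prod \
  --name nodepool1 \
  --query "{autoscaling:enableAutoScaling, min:minCount, max:maxCount}"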
Horizontal Pod Autoscaler (pod-level)
# hpa-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:v1
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 1   # summer: drop to 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
      policies:
        - type: Percent
          value: 50                     # remove at most 50% of pods at a time
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100                    # double the pods at a time
          periodSeconds: 15
Apply the manifests:
kubectl apply -f hpa-deployment.yaml
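After applying, you can watch the HPA react in real time (object names from the manifest above):
# Follow replica changes and see recent scaling events
kubectl get hpa web-app-hpa --watch
kubectl describe hpa web-app-hpa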
KEDA: Event-driven autoscaling
# keda-scaledobject.yaml - scale based on an Azure Service Bus queue
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 0     # scale to zero in summer when there are no messages
  maxReplicaCount: 20
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        namespace: sb-prod
        messageCount: "10"    # roughly 1 pod per 10 pending messages
        connectionFromEnv: SERVICEBUS_CONNECTION
Install KEDA:
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
kubectl apply -f keda-scaledobject.yaml
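Under the hood KEDA manages an HPA for the target deployment; a quick check that the ScaledObject is active (assuming it was applied to the current namespace):
# Verify the ScaledObject and the HPA that KEDA creates for it
kubectl get scaledobject order-processor-scaler
kubectl get hpa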
Autoscaling in Container Apps
# Create a Container App with autoscaling
az containerapp create \
--name app-api \
--resource-group rg-container-apps \
--environment containerapp-env \
--image myregistry.azurecr.io/api:v1 \
--target-port 80 \
--ingress external \
--min-replicas 0 \
--max-replicas 10 \
--scale-rule-name http-scaling \
--scale-rule-type http \
--scale-rule-http-concurrency 50 # 1 replica per 50 concurrent requests
# Custom rule: scale based on CPU
az containerapp update \
--name app-api \
--resource-group rg-container-apps \
--scale-rule-name cpu-scaling \
--scale-rule-type cpu \
--scale-rule-metadata "type=Utilization" "value=70"
Container Apps advantage: automatic scale to zero (no compute cost when there is no traffic)
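One way to see scale to zero happening is to list the active replicas once traffic stops (a sketch using the names from the example above):
# With no traffic the replica list should eventually be empty
az containerapp replica list \
  --name app-api \
  --resource-group rg-container-apps \
  --output table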
Autoscaling in App Service
Basic configuration
# Create the App Service plan (Standard S1 or higher is required for autoscale)
az appservice plan create \
--name plan-webapp-s1 \
--resource-group rg-webapp \
--sku S1 \
--is-linux
# Configure the autoscale setting
az monitor autoscale create \
--resource-group rg-webapp \
--name autoscale-webapp \
--resource /subscriptions/$SUB_ID/resourceGroups/rg-webapp/providers/Microsoft.Web/serverfarms/plan-webapp-s1 \
--min-count 1 \
--max-count 5 \
--count 1
# Scale out based on HTTP queue length
az monitor autoscale rule create \
--resource-group rg-webapp \
--autoscale-name autoscale-webapp \
--condition "HttpQueueLength > 100 avg 5m" \
--scale out 2
# Scale in based on CPU
az monitor autoscale rule create \
--resource-group rg-webapp \
--autoscale-name autoscale-webapp \
--condition "CpuPercentage < 40 avg 10m" \
--scale in 1
Schedule: reduce capacity at night and on weekends
# Night profile: 1 instance (22:00-06:00, Mon-Fri)
az monitor autoscale profile create \
--resource-group rg-webapp \
--autoscale-name autoscale-webapp \
--name night-profile \
--min-count 1 \
--max-count 1 \
--count 1 \
--recurrence week mon tue wed thu fri \
--start "22:00" \
--end "06:00" \
--timezone "W. Europe Standard Time"
# Weekend profile: 1-2 instances (Sat-Sun)
az monitor autoscale profile create \
--resource-group rg-webapp \
--autoscale-name autoscale-webapp \
--name weekend-profile \
--min-count 1 \
--max-count 2 \
--count 1 \
--recurrence week sat sun \
--timezone "W. Europe Standard Time"
Autoscaling with Azure Load Testing
Pre-summer test: validate that autoscaling works
# loadtest-config.yaml
testId: summer-autoscale-test
displayName: "Validate autoscaling before summer"
description: "Test autoscaling from 1 to 5 instances"
loadTestConfiguration:
  engineInstances: 1
  splitAllCSVs: false
testPlan: loadtest.jmx
env:
  - name: TARGET_URL
    value: https://myapp.azurewebsites.net
  - name: THREADS
    value: "100"
  - name: RAMP_UP
    value: "60"      # ramp the load up over 60 seconds
  - name: DURATION
    value: "600"     # 10 minutes
# Create the Azure Load Testing resource
az load create \
--name loadtest-autoscale \
--resource-group rg-testing \
--location westeurope
# Create the test from the config file
az load test create \
--load-test-resource loadtest-autoscale \
--resource-group rg-testing \
--test-id summer-test \
--load-test-config-file loadtest-config.yaml
# Monitor the VMSS during the test
az monitor metrics list \
--resource /subscriptions/$SUB_ID/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachineScaleSets/vmss-web \
--metric "Percentage CPU" \
--start-time 2025-07-01T10:00:00Z \
--end-time 2025-07-01T10:15:00Z \
--interval PT1M
Expected:
- Start: 1 instance, CPU at 80%
- Minute 3: scale-out triggered → 3 instances
- Minute 5: CPU drops to 40%, stabilizes at 3 instances
- End of test: scale in back to 1 instance after 10 minutes
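During the test you can also spot-check the instance count directly (a sketch assuming the VMSS from the earlier example):
# Number of instances currently in the scale set
az vmss list-instances \
  --resource-group rg-prod \
  --name vmss-web \
  --query "length(@)"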
Monitoring and alerts
Autoscaling dashboard
// Azure Monitor (Log Analytics) query
InsightsMetrics
| where TimeGenerated > ago(7d)
| where Name == "VMScaleSetInstanceCount"
| summarize InstanceCount = avg(Val) by bin(TimeGenerated, 1h), ResourceId
| render timechart
Alerts for autoscaling events
# Alert when an autoscale operation fails
az monitor activity-log alert create \
--resource-group rg-monitoring \
--name alert-autoscale-failed \
--description "Autoscale operation failed" \
--condition category=Autoscale and operationName=Microsoft.Insights/AutoscaleSettings/ScaleAction and status=Failed \
--action-group ag-ops-team
# Alert when the scale set reaches maximum capacity
az monitor metrics alert create \
--name alert-max-capacity \
--resource-group rg-prod \
--scopes /subscriptions/$SUB_ID/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachineScaleSets/vmss-web \
--condition "count instances >= 10" \
--description "VMSS reached maximum capacity" \
--window-size 5m \
--evaluation-frequency 1m \
--action ag-ops-team
Best practices
1. Cooldown periods
Problem: scale flapping (constantly scaling up and down)
Solution:
# Set a 10-minute cooldown on the scale-out rule at creation time
# (rules are defined with "az monitor autoscale rule create"; pass --cooldown there)
az monitor autoscale rule create \
  --resource-group rg-prod \
  --autoscale-name autoscale-vmss-web \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 2 \
  --cooldown 10
2. Aggressive scale up, conservative scale down
# HPA with asymmetric scaling behavior
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # immediate
    policies:
      - type: Percent
        value: 100                    # double the pods
        periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes
    policies:
      - type: Pods
        value: 1                      # remove only 1 pod at a time
        periodSeconds: 60
3. Predictive autoscaling (ML-based)
# Enable predictive autoscaling (preview)
az monitor autoscale update \
--resource-group rg-prod \
--name autoscale-vmss-web \
--predictive-autoscale-mode ForecastOnly # or Enabled for automatic pre-scaling
Advantage: Azure uses machine learning on historical metrics to predict traffic and pre-scale before the load arrives.
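To review which mode is configured, you can inspect the autoscale setting; this sketch assumes the CLI output surfaces the predictiveAutoscalePolicy property of the underlying autoscale resource:
# Inspect the predictive autoscale policy (scale mode, look-ahead time)
az monitor autoscale show \
  --resource-group rg-prod \
  --name autoscale-vmss-web \
  --query predictiveAutoscalePolicy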
4. Multi-metric autoscaling
# Combine CPU, memory, and a request-rate metric
az monitor autoscale rule create \
--resource-group rg-prod \
--autoscale-name autoscale-vmss-web \
--condition "Percentage CPU > 70 avg 5m" \
--scale out 2
# Available memory below 512 MB (536870912 bytes)
az monitor autoscale rule create \
  --resource-group rg-prod \
  --autoscale-name autoscale-vmss-web \
  --condition "Available Memory Bytes < 536870912 avg 5m" \
  --scale out 2
az monitor autoscale rule create \
--resource-group rg-prod \
--autoscale-name autoscale-vmss-web \
--condition "Requests Per Second > 1000 avg 3m" \
--scale out 3
Summer cost optimization checklist
- Identify workloads with reduced traffic in summer
- Configure schedule-based autoscaling for Jul-Aug (lower minimum replicas)
- Test autoscaling with Azure Load Testing
- Configure alerts for reaching max capacity
- Document a performance baseline (P95 latency, error rate)
- Monitor daily for the first 7 days after enabling autoscaling
- Review costs weekly (expect a 30-50% reduction)
- Return to the normal configuration on September 1
- Post-mortem: calculate actual vs. projected savings
Alternatives to autoscaling
| Strategy | Savings | Complexity | Best for |
|---|---|---|---|
| Autoscaling | 30-50% | Medium | Variable traffic |
| Nightly shutdown | 60% | Low | Dev/test environments |
| Reserved Instances | 40% | Low | Stable 24/7 workloads |
| Spot VMs | 70-90% | High | Interruption-tolerant batch processing |
| Serverless (Functions) | 80% | Medium | Event-driven workloads |
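For the "nightly shutdown" row, the minimal version is simply deallocating dev/test VMs outside working hours (the resource and VM names below are placeholders), typically wired to an Automation runbook or a scheduled pipeline:
# Deallocate at night (compute is no longer billed), start again in the morning
az vm deallocate --resource-group rg-dev --name vm-dev-01
az vm start --resource-group rg-dev --name vm-dev-01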
Troubleshooting
Autoscaling is not working
Debug checklist:
# 1. Verify that metrics are being collected
az monitor metrics list \
--resource $RESOURCE_ID \
--metric "Percentage CPU" \
--start-time "2025-07-01T10:00:00Z" \
--end-time "2025-07-01T10:30:00Z"
# 2. Review autoscale history (scale actions are recorded in the activity log
#    under the Autoscale category; the az CLI has no dedicated "autoscale history" command)
az monitor activity-log list \
  --resource-group rg-prod \
  --start-time "2025-07-01T00:00:00Z" \
  --query "[?category.value=='Autoscale']"
# 3. Check service limits (regional core quota)
az vm list-usage --location westeurope --query "[?name.value=='cores'].{Name:name.localizedValue, Current:currentValue, Limit:limit}"
Common errors:
- Insufficient quota (request an increase)
- Missing RBAC permissions (add the Monitoring Contributor role; see the sketch below)
- Metric delay (wait 5-10 minutes for metrics to show up)
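For the missing-permissions case, a role assignment along these lines usually fixes it (the principal ID is a placeholder to replace with the identity that manages the autoscale setting):
# Grant Monitoring Contributor on the resource group that holds the autoscale setting
az role assignment create \
  --assignee <principal-id> \
  --role "Monitoring Contributor" \
  --scope /subscriptions/$SUB_ID/resourceGroups/rg-prod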