We will deploy an application and expose it as a Service on TCP port 80. The application is a custom-built image based on the php-apache image. Its index.php page performs calculations to generate CPU load. More information can be found in the Horizontal Pod Autoscaler walkthrough in the Kubernetes documentation, which uses the same example image.
kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80
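On recent kubectl releases, kubectl run may create a bare Pod instead of a Deployment and may reject or ignore the --requests flag, in which case the autoscale step that follows has no Deployment to target. As a hedged alternative, the sketch below writes a manifest that should be roughly equivalent to the command above. The file name php-apache.yaml and the run: php-apache labels are assumptions chosen for illustration; the image, port, and CPU request are taken from the command above.

```shell
# Write a Deployment and Service manifest for the php-apache example.
# Assumptions: file name and label key/value are illustrative choices.
cat > php-apache.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m   # the HPA computes utilization against this request
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  selector:
    run: php-apache
  ports:
  - port: 80
    targetPort: 80
EOF
```

To use it, apply the file with kubectl apply -f php-apache.yaml and confirm with kubectl get deployment php-apache that the Deployment exists before creating the HPA.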
Next, create an HPA that scales up when average CPU utilization exceeds 50% of the CPU requested by the container (200m), that is, above 100m per pod.
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
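To make the scaling rule concrete, the Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), bounded by the configured minimum and maximum. The function below is a minimal sketch of that arithmetic only; the name desired_replicas and its argument order are hypothetical, and it ignores the tolerance band, pod readiness handling, and stabilization behavior that the real controller applies.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Utilization values are whole percentages of the CPU request.
# Hypothetical helper: illustrates the documented formula, not the controller.
desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  # Integer ceiling of current * util / target.
  local desired=$(( (current * util + target - 1) / target ))
  # Clamp the result to the --min and --max bounds.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 1 250 50 1 10   # prints 5
desired_replicas 4 100 50 1 10   # prints 8
desired_replicas 8 90 50 1 10    # raw result is 15, prints 10 (capped at max)
desired_replicas 3 0 50 1 10     # raw result is 0, prints 1 (raised to min)
```

With the settings used here (target 50%, min 1, max 10), a single pod running at 250% of its request would be scaled to 5 replicas in one step under this formula.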
View the HPA using kubectl. You will probably see <unknown>/50% for 1-2 minutes while metrics are collected, and then 0%/50%.
kubectl get hpa
Open a new terminal in the Cloud9 environment and run the following command to drop into a shell in a new container.
kubectl run -i --tty load-generator --image=busybox /bin/sh
In that shell, run a while loop that repeatedly requests http://php-apache
while true; do wget -q -O - http://php-apache; done
In the previous tab, watch the HPA with the following command
kubectl get hpa -w
You will see the HPA scale the pods from 1 up toward the configured maximum (10), adding replicas until average CPU utilization falls to or below the 50% target.
You can now stop the load test running in the other terminal (Ctrl + C). The HPA will then gradually bring the replica count back down to the configured minimum (1). By default it waits out a 5-minute scale-down stabilization window first, so this can take several minutes. Finally, exit the load-generator shell by pressing Ctrl + D.