In this chapter, we will install Kubeflow on an Amazon EKS cluster. If you don’t have an EKS cluster, please follow the instructions in the getting started guide and then launch your EKS cluster using the eksctl chapter.
We need more resources for completing the Kubeflow chapter of the EKS Workshop. First, we’ll increase the size of our cluster to 6 nodes:
export NODEGROUP_NAME=$(eksctl get nodegroups --cluster eksworkshop-eksctl -o json | jq -r '.[0].Name')
eksctl scale nodegroup --cluster eksworkshop-eksctl --name $NODEGROUP_NAME --nodes 6
Scaling the nodegroup will take 2 - 3 minutes.
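Before moving on, it can help to confirm that all six nodes have joined the cluster and report Ready. The following is a minimal sketch rather than part of the workshop: the helper name count_ready_nodes is hypothetical, and it assumes the default column layout of kubectl get nodes --no-headers, where the second column is STATUS.

```shell
# Hypothetical helper: counts lines whose STATUS column (2nd field) is exactly "Ready".
# It reads the output of `kubectl get nodes --no-headers` on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage (requires access to the cluster):
#   kubectl get nodes --no-headers | count_ready_nodes
```

Once scaling has finished, the usage line should print 6. Nodes shown as NotReady or Ready,SchedulingDisabled are deliberately not counted.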
Download the 0.7 release of kfctl. This binary will allow you to install Kubeflow on Amazon EKS:
curl --silent --location "https://github.com/kubeflow/kubeflow/releases/download/v0.7.0/kfctl_v0.7.0_linux.tar.gz" | tar xz -C /tmp
sudo mv -v /tmp/kfctl /usr/local/bin
Export the URI of the Kubeflow configuration file:
export CONFIG_URI=https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_aws.0.7.0.yaml
Set an environment variable for your AWS cluster name, and set the Kubeflow deployment name to match it. Then set the base directory where you want to store Kubeflow deployments, followed by the Kubeflow application directory for this deployment:
export AWS_CLUSTER_NAME=eksworkshop-eksctl
export KF_NAME=${AWS_CLUSTER_NAME}
export BASE_DIR=~/environment
export KF_DIR=${BASE_DIR}/${KF_NAME}
Until https://github.com/kubeflow/kubeflow/issues/3827 is fixed, install aws-iam-authenticator:
curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.13.7/2019-06-11/bin/linux/amd64/aws-iam-authenticator
chmod +x aws-iam-authenticator
sudo mv aws-iam-authenticator /usr/local/bin
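With both binaries moved into /usr/local/bin, a quick check that the shell can actually find them saves confusion later. This is a small sketch; require_cmd is a hypothetical helper name, not something provided by kfctl or the workshop.

```shell
# Hypothetical helper: succeeds only if every named command is found on PATH.
require_cmd() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      return 1
    fi
  done
}

# Usage:
#   require_cmd kfctl aws-iam-authenticator kubectl && echo "all tools found"
```

If a tool is reported missing, re-run its install step and make sure /usr/local/bin is on your PATH.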
Run the kfctl build command to set up your configuration:
mkdir -p ${KF_DIR}
cd ${KF_DIR}
kfctl build -V -f ${CONFIG_URI}
Set an environment variable pointing to your local configuration file:
export CONFIG_FILE=${KF_DIR}/kfctl_aws.0.7.0.yaml
Replace the EKS cluster name and AWS Region in your ${CONFIG_FILE}:
sed -i -e 's/kubeflow-aws/'"$AWS_CLUSTER_NAME"'/' ${CONFIG_FILE}
sed -i "s@us-west-2@$AWS_REGION@" ${CONFIG_FILE}
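To see what these two substitutions do without touching the real file, the same sed expressions can be run against a small sample and written to stdout. This is only a sketch: the two-line sample and the demo_ variable values are hypothetical and do not reflect the actual layout of kfctl_aws.0.7.0.yaml.

```shell
# Illustrative values only; the demo_ names avoid overwriting your real variables.
demo_cluster=eksworkshop-eksctl
demo_region=eu-west-1

# Hypothetical two-line stand-in for the config file, not the real layout.
sample='name: kubeflow-aws
region: us-west-2'

# Same expressions as above, but printing the result instead of editing in place.
preview=$(printf '%s\n' "$sample" \
  | sed -e 's/kubeflow-aws/'"$demo_cluster"'/' \
  | sed "s@us-west-2@$demo_region@")

printf '%s\n' "$preview"
```

Note that if AWS_REGION is empty when the real command runs, sed replaces us-west-2 with an empty string, so it is worth running echo "$AWS_REGION" first.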
Replace the worker node IAM role in your ${CONFIG_FILE}. Before we do that, let’s check that ROLE_NAME is set in our environment:
test -n "$ROLE_NAME" && echo ROLE_NAME is "$ROLE_NAME" || echo ROLE_NAME is not set
If you get "ROLE_NAME is not set", run the commands from the export the Worker node role step, then run the check again.
Once ROLE_NAME is reported correctly, run the next command to substitute it into the configuration file:
sed -i "s@eksctl-eksworkshop-eksctl-nodegroup-ng-a2-NodeInstanceRole-xxxxxxx@$ROLE_NAME@" ${CONFIG_FILE}
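Because sed exits successfully even when nothing matched, we may want to confirm that the sample role name is really gone from the file. The sketch below assumes the placeholder ends in NodeInstanceRole-xxxxxxx, as in the command above; the helper name role_placeholder_replaced is hypothetical.

```shell
# Hypothetical helper. Exit status:
#   0 if the sample role suffix no longer appears in the file
#   1 if the placeholder is still present
#   2 if the file does not exist
role_placeholder_replaced() {
  [ -f "$1" ] || return 2
  if grep -q 'NodeInstanceRole-xxxxxxx' "$1"; then
    return 1
  fi
  return 0
}

# Usage:
#   role_placeholder_replaced "${CONFIG_FILE}" || echo "role was not replaced" >&2
```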
Apply configuration and deploy Kubeflow on your cluster:
rm -rf kustomize
kfctl apply -V -f ${CONFIG_FILE}
Run the command below to check the status:
kubectl get pods -n kubeflow
Installing Kubeflow and its toolset may take 2 - 3 minutes. A few pods may initially show an Error or CrashLoopBackOff status. Give them some time; they will recover on their own and reach the Running state.
You should see similar results
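With many pods in the kubeflow namespace, it can be easier to list only the ones that have not settled yet. The following is a minimal sketch; pods_not_ready is a hypothetical helper, and it assumes the default kubectl get pods --no-headers layout in which the third column is STATUS.

```shell
# Hypothetical helper: prints pods whose STATUS (3rd column) is neither
# Running nor Completed. Reads `kubectl get pods --no-headers` output on stdin.
pods_not_ready() {
  awk 'NF >= 3 && $3 != "Running" && $3 != "Completed"'
}

# Usage (requires access to the cluster):
#   kubectl get pods -n kubeflow --no-headers | pods_not_ready
```

An empty result means every pod is Running or Completed. A Running status alone does not guarantee that all containers in a pod are ready, so the READY column is still worth a glance.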