Why Kubeflow

Look at this image first:

kubeflow architecture

The story goes back to 2019, when I had the honor of working on a project to build a machine learning platform. The team lead had basically the same architecture in mind as this diagram, except it was based on Airflow. I thought it was time to build this locally to remind me of those happy times.

Environment

  • MacBook Pro 2018, 6 cores, 32 GB RAM, macOS Sequoia 15.7.1
  • docker 28.5.1
  • kubectl v1.34.1
  • kustomize v5.7.1
  • minikube v1.37.0

On a Mac, if you are missing any of these, you can install them with brew. It's pretty simple, so I will leave it to you to try.
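Before installing anything, a quick check of what is already on the PATH can help. This is a minimal sketch; `missing_tools` is a hypothetical helper name, not part of any of these tools.

```shell
# Print every tool from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the tools listed in the Environment section.
missing_tools docker kubectl kustomize minikube
```

Anything it prints is a candidate for brew. The exact formula or cask name for each tool is worth confirming with `brew search` first.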

Installation

  1. Create minikube cluster

In Docker Desktop, I raised the limits to 24 GB of RAM and 6 cores. That still leaves 8 GB of memory for other tasks while Kubeflow runs locally.

After doing so, I ran:

minikube start --driver=docker --memory=max --cpus=max
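Since `--memory=max` and `--cpus=max` take whatever Docker exposes, it can be worth confirming the Docker limits first. The sketch below reads them from `docker info`. The helper names `bytes_to_gib` and `docker_resources` are my own, and the reported memory may come out slightly below the configured value because of VM overhead.

```shell
# Convert a byte count (as printed by `docker info`) to whole GiB.
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Report the CPU count and memory that the Docker daemon exposes.
# Requires a running Docker daemon.
docker_resources() {
  local cpus mem
  cpus=$(docker info --format '{{.NCPU}}')
  mem=$(docker info --format '{{.MemTotal}}')
  echo "cpus=${cpus} memory_gib=$(bytes_to_gib "$mem")"
}

# Usage: docker_resources
```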
  2. Use kustomize to build

Clone the repo first, then run the following commands in your shell:

git clone https://github.com/kubeflow/manifests.git
cd manifests
git checkout v1.9-branch
while ! kustomize build example | kubectl apply --server-side --force-conflicts -f -; do echo "Retrying to apply resources"; sleep 20; done
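The loop above retries forever. If you would rather have it give up eventually, a bounded variant could look like the sketch below. `retry` and `apply_manifests` are hypothetical helper names, and the attempt count in the usage line is an arbitrary choice.

```shell
# Run a command until it succeeds, giving up after max_attempts tries.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts delay attempt
  max_attempts=$1
  delay=$2
  attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "Retrying to apply resources (attempt ${attempt})" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Same build-and-apply pipeline as in the loop above.
apply_manifests() {
  kustomize build example | kubectl apply --server-side --force-conflicts -f -
}

# Usage, from inside the manifests checkout:
#   retry 30 20 apply_manifests
```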
  3. Port-forward before you can see the UI

After all pods are healthy, run

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

Then open http://localhost:8080 in a browser (Firefox or Safari) to log in.

Default username: user@example.com
Default password: 12341234
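To decide when "all pods are healthy" before port-forwarding, one rough approach is to count pods that are not fully ready. This sketch assumes the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...). `count_unhealthy` is a hypothetical helper name.

```shell
# Count pods that are not healthy.
# Reads `kubectl get pods -A --no-headers` output on stdin.
# A pod counts as healthy if it is Completed, or Running with all
# of its containers ready (READY column like 2/2).
count_unhealthy() {
  awk '{
    if ($4 == "Completed") next
    split($3, ready, "/")
    if ($4 != "Running" || ready[1] != ready[2]) n++
  } END { print n + 0 }'
}

# Usage: kubectl get pods -A --no-headers | count_unhealthy
```

When it prints 0, the port-forward is worth trying.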

Observations

  1. On my Mac, installing Kubeflow took 15 minutes at best and 30 minutes at worst. While waiting, it constantly showed errors like

    error: resource mapping not found for name: "<RESOURCE_NAME>" namespace: "<SOME_NAMESPACE>" from "STDIN": no matches for kind "<CRD_NAME>" in version "<CRD_FULL_NAME>"
    ensure CRDs are installed first
    

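    This happens because the kustomization applies a CRD and a CR that depends on it almost at the same time, before the CRD has become Established. The retry loop in the install command re-applies until everything goes through, so these errors are expected.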
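If you are curious whether a particular CRD has reached the Established state, its status conditions can be read directly. This is only a sketch: `crd_established` is a hypothetical helper name, and the CRD name in the usage line is a placeholder.

```shell
# Succeed only if the named CRD reports Established=True.
# crd_established is a hypothetical helper name used for illustration.
crd_established() {
  local status
  status=$(kubectl get crd "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Established")].status}' 2>/dev/null)
  [ "$status" = "True" ]
}

# Usage (placeholder CRD name):
#   crd_established <CRD_FULL_NAME> && echo "ready"
```

To block until every installed CRD is ready, `kubectl wait --for=condition=Established crd --all --timeout=120s` should do the same job in one command. The timeout value here is an arbitrary choice.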

  2. I also tried to install Kubeflow on an Ubuntu environment, where I hit a timeout issue with the katib db. The description shows

    ...
    can't connect to local mysql server through socket '/var/lib/mysql/mysql.sock 2
    ...
    

    This issue prevents the katib db and katib manager from becoming healthy. If you know what happened and have a solution, please comment. Much appreciated.

  3. Pods are all unhealthy after minikube pause.

    Unlike a real k8s cluster, minikube has pause and unpause functions. But after a pause and then an unpause, all pods were unhealthy and had not recovered to a functional Kubeflow environment after about an hour.

  4. Kubeflow environment can come back after being idle

    If you simply leave the Kubeflow environment idle for a day or two, or even longer, you might see some unhealthy pods when you come back. If needed, kill them so that they get recreated. Istio is the one that causes trouble most often. But in my brief experience, if you wait a while (15+ minutes), it has a chance to recover on its own.
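Killing the unhealthy pods by hand gets tedious, so here is a rough sketch that lists them and deletes each one so its controller recreates it. It assumes the default `kubectl get pods -A --no-headers` columns. `list_unhealthy` and `restart_unhealthy` are hypothetical helper names.

```shell
# Print "namespace name" for every pod that is not Running or Completed.
# Reads `kubectl get pods -A --no-headers` output on stdin.
list_unhealthy() {
  awk '$4 != "Running" && $4 != "Completed" { print $1, $2 }'
}

# Delete each listed pod so that its controller recreates it.
restart_unhealthy() {
  kubectl get pods -A --no-headers | list_unhealthy |
    while read -r ns name; do
      kubectl delete pod -n "$ns" "$name"
    done
}

# Usage: restart_unhealthy
```

Running `kubectl get pods -A --no-headers | list_unhealthy` on its own first shows what would be deleted.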

  5. Why a specific branch

    If you noticed, I used v1.9-branch. The reason is that, at the moment, the latest branch includes the Spark operator and other good stuff that I could not bring up locally. It kept going into CrashLoopBackOff. So I decided to try an older branch for a more conservative experience.

  6. Firefox and Safari work best with the UI after port-forwarding. Chrome gives a weird error on the OAuth callback. If you know what happened and have a solution, please comment. Much appreciated.

Reference

To finish, here are some awesome screenshots of a local Kubeflow environment.

kubeflow k8s

kubeflow ui