Why Kubeflow
Look at this image first:
The story goes back to 2019, when I had the honor of working on a project to build a machine learning platform. The team lead had essentially the same architecture in mind as this diagram, except that it was based on Airflow. I thought it was time to build this locally to remind myself of those happy days.
Environment
- MacBook Pro 2018, 6 cores, 32 GB RAM, macOS Sequoia 15.7.1
- docker 28.5.1
- kubectl v1.34.1
- kustomize v5.7.1
- minikube v1.37.0
On a Mac, if any of these are missing, you can install them with brew. It's pretty simple, so I'll leave that for you to try.
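Before installing anything, it can help to see which of the tools above are already on your PATH. The following is a minimal sketch, not part of the original setup; the function name `missing_tools` is made up for illustration.

```shell
# Print each tool from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper name, not an existing command.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example call, using the tools listed in the Environment section:
#   missing_tools docker kubectl kustomize minikube
```

An empty result means everything was found. Anything printed can then be installed with brew, for example `brew install kustomize`.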
Installation
- Create minikube cluster
In Docker Desktop I raised the limits to 24 GB of RAM and all 6 cores. That still leaves 8 GB of memory for other tasks while Kubeflow is running locally.
After that, I ran:
minikube start --driver=docker --memory=max --cpus=max
- Use kustomize to build
Clone the repo first, then run these commands in your shell:
git clone https://github.com/kubeflow/manifests.git
cd manifests
git checkout v1.9-branch
while ! kustomize build example | kubectl apply --server-side --force-conflicts -f -; do echo "Retrying to apply resources"; sleep 20; done
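The loop above retries forever, which is fine for a first install but can hide a real failure. As an alternative, here is a sketch of a bounded retry, assuming you run it from inside the manifests directory. The names `retry` and `apply_manifests` are made up for this example, and the attempt count in the usage line is an arbitrary choice.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "Giving up after $retry_n attempts" >&2
      return 1
    fi
    echo "Attempt $retry_n failed, retrying in ${retry_delay}s" >&2
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# The same pipeline as in the post, wrapped so it can be passed as one command.
apply_manifests() {
  kustomize build example | kubectl apply --server-side --force-conflicts -f -
}

# Example call: up to 30 attempts, 20 seconds apart.
#   retry 30 20 apply_manifests
```

The 20-second delay matches the original loop. The only behavioural change is that the loop stops and returns a non-zero status once the attempts run out.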
- Port-forward before opening the UI
Once all pods are healthy, run:
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
Then open http://localhost:8080 in a browser (Firefox or Safari) and log in.
Default username: user@example.com
Default password: 12341234
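To judge when "all pods are healthy" before port-forwarding, one option is to filter the pod list down to the ones that still need attention. This is a rough sketch, assuming the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...). The helper name `unhealthy_pods` is made up.

```shell
# Reads `kubectl get pods -A --no-headers` output on stdin and prints the
# pods that are neither Completed nor Running with all containers ready.
# Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  awk '{
    split($3, ready, "/")
    if ($4 == "Completed") next
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2, $4, $3
  }'
}

# Example call:
#   kubectl get pods -A --no-headers | unhealthy_pods
```

When the output is empty, every pod is either fully ready or has completed, and the port-forward should have something to talk to.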
Observations
On my Mac, installing Kubeflow took between 15 and 30 minutes. While waiting, the apply loop constantly showed errors like
error: resource mapping not found for name: "<RESOURCE_NAME>" namespace: "<SOME_NAMESPACE>" from "STDIN": no matches for kind "<CRD_NAME>" in version "<CRD_FULL_NAME>" ensure CRDs are installed first
This is because a kustomization applies both a CRD and a CR in quick succession, and the CRD has not yet become Established when the CR arrives. Later passes of the retry loop succeed once the CRD is registered.
I also tried to install Kubeflow on an Ubuntu environment, where I hit a timeout on katib db. The pod description shows ... can't connect to local mysql server through socket '/var/lib/mysql/mysql.sock 2 ... This issue prevents katib db and katib manager from becoming healthy. If you know what happened and have a solution, please comment. Much appreciated.
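Going back to the CRD timing error at the start of this section: if you want to see for yourself when the CRDs are ready, `kubectl wait` can block on the Established condition. This is a small sketch and not something the install above relied on. The wrapper name `wait_for_crds` and the 120s default timeout are assumptions.

```shell
# Block until every CRD in the cluster reports the Established condition,
# or until the timeout expires. The 120s default is an arbitrary choice.
wait_for_crds() {
  kubectl wait --for=condition=Established crd --all --timeout="${1:-120s}"
}

# Example call with a shorter timeout:
#   wait_for_crds 60s
```

Running this between two passes of the apply loop should make the "no matches for kind" errors less frequent on the next pass, though the retry loop alone gets there as well.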
Pods are all unhealthy after minikube pause.
Unlike a real k8s cluster, minikube has pause and unpause functions. But after a pause followed by an unpause, all pods were unhealthy and did not recover to a functional Kubeflow environment within about an hour.
Kubeflow environment can come back after being idle
If you simply leave the Kubeflow environment idle for a day or two, or even longer, you might see some unhealthy pods when you come back. If needed, kill them so that they get recreated. istio is the one that causes trouble most often. But in my brief experience, if you wait a while (15+ minutes), it has a chance to recover by itself.
Why specific branch
You may have noticed that I used v1.9-branch. The reason is that, at the moment, the latest branch includes Spark operators and other good stuff that I cannot bring up locally. Those pods keep going into CrashLoopBackOff. So I decided to try an older branch for a more conservative experience.
Firefox and Safari work best with the UI after port-forwarding. Chrome gives a weird error on the OAuth callback. If you know what happened and have a solution, please comment. Much appreciated.
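For the idle-recovery observation above, "killing" a pod means deleting it so that its controller schedules a replacement. Here is a minimal sketch, assuming the stuck pods are managed by a Deployment or StatefulSet (a bare pod would simply be gone). The helper name `recreate_pods` and the pod names in the example are made up.

```shell
# Reads "namespace/pod-name" lines on stdin and deletes each pod so that
# its owning controller creates a fresh one. Blank lines are skipped.
recreate_pods() {
  while IFS=/ read -r ns pod; do
    [ -n "$ns" ] && [ -n "$pod" ] || continue
    kubectl delete pod -n "$ns" "$pod"
  done
}

# Example call (the pod name is a placeholder, not a real one):
#   printf 'istio-system/istiod-example\n' | recreate_pods
```

Since istio may recover by itself after a while, it is probably worth waiting before reaching for this.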
Reference
- https://medium.com/@prayag-sangode/installing-kubeflow-60dfb9d620fb
- https://github.com/kubeflow/manifests
- https://www.kubeflow.org/docs/started/installing-kubeflow/
- https://dagshub.com/blog/how-to-install-kubeflow-locally/
To finish, here are some awesome screenshots of a local Kubeflow environment.

