First Inference Service

In this link, you can check the extremely simple example KServe provides. Basically, it downloads a model from a public Google Cloud Storage bucket to give you a quick and simple first experience.
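From memory, that example boils down to applying a single InferenceService manifest whose storageUri points at the public bucket. A rough sketch is below; the exact namespace, model format, and bucket path may differ from the current docs:

# minimal first InferenceService, roughly as in the KServe quickstart (details from memory)
kubectl apply -n kserve-test -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF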

A journey from local MinIO to Google Cloud

I planned to use a local MinIO in the first round; however, I could not get past SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1006)')).

To recap what happened: from a notebook, I could successfully upload my TensorFlow model into the MinIO pod; but when KServe tried to read and download it, the SSLError occurred. I posted a question, but didn't receive much help from the community.
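For context, the usual way to hand KServe credentials for an S3-compatible store like MinIO is a secret annotated with the S3 endpoint and TLS settings, mounted through a service account. A rough sketch of what my setup looked like, with the endpoint and keys being placeholders (this is not a verified fix for the SSL error):

# hypothetical MinIO credential secret + service account for KServe (placeholder names/values)
kubectl apply -n kubeflow-user-example-com -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: minio-s3-credentials
  annotations:
    serving.kserve.io/s3-endpoint: "minio-service.kubeflow:9000"   # placeholder in-cluster MinIO endpoint
    serving.kserve.io/s3-usehttps: "0"                             # "0" = plain http, "1" = https
    serving.kserve.io/s3-verifyssl: "0"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: minio-sa
secrets:
  - name: minio-s3-credentials
EOF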

Google Cloud Storage as MinIO's backend

I moved to using Google Cloud Storage as the backend of my MinIO. It worked, and details can be found here. The gist is: 1. prepare a secret.yaml containing the GCP service-account JSON, and 2. have a serviceaccount.yaml mount that secret.

I also referred to this post.

Once both manifests are applied, KServe will "know" how to connect to Google Cloud Storage.
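For completeness, the two manifests look roughly like this; the secret key name follows KServe's default GCS credential file name, and the resource names are placeholders:

# secret.yaml + serviceaccount.yaml, roughly; the data value is the GCP service-account JSON
kubectl apply -n kubeflow-user-example-com -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: gcs-credentials
type: Opaque
data:
  gcloud-application-credentials.json: <base64-encoded GCP service-account JSON>
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gcs-sa
secrets:
  - name: gcs-credentials
EOF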

Before the simple test: the chaos of PyTorch support

Once I figured out how KServe can use blob storage, I immediately started preparing a simple test of a KServe endpoint.

I tried to use a PyTorch model for the test; however, I found it really hard in several ways.

The first challenge came from all the concepts PyTorch has. I can recall some: JIT, TorchScript, Triton, etc.

What I understand now, based on this post, is that in script mode JIT is the compiler and TorchScript is the script; maybe that's right, or maybe it's not so accurate. Because KServe generally has great support for TorchScript according to their documentation, I thought the path was pretty promising, until I saw that TorchScript is deprecated.

Even though I kept trying to learn more about TorchScript on several other fronts, like tracing vs. scripting, I no longer felt good about the PyTorch path.

The second challenge I remember came from the Triton front. At first I felt like Triton is an open-source framework that would be compatible with a lot of hardware, but later on I sensed that it's closely tied to NVIDIA hardware. Since I am running on top of an old MacBook Pro, this path was also a dead end.

Here are some posts I found while going down the Triton path:

  1. kserve triton
  2. comprehensive post

The last thing that really made me switch to TensorFlow was a post I saw saying the author experienced the same complexity when hosting a PyTorch model in KServe. From then on, I made up my mind to host a TensorFlow model instead.

Translation and simple test

I “translated” the model from PyTorch into TensorFlow.

Before testing in KServe, I used TensorFlow Serving to test the model. Details can be found in the readme.
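That smoke test is essentially the standard TensorFlow Serving docker run plus a REST call; something like the following, with the model name and paths being illustrative:

# serve the SavedModel locally with TensorFlow Serving
# the host directory needs a numeric version subfolder, e.g. saved_model/mnist/1/saved_model.pb
docker run --rm -p 8501:8501 \
  -v "$PWD/saved_model/mnist:/models/mnist" \
  -e MODEL_NAME=mnist \
  tensorflow/serving

# in another terminal, hit the REST predict endpoint
curl -H "Content-Type: application/json" \
  --data @input.json \
  http://localhost:8501/v1/models/mnist:predict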

After a successful TensorFlow Serving test, I was also able to put it into KServe. Details can be found here.
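The InferenceService itself is short; mine looked roughly like the sketch below, with the GCS service account from earlier attached and the bucket path left as a placeholder:

# TensorFlow predictor backed by GCS (names and bucket path are placeholders)
kubectl apply -n kubeflow-user-example-com -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: mnist-tf
spec:
  predictor:
    serviceAccountName: gcs-sa          # service account holding the GCS credential secret
    tensorflow:
      storageUri: "gs://<my-bucket>/mnist/saved_model"
EOF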

After the simple test: difficulty in testing a custom transformer

I also wanted to test out transformers; however, I met great difficulty on that front. I have to say that not fully understanding the infrastructure and lacking deep knowledge of Istio also contributed to this. I finally gave up on getting a "successful" test in my local environment.

What I could achieve was this: I could create the transformer together with the predictor.
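Creating them together is just a matter of adding a transformer section next to the predictor in the same InferenceService; roughly like this, where the transformer image is a placeholder for my custom container:

# transformer + predictor in one InferenceService (image and bucket path are placeholders)
kubectl apply -n kubeflow-user-example-com -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: mnist-tf
spec:
  transformer:
    containers:
      - name: kserve-container
        image: <my-registry>/mnist-transformer:latest
  predictor:
    serviceAccountName: gcs-sa
    tensorflow:
      storageUri: "gs://<my-bucket>/mnist/saved_model"
EOF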

Chatting with ChatGPT, I learnt that the request path looks like this:

localhost:8080 → knative-local-gateway(cluster-local-gateway) → KServe → Transformer → Predictor
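To reach that local gateway from my laptop, I port-forwarded the Istio cluster-local gateway service, roughly like this (the service name, namespace, and port follow the usual Kubeflow/Knative install and are from memory):

# forward the cluster-local gateway's http port (80) to localhost:8082
kubectl port-forward -n istio-system svc/cluster-local-gateway 8082:80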

And the usual testing URL would be:

# port forward cluster-local-gateway to 8082
curl -v \
  -H "Host: mnist-tf.kubeflow-user-example-com.example.com" \
  -H "Content-Type: application/json" \
  --data @input.json \
  http://localhost:8082/v1/models/mnist-tf:predict

However, when I made that curl test call, it kept giving me a 404.

I chatted with ChatGPT for several rounds trying to fix the issue; what I tried included, but was not limited to:

# patch virtual service to cluster-local-gateway
kubectl patch virtualservice mnist-tf \
  -n kubeflow-user-example-com \
  --type=json \
  -p='[{
    "op": "replace",
    "path": "/spec/http/0/route/0/destination/host",
    "value": "cluster-local-gateway.istio-system.svc.cluster.local"
  }]'
# ...

# patch virtualservice port to 8012 because by inspecting endpoints or pods, chatgpt found port is 8012
kubectl patch virtualservice mnist-tf-transformer-ingress \
  -n kubeflow-user-example-com \
  --type='json' \
  -p='[{"op": "replace", "path": "/spec/http/0/route/0/destination/port/number", "value": 8012}]'
# ...

# patch more on mnist-tf-transformer-ingress
kubectl patch virtualservice mnist-tf-transformer-ingress \
  -n kubeflow-user-example-com \
  --type='json' \
  -p='[
    {"op": "replace", "path": "/spec/http/0/route/0/destination/port/number", "value": 8012},
    {"op": "replace", "path": "/spec/http/1/route/0/destination/port/number", "value": 8012},
    {"op": "replace", "path": "/spec/http/2/route/0/destination/port/number", "value": 8012},
    {"op": "replace", "path": "/spec/http/3/route/0/destination/port/number", "value": 8012}
]'
# ...

None of the patches worked. I will leave it to my future self to solve when I have time. Moving on for now.