Open WebUI + Ollama with Azure Kubernetes Service & Ingress TLS

Open WebUI is an alternative to LibreChat, an open source AI chat platform that we have extensively discussed on our blog and integrated on behalf of clients. Where LibreChat integrates with virtually any well-known remote or local AI service on the market, Open WebUI is focused on integration with Ollama, one of the easiest ways to run and serve AI models locally on your own server or cluster. In fact, Open WebUI was formerly known as Ollama WebUI, but changed its name because it is developed by a team separate from Ollama.

There is growing interest in deploying local AI solutions such as Open WebUI + Ollama on Kubernetes, both to scale beyond a single server (the single-server use case is easily fulfilled by Docker Compose) and to consolidate the infrastructure with other apps that might use the local inference cluster.

This article describes a basic, production-ready Open WebUI + Ollama deployment using AKS, Azure’s managed Kubernetes service. There are some Kubernetes resource types and objects pertaining to cert-manager and ingress-nginx that we have to add to the bare cluster in order to access the Open WebUI service securely.

You need the Azure CLI, kubectl, and Helm installed on your local machine to complete this deployment. This tutorial assumes that you already have an Azure subscription, an AKS cluster with at least one node in the nodePool deployed, and a DNS host with an API for programmatically updating the records in your domain’s zone. Use the az login, az configure, and az aks install-cli commands to authenticate, set the default resource group, and install kubectl, then run az aks get-credentials to authenticate your kubectl command to the AKS cluster by creating a kubeconfig at ~/.kube/config.
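
As a sketch, the preparation might look like the following, where the resource group and cluster names are placeholders for your own values:

$ az login
$ az configure --defaults group=<resource group>
$ az aks install-cli
$ az aks get-credentials --resource-group <resource group> --name <cluster name>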

  • We will use cert-manager to issue a wildcard certificate that is valid for all subdomains of the second-level domain (SLD), renewed automatically by the Kubernetes ClusterIssuer and Certificate objects through a DNS challenge at _acme-challenge.example.com before each 90-day expiration. For this example, the domain is delegated to Cloudflare DNS and the ACME issuer is Let’s Encrypt.
  • The NGINX Ingress Controller, represented by the Kubernetes IngressClass and Ingress objects, reverse proxies HTTPS requests to the Open WebUI Service through the default Azure load balancer named “kubernetes” that is provisioned with the AKS cluster. As the Azure load balancer only routes TCP traffic at the transport layer (L4 of the OSI model), SSL must be terminated at the NGINX ingress rather than at the LB.
  • There is also the option of using the Azure Application Gateway Ingress Controller (AGIC), which routes application layer traffic at L7 and terminates SSL using either 1) a certificate bundle uploaded in PFX format, or 2) a certificate obtained from an ACME issuer through a managed app that uses workload identity to modify Azure DNS with the DNS Zone Contributor role.
    • The cost of using an Azure Application Gateway (AGW) is considerably higher than reusing the existing Azure Load Balancer with NGINX ingress, as there is a $0.20/hr ($146/mo) charge for the AGW in addition to capacity units and outbound data transfer. Microsoft has introduced a Basic SKU for the Application Gateway at $0.0225/hr, but it is still in preview (i.e. not available in all regions), and it is not perfectly clear whether the Basic SKU is compatible with AGIC for an AKS cluster. There is also the additional complexity of setting up an Entra Workload ID or service principal, which AGIC requires in order to access Azure Resource Manager (ARM) and manage the AGW.

Create Kubernetes Deployments and Services (ClusterIP and NodePort)

First, it is necessary to deploy Ollama for model serving, followed by the Open WebUI frontend, to the Kubernetes cluster using Deployment and Service objects. For production, it is, of course, best practice to expose a Pod only through an Ingress backed by a ClusterIP Service, so that it is publicly accessible only over HTTPS. To illustrate how the services work for testing purposes, though, we will also expose the Pod on the Kubernetes node’s external IP through a NodePort Service.

This example also assumes that you have enabled the option to assign a public IP address to your Kubernetes node(s) in the nodePool at the time of deploying the AKS cluster. 

We need to define two Persistent Volume Claims (PVCs), the Kubernetes resource equivalent to Docker volumes for “persisting” data beyond the lifecycle of a container (in this case, a Pod): one PVC called ollama-pvc, mounted at /root/.ollama inside the Ollama Pod to store downloaded model weights, and one PVC called open-webui-pvc, mounted at /app/backend/data inside the Open WebUI Pod to store the app’s data.

We use the managed-csi storageClass that is pre-configured on AKS clusters to create a 32Gi Azure managed disk for each Deployment. It is backed by a Kubernetes CSI driver provided by Microsoft that works similarly to REX-Ray, a storage plugin by Dell Technologies for Docker Swarm that is no longer actively developed. Note that Azure will provision the closest disk size that meets (or exceeds) the amount of storage you request in your PVC; the managed-csi storageClass uses a Standard SSD with LRS (locally redundant storage). In this particular case, since our PVC requests 32Gi, Azure will use an E4 disk size.

If you wish to use other disk types, such as the Premium SSD or Premium SSD v2 SKUs for higher performance, or ZRS (zone-redundant storage) for resilience across availability zones, you can use the pre-configured storageClasses like managed-csi-premium listed by the kubectl describe storageclass command, or create your own custom storageClass.
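
For instance, a custom storageClass for zone-redundant Premium SSD disks could look something like the sketch below. The class name is arbitrary, and the parameters shown are those of the Azure Disk CSI driver (disk.csi.azure.com):

# sc-premium-zrs.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-premium-zrs  # arbitrary name for this custom class
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS  # zone-redundant Premium SSD
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true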

As the accessMode is ReadWriteOnce (RWO), an Azure Disk created in this way can only be mounted to one Pod at a time. If you require shared storage that can be simultaneously mounted and written to by multiple Pods (running across different nodes as a Deployment or StatefulSet), then the Azure Files CSI driver, which provides a managed file share supporting ReadWriteMany (RWX), may be a more suitable solution.
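
As an illustration, a shared volume of that kind could be requested with the pre-configured azurefile-csi storageClass and the ReadWriteMany access mode (the claim name here is hypothetical):

# shared-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc  # hypothetical claim name for illustration
spec:
  accessModes:
    - ReadWriteMany  # mountable read-write by multiple Pods across nodes
  resources:
    requests:
      storage: 32Gi
  storageClassName: azurefile-csi  # Azure Files CSI driver (managed SMB file share)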

Create the following Kubernetes YAML files on your local machine and apply them to your AKS cluster using the kubectl apply -f <file name> command.

ollama.yaml

# ollama.yaml 

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:  # mount the PVC at the path where Ollama stores model weights
        - name: ollama-data
          mountPath: /root/.ollama
      volumes:  # define the volume backed by the PVC below
      - name: ollama-data
        persistentVolumeClaim:
          claimName: ollama-pvc
      priorityClassName: system-node-critical  
  strategy:
    type: Recreate

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 32Gi  # adjust the storage size as needed
  storageClassName: managed-csi  # reference the StorageClass

---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
  - name: http
    port: 11434
    targetPort: 11434
    protocol: TCP
  type: ClusterIP

---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service-node-port
spec:
  selector:
    app: ollama
  ports:
  - name: http
    port: 11434
    targetPort: 11434
    nodePort: 31434
    protocol: TCP
  type: NodePort

openwebui.yaml

# openwebui.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
      - name: open-webui
        image: ghcr.io/open-webui/open-webui:main
        ports:
        - containerPort: 8080
        env:
        - name: OLLAMA_BASE_URL
          value: http://ollama-service.default.svc.cluster.local:11434
        volumeMounts:
        - name: open-webui-data
          mountPath: /app/backend/data
      volumes:
      - name: open-webui-data
        persistentVolumeClaim:
          claimName: open-webui-pvc
      priorityClassName: system-node-critical
  strategy:
    type: Recreate

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: open-webui-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 32Gi  # adjust the storage size as needed
  storageClassName: managed-csi  # reference the StorageClass

---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
spec:
  selector:
    app: open-webui
  ports:
  - name: http
    port: 3000
    targetPort: 8080
    nodePort: 30080
  type: NodePort

After creating the Deployments and Services, you should be able to pull your desired models either through the ollama run command inside the Pod, or from the Settings > Models section of Open WebUI. For the latter, after creating the initial user account, set the connection to Ollama by specifying http://ollama-service.default.svc.cluster.local:11434 as the Ollama Base URL under Connections.
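
To confirm that Ollama is reachable and to list the models it has already downloaded, you can also query Ollama’s REST API through the NodePort Service defined above (port 31434), substituting your node’s public IP (and assuming the node port range is reachable, as discussed below):

$ curl http://<node public IP>:31434/api/tags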

You should now be able to access the Open WebUI dashboard (insecurely) in a web browser at your Kubernetes node’s public IP followed by the node port number, 30080, provided that inbound traffic to the node port range (30000-32767) is allowed by your AKS cluster’s network security group (NSG), for example http://52.179.xxx.xx:30080. To get the public IP of your Kubernetes node, you can run:

$ kubectl get nodes -o wide
NAME                                STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-agentpool-60568521-vmss000000   Ready    agent   10d   v1.28.9   10.224.0.4    52.179.xxx.xx   Ubuntu 22.04.4 LTS   5.15.0-1060-azure   containerd://1.7.15-1
Open WebUI (formerly Ollama WebUI) on Azure Kubernetes Service
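
If the dashboard does not load, the network security group for the node pool (in the MC_* node resource group) is likely blocking the node port range. A hedged example of adding an inbound rule with the Azure CLI, using placeholder resource group and NSG names, might be:

$ az network nsg rule create \
    --resource-group <node resource group> \
    --nsg-name <nsg name> \
    --name AllowNodePorts \
    --priority 1000 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --destination-port-ranges 30000-32767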

To pull your desired model by executing a command inside the Ollama Pod, use the following kubectl commands to get the name of the running Pod and exec into it. You can find a list of available models at the Ollama library. If the Kubernetes node running your Ollama Pod is a CPU-only VM size, then tinyllama (1.1B), phi (2.7B), or llama3 (8B) could be good initial models to try. If your Kubernetes node is running on a VM size with a GPU, then you can try practically any model, including larger models such as llama3 with 70B parameters or more.

$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
ollama-69dd5867c4-6tc9g       1/1     Running   0          4d9h
open-webui-85cfd5b9f6-p868t   1/1     Running   0          4d9h

$ kubectl exec -it <pod name> -c ollama -- ollama run <model name>
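
For example, using the Pod name from the listing above, pulling and then chatting with a small model might look like this (ollama pull downloads the model without starting an interactive session):

$ kubectl exec -it ollama-69dd5867c4-6tc9g -c ollama -- ollama pull tinyllama
$ kubectl exec -it ollama-69dd5867c4-6tc9g -c ollama -- ollama run tinyllama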

Install cert-manager on AKS using Helm & issue the Let’s Encrypt wildcard certificate

The next step is to install cert-manager using Helm, then create the ClusterIssuer and Certificate objects to issue a wildcard cert from Let’s Encrypt for accessing Open WebUI. Without HTTPS enabled, your Open WebUI login credentials and chat contents could be intercepted. It is therefore recommended to create the initial Open WebUI account only after the Ingress has been set up, or to change the OWUI password immediately after having done so.

Because the dns-01 challenge for issuing a wildcard cert requires the ClusterIssuer to automatically create a TXT record at _acme-challenge.example.com each time the certificate is issued or renewed, we must also generate a Cloudflare API token and store it in a Kubernetes Secret in the same namespace as cert-manager.

Deploy the cert-manager Helm chart into a new cert-manager namespace as follows:

$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update
$ helm upgrade cert-manager jetstack/cert-manager \
      --install \
      --create-namespace \
      --wait \
      --namespace cert-manager \
      --set installCRDs=true
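
Before proceeding, verify that the cert-manager, cainjector, and webhook Pods are all in the Running state:

$ kubectl get pods --namespace cert-manager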

Log in to the Cloudflare dashboard and generate an API token with the Zone > DNS > Edit and Zone > Zone > Read permissions. It is important to distinguish an API token, which allows setting “fine grained” permissions, from an API key, which provides global access to the entire account.

Create Cloudflare API token for Kubernetes cert-manager

Create a Kubernetes Secret called cloudflare-api-token-secret in the cert-manager Namespace.

# cf-api-token.yaml

apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token-secret
  namespace: cert-manager
type: Opaque
stringData:
  api-token: <API token>
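
Apply the manifest and confirm that the Secret exists in the cert-manager namespace (the token value is stored base64-encoded and is not printed by kubectl get):

$ kubectl apply -f cf-api-token.yaml
$ kubectl get secret cloudflare-api-token-secret --namespace cert-manager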

Then, create the ClusterIssuer by applying the following YAML file. The difference between an Issuer (as described in the cert-manager docs) and a ClusterIssuer is that all namespaces can use a ClusterIssuer to issue a Certificate, but an Issuer can only be referenced within the same namespace.

# clusterissuer-letsencrypt-cf.yaml

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-cf
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <email address>
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-cf
    solvers:
    - selector: {}
      dns01:
        cloudflare:
          email: <email address>
          apiTokenSecretRef:
            name: cloudflare-api-token-secret
            key: api-token
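
After applying the manifest, check that the ClusterIssuer reaches the Ready condition, which indicates that the ACME account was registered with Let’s Encrypt:

$ kubectl apply -f clusterissuer-letsencrypt-cf.yaml
$ kubectl describe clusterissuer letsencrypt-cf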

Finally, issue the Let’s Encrypt wildcard certificate using cert-manager by applying this YAML to create the Certificate object.

# certificate-cf-prod.yaml 

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: <cert name>
  namespace: default
spec:
  secretName: <cert name>-tls-prod
  issuerRef:
    name: letsencrypt-cf
    kind: ClusterIssuer
  dnsNames:
  - '*.<example.com>'
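
Apply the manifest, then watch the Certificate and the underlying dns-01 Challenge resources created by cert-manager until the certificate becomes Ready (DNS propagation can take a few minutes):

$ kubectl apply -f certificate-cf-prod.yaml
$ kubectl get certificate --watch
$ kubectl get challenges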

You can check the issuance status by using the cmctl status certificate <cert name> command if you have it installed.

You should see output similar to the below if the ClusterIssuer was able to access the Cloudflare API token through the Secret, create a TXT record with the verification string provided by Let’s Encrypt’s ACME service, and successfully generate the certificate, storing it in a Secret in the default namespace called <cert name>-tls-prod.

$ cmctl status certificate autoize-net
Name: autoize-net
Namespace: default
Created at: 2024-05-16T19:29:25+02:00
Conditions:
  Ready: True, Reason: Ready, Message: Certificate is up to date and has not expired
DNS Names:
- *.autoize.net
Events:  <none>
Issuer:
  Name: letsencrypt-cf
  Kind: ClusterIssuer
  Conditions:
    Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
  Events:  <none>
Secret:
  Name: autoize-net-tls-prod
  Issuer Country: US
  Issuer Organisation: Let's Encrypt
  Issuer Common Name: R3
  Key Usage: Digital Signature, Key Encipherment
  Extended Key Usages: Server Authentication, Client Authentication
  Public Key Algorithm: RSA
  Signature Algorithm: SHA256-RSA
  Subject Key ID: REDACTED
  Authority Key ID: REDACTED
  Serial Number: REDACTED
  Events:  <none>
Not Before: 2024-05-16T23:24:57+02:00
Not After: 2024-08-14T23:24:56+02:00
Renewal Time: 2024-07-15T23:24:56+02:00
No CertificateRequest found for this Certificate

Install the NGINX Ingress Controller using Helm & create an Ingress to Open WebUI

The last step before Open WebUI can be made publicly available over a secure connection is to install the NGINX Ingress Controller and create the Ingress object in Kubernetes.

All major browsers (Chrome, Firefox) require a site to be properly secured with an HTTPS connection in order to access features such as the microphone, which is required for Open WebUI’s speech-to-text feature for transcribing voice prompts to the AI model.

Open WebUI with microphone permission over secure connection with valid certificate

To install the NGINX Ingress Controller, deploy the provided Helm chart to the AKS cluster from your local machine with the Helm CLI installed. 

$ export NAMESPACE=ingress-basic

$ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
$ helm repo update
$ helm install ingress-nginx ingress-nginx/ingress-nginx \
  --create-namespace \
  --namespace $NAMESPACE \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz \
  --set controller.service.externalTrafficPolicy=Local

As part of the Helm chart, a Service of type LoadBalancer will be deployed in your AKS cluster and an external IP address will be assigned to the “kubernetes” Azure load balancer. You may see <pending> while the IP is being assigned. To check the status, invoke this command:

$ kubectl get services --namespace ingress-basic -o wide -w ingress-nginx-controller
NAME                       TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)                      AGE     SELECTOR
ingress-nginx-controller   LoadBalancer   10.0.253.25   51.8.xx.xx    80:31720/TCP,443:31762/TCP   2d17h   app.kubernetes.io/component=controller,
app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

From your domain’s DNS host (in this case, Cloudflare), create an A record pointing a subdomain (with the “orange cloud” for CDN unselected) to the EXTERNAL-IP associated with the LoadBalancer. For the example deployment, we are using owui.autoize.net.
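
If you prefer to script the DNS change, the record can also be created through the Cloudflare API. The sketch below assumes you have the zone ID and an API token with DNS edit permission for the zone; the subdomain and IP address are placeholders:

$ curl -X POST "https://api.cloudflare.com/client/v4/zones/<zone id>/dns_records" \
    -H "Authorization: Bearer <API token>" \
    -H "Content-Type: application/json" \
    --data '{"type":"A","name":"owui","content":"51.8.xx.xx","proxied":false}'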

Finally, apply this YAML to create an Ingress rule routing HTTPS requests on port 443 of the LoadBalancer’s external IP to port 3000 of the open-webui Service (which in turn forwards to port 8080 on the Open WebUI Pod). Be sure to replace <example.com> with the actual domain name and <cert name> with the name previously specified when creating the Certificate resource.

# owui-ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: owui-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: owui.<example.com>
      http:
        paths:
          - pathType: Prefix
            backend:
              service:
                name: open-webui
                port:
                  number: 3000
            path: /
  tls:
    - hosts:
      - owui.<example.com>
      secretName: <cert name>-tls-prod
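
Apply the Ingress and verify that it is admitted by the NGINX class with the expected host and TLS secret. Once the DNS record has propagated, an HTTPS request to the hostname should return the Open WebUI frontend secured by the Let’s Encrypt wildcard certificate:

$ kubectl apply -f owui-ingress.yaml
$ kubectl get ingress owui-ingress
$ curl -I https://owui.<example.com>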