Upgrade Kaptain

Upgrade Kaptain on your cluster

Upgrade the existing Kaptain installation to a newer version.

Prerequisites

Before you begin, ensure you have:

  • Kaptain 1.2.0–1.1.0 installed on a Konvoy cluster.
  • The existing cluster meets the criteria listed in the installation documentation.

Upgrade Kaptain

WARNING: When upgrading from Kaptain 1.1.0 to 1.2.0, some resources are not cleaned up automatically during the upgrade and must be removed manually by running the following commands before the upgrade:

kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io katib-mutating-webhook-config
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io katib-validating-webhook-config
  • Ensure the following base addons that are needed by Kaptain are enabled in your Konvoy cluster:

    - configRepository: https://github.com/mesosphere/kubernetes-base-addons
      configVersion: stable-1.20-4.1.0
      addonsList:
        - name: istio
          enabled: true
        - name: dex
          enabled: true
        - name: cert-manager
          enabled: true
        - name: prometheus
          enabled: true
    
  • Ensure the Kaptain addon repository is present in your Konvoy cluster.yaml:

    - configRepository: https://github.com/mesosphere/kubeaddons-kaptain
      configVersion: stable-1.20-1.3.0
      addonsList:
        - name: knative
          enabled: true
    
  • Download the kubeflow-1.3.0_1.2.0.tgz tarball from the support website.

  • Upgrade Kaptain:

    kubectl kudo upgrade --instance kaptain --namespace kubeflow ./kubeflow-1.3.0_1.2.0.tgz
    
  • Monitor the upgrade process by running:

    kubectl kudo plan status --instance kaptain --namespace kubeflow
    

After the upgrade completes, log in to Kaptain:

  • Discover the cluster endpoint and copy it to the clipboard. If you are running Kaptain on-premises, use this command:

    kf_uri=$(kubectl get svc kubeflow-ingressgateway --namespace kubeflow -o jsonpath="{.status.loadBalancer.ingress[*].ip}") && echo "https://${kf_uri}"
    

    Or if you are running Kaptain on AWS, use this command:

    kf_uri=$(kubectl get svc kubeflow-ingressgateway --namespace kubeflow -o jsonpath="{.status.loadBalancer.ingress[*].hostname}") && echo "https://${kf_uri}"
    
  • Get the login credentials from Konvoy to authenticate:

    konvoy get ops-portal
    

Workloads behavior during the upgrades

  • When upgrading from Kaptain version 1.2.0-1.1.0 to 1.2.0 the following workloads do not require stopping and can proceed without interruption during the upgrade: Jupyter Notebooks, Training Jobs (TFJob, PyTorchJob, MXNetJob), Katib Experiments and Trials, and SparkApplications), Kubeflow Pipelines.