Recreate OKE Cluster from Scratch - Disaster Recovery

This guide walks through recreating the entire OKE cluster from scratch using this repository and secrets stored in OCI Vault.

Before starting, ensure you have:

  • OCI Account with Always Free eligibility
  • Cloudflare Account with a managed domain
  • GitHub Account with fork of this repository
  • Local Tools: Terraform, OCI CLI, kubectl

Set up the OCI CLI:

  1. Install the OCI CLI:

     ```sh
     brew install oci-cli
     ```

  2. Configure authentication:

     ```sh
     oci setup config
     ```

     This creates ~/.oci/config with your tenancy details.

  3. Verify the connection:

     ```sh
     oci iam user get --user-id <your-user-ocid>
     ```

If recreating after a disaster, secrets are stored in OCI Vault:

```sh
# List all secrets in the vault
oci vault secret list \
  --compartment-id <compartment-ocid> \
  --query 'data[].{"name":"secret-name","id":id}' \
  --output table
```

```sh
# Generic retrieval command
oci secrets secret-bundle get \
  --secret-id <secret-ocid> \
  --query 'data."secret-bundle-content".content' \
  --raw-output | base64 -d
```
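Fetching each secret by OCID gets tedious during a recovery. The two commands above can be combined into a small helper that looks a secret up by name; a sketch (the `get_secret` name and the `COMPARTMENT_ID` variable are assumptions, not part of this repo):

```sh
# Hypothetical helper: resolve a secret OCID by name, then print its
# decoded value. Assumes the OCI CLI is configured and COMPARTMENT_ID
# is exported in the current shell.
get_secret() {
  local name="$1" id
  # Resolve the secret OCID from its display name
  id=$(oci vault secret list \
    --compartment-id "$COMPARTMENT_ID" \
    --query "data[?\"secret-name\"=='${name}'].id | [0]" \
    --raw-output)
  # Fetch and decode the current secret bundle
  oci secrets secret-bundle get \
    --secret-id "$id" \
    --query 'data."secret-bundle-content".content' \
    --raw-output | base64 -d
}
```

For example, `get_secret cloudflare-api-token` would print the token needed for terraform.tfvars below.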

Create the Terraform variables file with values from Vault:

```sh
cd tf-oke
cat > terraform.tfvars << 'EOF'
# OCI Authentication (from ~/.oci/config or password manager)
tenancy_ocid     = "ocid1.tenancy.oc1..xxxxx"
user_ocid        = "ocid1.user.oc1..xxxxx"
fingerprint      = "xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx"
private_key_path = "~/.oci/oci_api_key.pem"
region           = "us-ashburn-1"
compartment_ocid = "ocid1.compartment.oc1..xxxxx"

# SSH (from Vault: ssh-public-key)
ssh_public_key_path = "./oci_key.pub"

# Cloudflare (from Vault)
cloudflare_api_token = "<from vault: cloudflare-api-token>"
cloudflare_zone_id   = "<from vault: cloudflare-zone-id>"
domain_name          = "<from vault: domain-name>"

# GitHub (from Vault)
git_repo_url = "<from vault: git-repo-url>"
git_pat      = "<from vault: github-pat>"
git_username = "<from vault: github-username>"
git_email    = "<your-email>"

# Let's Encrypt (from Vault)
acme_email = "<from vault: acme-email>"

# ArgoCD (from Vault)
argocd_admin_password      = "<from vault: argocd-admin-password>"
argocd_admin_password_hash = "<from vault: argocd-admin-password-hash>"
EOF
```

If you don’t have the SSH key:

```sh
# Generate a new key pair
ssh-keygen -t ed25519 -f ./oci_key -N ""

# Or retrieve the public key from Vault
oci secrets secret-bundle get \
  --secret-id <ssh-public-key-ocid> \
  --query 'data."secret-bundle-content".content' \
  --raw-output | base64 -d > oci_key.pub
```
Run Terraform to provision the infrastructure:

```sh
terraform init
terraform plan
terraform apply
```

After Terraform completes, the OKE cluster will be active. You need to configure kubectl and install Argo CD manually.

  1. Configure kubectl:

     ```sh
     oci ce cluster create-kubeconfig ... (see the Accessing Cluster guide)
     ```

  2. Install Argo CD (the install manifest expects the argocd namespace to already exist):

     ```sh
     kubectl create namespace argocd
     kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
     kubectl apply -f ../argocd/applications.yaml
     ```
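Argo CD's components take a minute or two to start after the install manifest is applied. You can block until the API server is ready before checking applications; a sketch using standard kubectl (the 300-second timeout is an arbitrary choice):

```sh
# Wait for the Argo CD API server deployment to finish rolling out
kubectl -n argocd rollout status deployment/argocd-server --timeout=300s
```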
Verify that the nodes and Argo CD applications are up:

```sh
kubectl get nodes
kubectl get pods -A
kubectl get applications -n argocd
```

The ArgoCD admin password is synced from OCI Vault via External Secrets Operator.

```sh
kubectl -n argocd get secret argocd-secret \
  -o jsonpath='{.data.admin\.password}' | base64 -d
```

This retrieves the password that External Secrets synced from Vault.

```sh
argocd login argocd.<your-domain> \
  --username admin \
  --password <password-from-above>
```

If Terraform state is locked from a previous run:

```sh
terraform force-unlock <lock-id>
```

If applications aren’t syncing, check the repo credentials:

```sh
kubectl -n argocd get secret repo-creds -o yaml
```
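If that secret is missing or holds an expired PAT, it can be recreated with Argo CD's declarative repository-secret format; a sketch (the secret name repo-creds matches the one above, and the placeholder values come from Vault). Note that if External Secrets manages this secret, fix the value in Vault instead and let it re-sync:

```sh
# Recreate the repository credentials secret Argo CD uses to pull the repo
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: repo-creds
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  url: <from vault: git-repo-url>
  username: <from vault: github-username>
  password: <from vault: github-pat>
EOF
```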

Wait for External DNS to create records (up to 5 minutes), then verify:

```sh
dig @1.1.1.1 argocd.<your-domain>
```

If certificates fail with a 429 rateLimited error from Let's Encrypt, you can:

  1. Wait for the rate limit to reset (up to 7 days)
  2. Create a temporary self-signed certificate:

     ```sh
     openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
       -keyout /tmp/tls.key -out /tmp/tls.crt \
       -subj "/CN=<your-domain>"
     kubectl create secret tls docs-tls \
       --cert=/tmp/tls.crt --key=/tmp/tls.key \
       -n default --dry-run=client -o yaml | kubectl apply -f -
     ```

If Envoy pods are stuck in Pending state after a restart:

```sh
# Find and delete the old pod to free hostPorts 80/443
kubectl delete pod -n envoy-gateway-system <old-pod-name> --grace-period=10
```

Finally, confirm the recovery succeeded:

  • All nodes are Ready (kubectl get nodes)
  • Worker node joined (check for TLS errors if missing)
  • ArgoCD applications are Synced
  • DNS records are created (wait up to 5 minutes)
  • TLS certificates are issued (check for rate limiting)
  • Applications are accessible via HTTPS
  • Envoy Gateway pod is Running (not Pending)
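The checklist above can be partially automated. A sketch that prints anything unhealthy (assumes the kubectl context points at the new cluster; output columns are empty when all is well):

```sh
# Nodes that are not Ready (columns: NAME STATUS ROLES AGE VERSION)
kubectl get nodes --no-headers | awk '$2 != "Ready"'

# Argo CD applications with their sync and health status
kubectl get applications -n argocd \
  -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status

# Pods that are neither Running nor Completed (a Pending Envoy pod shows up here)
kubectl get pods -A --no-headers | awk '$4 != "Running" && $4 != "Completed"'
```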