Recreate OKE Cluster from Scratch - Disaster Recovery
This guide walks through recreating the entire OKE cluster from scratch using this repository and secrets stored in OCI Vault.
Prerequisites
Before starting, ensure you have:
- OCI Account with Always Free eligibility
- Cloudflare Account with a managed domain
- GitHub Account with a fork of this repository
- Local Tools: Terraform, OCI CLI, kubectl
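Before going further, it may help to confirm that the local tools are actually on your PATH. The following is a minimal sketch and not part of the repository: the function name missing_tools is hypothetical, and it only checks that the binaries exist, not which versions they are.

```shell
# Hypothetical helper (not from the repository): prints every command
# name passed as an argument that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: an empty result means all three tools were found.
# missing_tools terraform oci kubectl
```

An empty result only shows that the commands resolve; authentication is still set up in Step 1.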
Step 1: OCI CLI Setup
- Install OCI CLI

      brew install oci-cli

- Configure authentication

      oci setup config

  This creates ~/.oci/config with your tenancy details.

- Verify connection

      oci iam user get --user-id <your-user-ocid>
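The user OCID for the verification command is recorded in the config file that oci setup config wrote. As a rough sketch, you could read it back instead of copying it by hand. The oci_config_value function is a hypothetical helper; it assumes the usual key=value layout and returns the first match in the file, which is normally the [DEFAULT] profile.

```shell
# Hypothetical helper: print the first value of KEY found in an OCI CLI
# config file (default: ~/.oci/config). Assumes plain key=value lines.
oci_config_value() {
  key="$1"
  file="${2:-$HOME/.oci/config}"
  awk -F= -v k="$key" '
    { name = $1; gsub(/[ \t]/, "", name) }   # key name without blanks
    name == k {
      sub(/^[^=]*=[ \t]*/, "")   # drop "key=" and leading blanks
      sub(/[ \t\r]+$/, "")       # drop trailing blanks
      print
      exit
    }
  ' "$file"
}

# Usage:
# oci iam user get --user-id "$(oci_config_value user)"
```

The same helper can read tenancy, fingerprint, and region, which are also needed for terraform.tfvars in Step 3.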
Step 2: Retrieve Secrets from Vault
If recreating after a disaster, secrets are stored in OCI Vault:

    # List all secrets in the vault
    oci vault secret list \
      --compartment-id <compartment-ocid> \
      --query 'data[].{"name":"secret-name","id":id}' \
      --output table

Retrieve Individual Secrets

    # Generic retrieval command
    oci secrets secret-bundle get \
      --secret-id <secret-ocid> \
      --query 'data."secret-bundle-content".content' \
      --raw-output | base64 -d

Step 3: Create terraform.tfvars

Create the Terraform variables file with values from Vault:
    cd tf-oke

    cat > terraform.tfvars << 'EOF'
    # OCI Authentication (from ~/.oci/config or password manager)
    tenancy_ocid = "ocid1.tenancy.oc1..xxxxx"
    user_ocid = "ocid1.user.oc1..xxxxx"
    fingerprint = "xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx"
    private_key_path = "~/.oci/oci_api_key.pem"
    region = "us-ashburn-1"
    compartment_ocid = "ocid1.compartment.oc1..xxxxx"

    # SSH (from Vault: ssh-public-key)
    ssh_public_key_path = "./oci_key.pub"

    # Cloudflare (from Vault)
    cloudflare_api_token = "<from vault: cloudflare-api-token>"
    cloudflare_zone_id = "<from vault: cloudflare-zone-id>"
    domain_name = "<from vault: domain-name>"

    # GitHub (from Vault)
    git_repo_url = "<from vault: git-repo-url>"
    git_pat = "<from vault: github-pat>"
    git_username = "<from vault: github-username>"
    git_email = "<your-email>"

    # Let's Encrypt (from Vault)
    acme_email = "<from vault: acme-email>"

    # ArgoCD (from Vault)
    argocd_admin_password = "<from vault: argocd-admin-password>"
    argocd_admin_password_hash = "<from vault: argocd-admin-password-hash>"
    EOF

Step 4: Create SSH Key
If you don’t have the SSH key:

    # Generate new key pair
    ssh-keygen -t ed25519 -f ./oci_key -N ""

    # Or retrieve from Vault
    oci secrets secret-bundle get \
      --secret-id <ssh-public-key-ocid> \
      --query 'data."secret-bundle-content".content' \
      --raw-output | base64 -d > oci_key.pub

Step 5: Initialize Terraform
    terraform init
    terraform plan
    terraform apply

    # State is in OCI Object Storage bucket
    terraform init
    terraform plan

Step 6: Wait for Provisioning

After Terraform completes, the OKE cluster will be active. You need to configure kubectl and install Argo CD manually.
- Configure kubectl:

      oci ce cluster create-kubeconfig ...   # see Accessing Cluster guide

- Install Argo CD:

      kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
      kubectl apply -f ../argocd/applications.yaml
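Argo CD needs a short while to start after its manifests are applied. As a hedged sketch, you could block until its Deployments report Available before moving on to verification. wait_for_argocd is a hypothetical wrapper around kubectl wait; it assumes the argocd namespace already exists and that the install manifest's workloads run in it.

```shell
# Hypothetical wrapper around `kubectl wait`: blocks until every
# Deployment in the argocd namespace reports Available, or until the
# timeout (default 300s) expires.
wait_for_argocd() {
  kubectl -n argocd wait deployment --all \
    --for=condition=Available \
    --timeout="${1:-300s}"
}

# Usage:
# wait_for_argocd 600s && kubectl get pods -n argocd
```

A non-zero exit status means at least one Deployment did not become Available within the timeout.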
Step 7: Verify Cluster
Check Kubernetes

    kubectl get nodes
    kubectl get pods -A

Check ArgoCD Applications

    kubectl get applications -n argocd

Step 8: Access ArgoCD UI

Get Credentials
Section titled “Get Credentials”The ArgoCD admin password is synced from OCI Vault via External Secrets Operator.
    kubectl -n argocd get secret argocd-secret \
      -o jsonpath='{.data.admin\.password}' | base64 -d

This retrieves the password that External Secrets synced from Vault.

Alternatively, read the password directly from Vault (useful if the value above turns out to be a bcrypt hash, which is how Argo CD normally stores admin.password), then log in:

    oci secrets secret-bundle get \
      --secret-id <argocd-admin-password-ocid> \
      --query 'data."secret-bundle-content".content' \
      --raw-output | base64 -d

    argocd login argocd.<your-domain> \
      --username admin \
      --password <password-from-above>

Troubleshooting
State Lock Issues

If Terraform state is locked from a previous run:

    terraform force-unlock <lock-id>

ArgoCD Sync Issues

If applications aren’t syncing, check the repo credentials:

    kubectl -n argocd get secret repo-creds -o yaml

DNS Not Resolving

Wait for External DNS to create records (up to 5 minutes), then verify:

    dig @1.1.1.1 argocd.<your-domain>

Known Issues After Recreation

Let’s Encrypt Rate Limiting
If certificate issuance fails with a 429 rateLimited error, you can:
- Wait for the rate limit to reset (7 days)
- Create a temporary self-signed certificate:
      openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
        -keyout /tmp/tls.key -out /tmp/tls.crt \
        -subj "/CN=<your-domain>"
      kubectl create secret tls docs-tls \
        --cert=/tmp/tls.crt --key=/tmp/tls.key \
        -n default --dry-run=client -o yaml | kubectl apply -f -
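The temporary certificate is only a stopgap until the rate limit resets, so it can be worth inspecting what was generated. The sketch below uses openssl x509 with -checkend to test whether a certificate file is still valid a given number of days from now. cert_valid_for is a hypothetical helper name, and /tmp/tls.crt is the path used in the command above.

```shell
# Hypothetical helper: succeeds if the PEM certificate in $1 will still
# be valid $2 days from now (default 30), and fails otherwise.
cert_valid_for() {
  cert="$1"
  days="${2:-30}"
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}

# Usage:
# openssl x509 -in /tmp/tls.crt -noout -subject -enddate
# cert_valid_for /tmp/tls.crt 7 || echo "certificate expires within a week"
```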
Envoy Gateway Pod Restart
If Envoy pods are stuck in Pending state after a restart:

    # Find and delete the old pod to free hostPort 80/443
    kubectl delete pod -n envoy-gateway-system <old-pod-name> --grace-period=10

Post-Recreation Checklist

- All nodes are Ready (kubectl get nodes)
- Worker node joined (check for TLS errors if missing)
- ArgoCD applications are Synced
- DNS records are created (wait up to 5 minutes)
- TLS certificates are issued (check for rate limiting)
- Applications are accessible via HTTPS
- Envoy Gateway pod is Running (not Pending)
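Some of these checks can be scripted. The sketch below is illustrative only: not_synced and wait_for_dns are hypothetical helpers, the custom-columns paths in the usage comments assume the standard Argo CD Application status fields (.status.sync.status and .status.health.status), and the 300-second default mirrors the 5-minute DNS wait mentioned in the checklist.

```shell
# Hypothetical filter: reads "NAME SYNC HEALTH" rows on stdin and prints
# only the rows that are not both Synced and Healthy.
not_synced() {
  awk '$2 != "Synced" || $3 != "Healthy"'
}

# Hypothetical poller: queries 1.1.1.1 with dig every 10 seconds until
# HOST returns at least one record or TIMEOUT seconds have elapsed.
wait_for_dns() {
  host="$1"
  timeout="${2:-300}"
  waited=0
  while [ "$waited" -le "$timeout" ]; do
    # Lines starting with ";" are dig diagnostics, not records.
    if [ -n "$(dig +short @1.1.1.1 "$host" | grep -v '^;')" ]; then
      return 0
    fi
    sleep 10
    waited=$((waited + 10))
  done
  return 1
}

# Usage:
# kubectl wait --for=condition=Ready nodes --all --timeout=300s
# kubectl get applications -n argocd --no-headers \
#   -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
#   | not_synced
# wait_for_dns "argocd.<your-domain>"
```

Empty output from not_synced means every listed application is Synced and Healthy.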