Skip to main content

Troubleshooting

Common symptoms, likely causes, and remediation steps.

Pod in CrashLoopBackOff:

  • Check logs: kubectl logs <pod> -n <ns> --previous
  • Common causes: missing HDFS variables, invalid image, insufficient Kubernetes permissions

Cannot connect to HDFS:

  • Verify HDFS_URL and DNS resolution of the NameNode
  • Test connectivity from a debug pod: kubectl run -it debug --image=busybox --restart=Never -- nslookup namenode
  • Firewalls/NetworkPolicies: open required ports (NameNode/DataNodes)

Dynamic provisioning fails (PVC Pending):

  • Ensure the StorageClass provisioner matches the driver
  • Check events: kubectl describe pvc <name> and kubectl describe sc <name>
  • Inspect logs of csi-provisioner and controller

Volume not mounted in the Pod:

  • kubectl describe pod <pod> to see mount errors
  • Check csi-node-driver-registrar and the Node service
  • Validate HDFS permissions on the target path

HDFS permission errors:

  • Confirm configured HDFS user (HDFS_USER)
  • Adjust ACL/ownership on directories

Poor performance:

  • Check network latency and bandwidth to HDFS
  • Tune CPU/Memory requests/limits and Java GC if needed
  • Avoid excessive small files (anti-pattern in HDFS)

Kerberos issues (secured clusters):

  • Mount keytabs and krb5.conf via Secrets/volumes
  • Validate ticket acquisition (kinit) in a debug pod

Information to collect:

  • kubectl get pods,svc,deploy,ds -n <ns>
  • kubectl logs of CSI sidecars (provisioner, attacher, livenessprobe)
  • Kubernetes events and system metrics