
Description:
Key Responsibilities:
Cloud Platform Architecture:
- Design and implement a scalable cloud platform covering compute, storage, and networking layers.
- Define architecture for multi-cluster Kubernetes environments, ensuring high availability, scalability, and security.
- Build core services such as identity & access management, service discovery, observability, and API gateways.
Networking:
- Architect multi-tenant networking for VPC/VNet equivalents, load balancers, firewalls, and service meshes.
- Implement SDN solutions (Calico, Cilium, OVN, etc.) and network policy enforcement at scale.
- Optimize inter-cluster and inter-datacenter connectivity.
Storage:
- Design and manage distributed storage solutions (Ceph, Rook, OpenEBS, MinIO, Lustre).
- Architect persistent storage for Kubernetes (CSI drivers, snapshots, backup/restore).
- Ensure data availability, durability, and compliance with SLAs.
Kubernetes & Orchestration:
- Design multi-tenant Kubernetes platforms with advanced scheduling, security, and RBAC.
- Automate provisioning, scaling, and upgrades using operators, Helm, and GitOps (ArgoCD/Flux).
- Integrate with monitoring and logging stacks (Prometheus, Grafana, Loki, ELK).
Automation & Infrastructure-as-Code:
- Implement full-stack automation with Terraform, Ansible, or Pulumi.
- Drive CI/CD pipelines for infrastructure and application delivery.
- Build self-service capabilities for internal teams.
Security & Compliance:
- Design security at all layers (network, storage, workloads).
- Implement secrets management (Vault, External Secrets, KMS).
- Ensure compliance with data governance and regulatory requirements.
Leadership:
- Collaborate with product and engineering teams to define roadmap and priorities.
- Mentor and guide platform engineers and DevOps teams.
- Evaluate new technologies and contribute to open-source projects where applicable.
Required Skills & Experience:
Networking: Deep knowledge of TCP/IP, routing, load balancing, DNS, SDN (Calico, Cilium), and service meshes (Istio, Linkerd).
Storage: Hands-on with distributed storage (Ceph, MinIO, Gluster, Rook) and Kubernetes storage orchestration (CSI).
Kubernetes: 5+ years of experience, with expertise in multi-cluster deployments, operators, controllers, and service meshes.
Cloud & Infra: Strong background in virtualization (KVM, VMware, OpenStack) and bare-metal automation (MAAS, Ironic, PXE, IPMI/Redfish).
IaC & Automation: Proficiency in Terraform, Ansible, and GitOps tools (ArgoCD, Flux).
CI/CD: Experience with Jenkins, GitHub Actions, GitLab CI/CD.
Programming/Scripting: Proficiency in Go, Python, or Bash.
Monitoring/Observability: Prometheus, Grafana, Loki, ELK, Jaeger.
Distributed Systems: Strong knowledge of distributed systems, high availability, and fault tolerance.
Preferred Qualifications:
- Experience designing cloud platforms at scale (e.g., internal private cloud, hyperscaler background).
- Contributions to open-source Kubernetes ecosystem (CNCF projects).
- Familiarity with service billing, quota management, and multi-tenancy at scale.
- Exposure to bare-metal cloud orchestration (Metal3, Tinkerbell, Equinix Metal, Ironic).
- Strong leadership and architectural decision-making skills.