iimjobs
Job Views: 210
Applications: 30
Recruiter Actions: 15

Posted in: IT & Systems

Job Code: 1609501

Director - Site Reliability Engineering

Posted 2 months ago

Job Description:

As a Site Reliability Engineering (SRE) leader, you will lead a team of 10 to 18 SREs to ensure the reliability, scalability, and performance of the platform. This includes managing the team, defining and tracking key metrics, collaborating with other engineering teams, and driving improvements to the development and production environments. The role also involves implementing best practices, staying abreast of industry trends, and building automation to support large-scale deployments.

You will also set the vision and roadmap for the SRE team.


Required skills and experience:


- Proven experience in managing high-performing engineering teams.


- Experience with large infrastructure and distributed systems.


- Strong understanding of AWS cloud computing infrastructure and its components.


- Experience with CI/CD pipelines, Kubernetes, and monitoring at scale.


- Proficiency in Infrastructure as Code (Terraform).


- Experience with configuration management tools like Ansible, Chef, or Puppet.


- Strong communication and stakeholder management skills.


- Knowledge of data pipelines, MongoDB, Elasticsearch, Kafka, Spark, and Samza is an advantage.

