
We are seeking an experienced NOC / Monitoring Lead to lead the implementation of our Observability project and develop a unified Network Operations Center (NOC) dashboard. The role is responsible for integrating multiple monitoring systems, including CoC (Zybisys), Prometheus (Kambala), back-office platforms, and mobile applications, to provide seamless visibility and proactive monitoring of critical business applications.
The ideal candidate will bring expertise in incident management, monitoring tool integrations, and service reliability, with a strong background in the banking or capital markets domain.
Key Responsibilities:
- Lead the design, implementation, and operation of the Observability project and unified NOC dashboard.
- Integrate monitoring tools and systems (CoC, Prometheus, back-office platforms, mobile applications) into a consolidated view for real-time tracking.
- Build and maintain end-to-end service maps, ensuring complete visibility into business-critical applications.
- Own incident detection, escalation, and remediation processes through the Command Center.
- Collaborate with internal teams, service providers, and the extended NOC team at Zybisys for issue resolution and continuous improvements.
- Establish monitoring KPIs, alert mechanisms, and proactive incident prevention strategies.
- Ensure high availability, performance, and resilience of financial systems in alignment with regulatory and business requirements.
- Provide leadership and guidance to the monitoring/NOC team, fostering a culture of accountability and continuous learning.
Required Qualifications & Skills:
- 8-12 years of experience in NOC, monitoring, or infrastructure operations roles, with at least 3 years in a leadership capacity.
- Proven expertise in implementing and managing observability platforms and NOC dashboards.
- Strong hands-on knowledge of monitoring tools (Prometheus, CoC/Zybisys, Grafana, or equivalent).
- Solid understanding of incident management, service reliability engineering, and ITIL processes.
- Prior experience in Banking, Capital Markets, or other financial services environments is highly preferred.
- Strong problem-solving and analytical skills with the ability to design resilient monitoring solutions.
- Excellent stakeholder management, communication, and vendor coordination skills.
Preferred:
- Experience with cloud monitoring (AWS, Azure, GCP).
- Familiarity with automation in incident detection and remediation.
- Certifications in ITIL, SRE, or related fields.