Impetus Technologies
Job Description:
Key Responsibilities:
- Own the observability product roadmap with a focus on enabling visibility for data pipelines, distributed compute frameworks (e.g., Spark, Flink), and cloud-native workloads.
- Define and deliver features for metrics ingestion, distributed tracing, log processing pipelines, alerting, dashboards, and SLO/SLA tooling.
- Drive integration with cloud platforms (AWS, GCP, Azure), container orchestration systems (Kubernetes), and data infrastructure components (Kafka, Airflow, Snowflake, etc.).
- Define APIs, data models, and storage strategies for telemetry data at scale.
- Collaborate with platform, SRE, and data engineering teams to understand pain points, gather requirements, and validate solutions.
- Contribute to the definition and tracking of service health indicators (SLIs/SLOs), incident response tooling, and automated root cause analysis.
- Stay current on emerging trends in observability (e.g., eBPF, AI/ML for anomaly detection, continuous profiling), cloud infrastructure, and big data ecosystems.
- Work with engineering to build scalable systems for telemetry collection, processing, retention, and visualization.
- Develop product specifications with clear technical detail for engineering execution.
Preferred Experience & Skills:
- 8+ years in technical product management, ideally with products related to observability, infrastructure, or data platforms.
- Hands-on experience with observability tools like OpenTelemetry, Prometheus, Grafana, Jaeger, ELK stack, Datadog, New Relic, or similar.
- Strong understanding of cloud-native architecture patterns, microservices, containers, and orchestration (especially Kubernetes).
- Experience with distributed systems and data platforms (e.g., Apache Kafka, Apache Spark, Flink, Airflow, Presto, Snowflake).
- Familiarity with infrastructure-as-code (e.g., Terraform, Helm) and CI/CD systems.
- Working knowledge of telemetry data storage and processing at scale (TSDBs, log indexing, event pipelines).
- Ability to read technical design artifacts (e.g., API specs, sequence diagrams, data flows) and discuss them with engineers and stakeholders.
- Experience working with SREs, platform teams, or DevOps roles in production environments.
- Strong analytical skills; ability to define and monitor KPIs for performance, reliability, and user adoption.
Nice to Have:
- Background in data engineering or site reliability engineering (SRE).
- Experience with cost optimization and resource utilization tracking in cloud environments.
- Exposure to AI/ML-based anomaly detection and predictive analytics in observability.
- Experience contributing to or working with open-source observability communities.
NOTE: We are hiring for Indore and Bangalore.