Posted by
Banashree
Talent Acquisition Manager at Open Network for Digital Commerce (ONDC)
Last Active: 01 December 2025
Posted in
IT & Systems
Job Code
1631563

About the Role:
- We are looking for an exceptional Data Architect to help design, scale, and optimize the next generation of India's open-commerce data infrastructure. At ONDC, we process 10 TB of data daily, maintain 1 billion+ live entities, and handle 300K+ requests per second - operating at true internet scale.
- You will work at the heart of this system, shaping data platforms that address all three Vs of big data: volume, velocity, and variety.
Key Responsibilities:
Architect & Evolve Modern Data Platforms:
- Design, optimize, and scale data lakehouse architectures across multi-cloud environments (AWS, GCP, Azure), supporting both ad-hoc analytics and batch workloads.
Medallion & Lakehouse Design:
- Implement and evolve bronze-silver-gold (medallion) data pipelines using columnar formats (Parquet, ORC) and open table formats such as Apache Iceberg, ensuring cost-efficient, schema-evolution-friendly storage.
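The medallion flow above can be sketched conceptually in plain Python (standing in for Spark/Iceberg jobs; the field names and business key are illustrative assumptions, not ONDC's actual schema):

```python
# Conceptual bronze -> silver -> gold medallion flow.
# Plain Python stands in for Spark/Iceberg jobs; fields are illustrative.

def bronze_to_silver(raw_events):
    """Clean and deduplicate raw (bronze) records into silver rows."""
    seen, silver = set(), []
    for e in raw_events:
        if e.get("order_id") is None:       # drop malformed rows
            continue
        if e["order_id"] in seen:           # deduplicate on the business key
            continue
        seen.add(e["order_id"])
        silver.append({"order_id": e["order_id"],
                       "amount": float(e.get("amount", 0)),
                       "city": (e.get("city") or "unknown").lower()})
    return silver

def silver_to_gold(silver):
    """Aggregate silver rows into a gold, analytics-ready summary."""
    gold = {}
    for row in silver:
        agg = gold.setdefault(row["city"], {"orders": 0, "revenue": 0.0})
        agg["orders"] += 1
        agg["revenue"] += row["amount"]
    return gold

raw = [{"order_id": 1, "amount": "250", "city": "Pune"},
       {"order_id": 1, "amount": "250", "city": "Pune"},   # duplicate
       {"order_id": None, "amount": "99"},                 # malformed
       {"order_id": 2, "amount": "100", "city": "pune"}]
print(silver_to_gold(bronze_to_silver(raw)))
```

Each layer only ever reads from the layer below it, which is what makes medallion storage cheap to reprocess when a schema evolves.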
Data Discovery & Schema Governance:
- Build and maintain central schema repositories and data discovery services leveraging technologies like AWS Glue, Hive Metastore, Apache Iceberg, and Delta Lake.
Streaming Architecture:
- Architect and deploy real-time data streaming frameworks using Kafka or Pulsar, ensuring low-latency data flow, schema validation, and replayability.
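The two properties named above, schema validation and replayability, can be illustrated with a toy in-memory log (a stand-in for a Kafka/Pulsar topic partition; this is not either system's API, and the schema is an invented example):

```python
# Toy stand-in for a validated, replayable stream (not Kafka's API).
# Shows schema enforcement at produce time and replay from an offset.

SCHEMA = {"order_id": int, "amount": float}    # illustrative schema

class ValidatedLog:
    def __init__(self):
        self._log = []                          # append-only, like a partition

    def produce(self, event):
        for field, ftype in SCHEMA.items():
            if not isinstance(event.get(field), ftype):
                raise ValueError(f"schema violation on {field!r}")
        self._log.append(event)
        return len(self._log) - 1               # offset of the new record

    def replay(self, from_offset=0):
        """Re-read events from a given offset (replayability)."""
        return self._log[from_offset:]

log = ValidatedLog()
log.produce({"order_id": 1, "amount": 99.0})
log.produce({"order_id": 2, "amount": 50.0})
print(log.replay(from_offset=1))
```

In a real deployment the same roles are played by a schema registry at the producer edge and consumer offset management on the broker side.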
Performance & Cost Optimization:
- Identify and implement cost-saving measures across data pipelines - data compression, columnar optimization, storage tiering, and query performance tuning.
Data Orchestration & Workflow Automation:
- Build and maintain orchestration frameworks using Airflow, Dagster, or Argo Workflows, ensuring observability, failure recovery, and lineage tracking.
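The core orchestration idea, running tasks in dependency order, can be sketched as a tiny DAG runner (a toy illustration of the pattern behind Airflow/Dagster, not their APIs; task names are hypothetical):

```python
# Minimal sketch of DAG-style orchestration: each task runs only after
# all of its upstream dependencies have completed.

def run_dag(tasks, deps):
    """tasks: name -> callable; deps: name -> list of upstream names.
    Returns the order in which tasks actually ran."""
    done, order = set(), []

    def run(name):
        if name in done:
            return
        for up in deps.get(name, []):
            run(up)                    # ensure upstreams finish first
        tasks[name]()
        done.add(name)
        order.append(name)

    for name in tasks:
        run(name)
    return order

log = []
tasks = {"extract": lambda: log.append("extract"),
         "transform": lambda: log.append("transform"),
         "load": lambda: log.append("load")}
deps = {"transform": ["extract"], "load": ["transform"]}
print(run_dag(tasks, deps))
```

Production orchestrators layer retries, scheduling, observability, and lineage metadata on top of exactly this dependency-resolution core.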
OLAP Systems & Query Acceleration:
- Design and tune analytical workloads using Snowflake, Redshift, ClickHouse, or Druid, supporting large-scale aggregation and real-time exploration.
DevOps & Infrastructure as Code:
- Collaborate with DevOps teams to define reproducible infrastructure using Terraform / CloudFormation / Pulumi. Deploy, monitor, and optimize Kubernetes-based data services.
Natural Language Querying:
- Contribute to frameworks enabling natural-language analytics, integrating LLM-powered question-answering systems over structured data.
Required Skills & Experience:
- 8+ years of experience in data architecture, platform engineering, or distributed systems.
- Proven expertise in Spark, Snowflake, Redshift, and SQL-based data modeling.
- Hands-on experience with streaming (Kafka/Pulsar) and batch processing frameworks.
- Deep understanding of cloud-native and cloud-agnostic architectures.
- Practical experience implementing lakehouse / medallion models.
- Strong grasp of data lineage, cataloging, governance, and schema evolution.
- Exposure to columnar formats (Parquet/ORC) and query engines (Presto/Trino/DuckDB).
- Familiarity with Kubernetes, Docker, and microservices-based data deployment.
- Excellent problem-solving, documentation, and cross-team collaboration skills.
Preferred Qualifications:
- Experience in designing central schema and data discovery layers at scale.
- Prior exposure to large-scale public data networks (e.g., e-commerce, fintech, telecom).
- Understanding of AI/LLM-driven data access or semantic query layers.
- Contributions to open-source data infrastructure projects (nice to have).
Why Join Us:
- Shape the data backbone of India's digital commerce revolution.
- Work on massive-scale systems (10 TB/day, 1B+ entities, 300K RPS).
- Collaborate with leading data engineers, architects, and policymakers.
- Innovate at the intersection of data engineering, AI, and open networks.