About the Role:

- We are looking for a Lead AI Engineer who blends deep technical expertise with engineering leadership. This is a builder-leader role: you will architect production-grade AI systems, write and review code on critical paths, and grow a team of engineers delivering Generative AI, agentic AI and applied machine learning solutions for enterprise clients.

- You will own the AI architecture decisions that span data, models, orchestration, infrastructure and governance. You will partner with product managers, solution architects, client stakeholders and platform teams to translate business problems into reliable, cost-effective and responsible AI systems. You will also set the engineering bar through code reviews, design reviews, mentorship and hiring.

This is the right role for an engineer who has already shipped multiple AI systems in production, has scars from real-world failure modes, and is now ready to lead architecture and people while staying close to the code.

Key Responsibilities:

1. Architecture and System Design

- Design end-to-end AI architectures spanning data ingestion, model serving, orchestration, observability and governance

- Make and document technology choices with clear trade-off analysis: open versus closed models, RAG versus fine-tuning, single-agent versus multi-agent, real-time versus batch

- Define non-functional requirements: latency budgets, throughput, cost per request, accuracy thresholds, uptime targets and graceful degradation paths

- Lead architecture review boards, write Architecture Decision Records (ADRs), and own the long-term technical roadmap for AI capabilities

- Design for scale, multi-tenancy, security and compliance from day one, not as an afterthought

- Evaluate emerging AI technologies and decide what to adopt, what to pilot and what to ignore

2. Hands-On Engineering and Delivery

- Build and ship production AI systems: GenAI applications, RAG pipelines, agentic workflows, fine-tuned models, classical ML services

- Write production-grade Python code, contribute to critical-path components, and stay credible as a senior engineer

- Review code and designs for the team, raise the engineering bar, and be the final reviewer for high-risk changes

- Drive evaluation and observability practices: golden datasets, regression suites, LLM-as-judge frameworks, tracing, metrics and dashboards

- Optimize for cost, latency and quality through caching, batching, quantization, model routing and prompt optimization

- Own production reliability for AI services: on-call rotation, incident response and post-mortems

3. Generative AI, Agents and Applied ML

- Lead the design of GenAI applications across LLMs such as Claude, GPT, Gemini, Llama and Mistral, including the decision logic for model selection per use case

- Architect RAG systems with rigorous retrieval evaluation, chunking strategies, hybrid search and reranking

- Design and implement agentic systems using frameworks like LangGraph, CrewAI, AutoGen, Claude Agent SDK or MCP-based architectures

- Drive fine-tuning, instruction tuning and adapter strategies (LoRA, QLoRA, PEFT) when use cases justify them

- Establish guardrails, prompt injection defenses, PII handling and red teaming as standard practice

4. Team Handling and People Leadership

- Lead a team of 4 to 8 engineers including AI engineers, ML engineers and data scientists

- Run sprint planning, design reviews, one-on-ones and quarterly performance conversations

- Set clear technical and career growth plans for each direct report, and actively coach them

- Hire, onboard and retain strong engineers; design and run technical interviews

- Foster a culture of ownership, learning, blameless post-mortems and shipping with quality

- Resolve technical disagreements with structured decision-making, not seniority

- Represent the team in cross-functional forums and shield the team from unnecessary thrash

5. Stakeholder and Client Engagement

- Partner with product managers and solution architects to translate ambiguous business problems into shippable AI scope

- Lead solution scoping, estimation and proposal work for AI engagements

- Present architectures, trade-offs and progress to executive and client audiences in plain language

- Manage stakeholder expectations on what GenAI can and cannot reliably do, and push back on hype-driven asks

- Contribute to pre-sales by shaping demos, POCs and reference architectures

6. AI Governance, Safety and Responsible AI

- Embed responsible AI practices: bias evaluation, explainability, model cards, data cards and audit trails

- Implement guardrails for content safety, PII protection and prompt injection mitigation

- Ensure a compliance posture aligned with frameworks such as the EU AI Act, NIST AI RMF, ISO 42001, SOC 2, HIPAA or GDPR, as relevant to the engagement

- Define and own evaluation frameworks for safety, robustness and quality regressions

Required Qualifications:

Experience:

- 8 to 12 years of total software and data engineering experience

- 5+ years in applied AI/ML, with multiple systems shipped to production

- 2+ years of formal or informal team leadership, including direct mentoring of 3 or more engineers

- Demonstrated ownership of at least one AI system from design to production to scale

Technical Skills:

Programming and Engineering Foundations:

- Expert-level Python, with strong software engineering fundamentals: testing, code structure, performance and debugging

- Solid grasp of data structures, algorithms, concurrency and distributed systems concepts

- Production experience with FastAPI or equivalent, REST and gRPC API design

- Comfort with at least one of Java, Go or TypeScript for cross-team work

System Design and Architecture:

- Strong system design skills covering high availability, horizontal scale, caching strategies, queueing, idempotency and consistency models

- Experience designing event-driven architectures using Kafka, Kinesis or equivalent

- Microservices and API design experience, including versioning, backward compatibility and SLA contracts

- Database design across SQL (PostgreSQL, MySQL) and NoSQL (DynamoDB, MongoDB, Redis)

- Vector database design and tuning: Pinecone, Weaviate, Qdrant, Chroma, FAISS, pgvector or Milvus

- Familiarity with security primitives: authentication, authorization, secrets management, network segmentation

AI and Machine Learning:

- Deep experience with LLMs and GenAI: RAG, prompt engineering, fine-tuning, evaluation and cost optimization

- Hands-on experience with at least one agentic framework: LangGraph, CrewAI, AutoGen, Claude Agent SDK, OpenAI Agents SDK or Semantic Kernel

- Experience with classical ML: regression, classification, clustering and ensemble methods using scikit-learn, XGBoost or LightGBM

- Experience with deep learning frameworks: PyTorch (preferred) or TensorFlow

- Hands-on experience with LLM tooling: LangChain, LlamaIndex, Hugging Face, vLLM or Ollama

MLOps, LLMOps and Cloud:

- Production MLOps experience: MLflow, Kubeflow, SageMaker, Vertex AI or Azure ML

- LLM observability and evaluation tooling: Langfuse, LangSmith, Arize, Helicone or equivalent

- Containerization and orchestration: Docker and Kubernetes in production

- CI/CD for ML: automated testing, model registry, canary and shadow deployments

- Strong cloud experience on at least one of AWS, Azure or GCP, including GPU-based workloads

- Infrastructure as code: Terraform, CloudFormation or Pulumi

Data Engineering:

- Production data pipeline experience: Airflow, Dagster, Prefect or dbt

- Distributed data processing: Spark, Flink or Ray

- Data warehouse and lakehouse exposure: Snowflake, BigQuery, Databricks, Delta Lake or Iceberg

- Streaming systems: Kafka, Kinesis or Pub/Sub

Leadership and Soft Skills:

- Proven ability to lead engineers through ambiguity, set direction and drive delivery

- Strong written and verbal communication, comfortable presenting to executives and clients

- Sound judgment on when to ship, when to refactor, and when to throw away

- Bias for action paired with disciplined risk management

- Experience with Agile delivery models, sprint planning and engineering metrics

Education:

- Bachelor's or Master's degree in Computer Science, AI, Data Science, Mathematics, Engineering or a related field

- Equivalent practical experience considered for exceptional candidates

Preferred Qualifications:

- Experience leading AI engagements in a consulting or services environment

- Open source contributions, technical blog posts, conference talks or patents in AI/ML

- Experience building reusable AI platforms or frameworks adopted across multiple teams

- Cloud certifications: AWS ML Specialty, Azure AI Engineer, GCP ML Engineer or Databricks ML

- Experience with Model Context Protocol (MCP), tool-calling standards and emerging agent protocols

- Domain depth in BFSI, healthcare, retail, manufacturing or another regulated vertical

- Hands-on experience with cost optimization for LLM workloads at meaningful scale