
Data Scientist
Experience: 3 Years
Required Skills:
Core Technical Competencies:
- Advanced Python Programming: Expertise in Python with production-level code quality, including OOP, API development, and best practices (linting, testing, documentation)
- Machine Learning Mastery: Deep understanding and practical application of:
  - Classical ML algorithms (Random Forests, Gradient Boosting, SVM, clustering techniques)
  - Deep Learning frameworks (TensorFlow, Keras, PyTorch)
  - Time series forecasting and anomaly detection
  - Model evaluation, validation, and optimization techniques
- Data Engineering: Experience with data pipelines, ETL processes, and handling large-scale datasets (TB+ scale)
- Cloud Platforms: Hands-on deployment experience with at least one major cloud platform (AWS, Azure, GCP), including:
  - Managed ML services (SageMaker, Azure ML, Vertex AI)
  - Containerization and orchestration (Docker, Kubernetes)
  - Serverless architectures for ML deployment
Candidates may qualify with either or both of the following skill sets (NLP & Text Analytics, ML Engineering & Production Systems).
NLP & Text Analytics:
- Experience with modern NLP techniques including transformer models (BERT, GPT)
- Text preprocessing, feature extraction, and representation learning
- Practical applications: sentiment analysis, document classification, named entity recognition
- Working knowledge of NLP libraries (NLTK, spaCy, Hugging Face Transformers)
ML Engineering & Production Systems:
- MLOps practices: model versioning, monitoring, and automated retraining
- Building scalable ML pipelines and APIs (FastAPI, Flask)
- Experience with distributed computing frameworks (Spark/PySpark)
- Performance optimization and model compression techniques
Desired Skills:
Advanced AI/ML Capabilities:
- Generative AI & LLMs: Experience with LangChain, RAG architectures, prompt engineering, and fine-tuning large language models
- Computer Vision: Document AI, OCR technologies, image classification using CNNs/YOLO
- Recommendation Systems: Collaborative filtering, content-based filtering, hybrid approaches
- Advanced Analytics: Causal inference, A/B testing, experimental design
Technical Stack:
- Big Data Tools: PySpark, Dask, or similar distributed computing frameworks
- Visualization: Creating impactful dashboards using Tableau, Power BI, or Python libraries (Plotly, Dash)
- Version Control & CI/CD: Git workflows, automated testing, and deployment pipelines
- Database Systems: SQL proficiency, experience with NoSQL databases, vector databases