Responsibilities:
- Audit & Validation: Conduct rigorous quality checks on scraped outputs from Streamlit applications to ensure high-fidelity extraction from source documents.
- Data Remediation: Utilize purpose-built data pipelines to manually or programmatically overwrite inaccurate data points discovered during auditing.
- Pipeline Monitoring: Collaborate with data engineering teams to identify systemic scraping errors and refine the logic within the ingestion layer.
- Governance Integration: Transition successful document auditing workflows into our broader enterprise data governance practices.
- Reporting: Maintain detailed logs of data discrepancies, "ground truth" comparisons, and error trends to inform future scraping strategies.
Required Skills & Qualifications:
- Extreme Attention to Detail: You must have a passion for "hunting" for small discrepancies in large datasets.
- Snowflake Proficiency: Hands-on experience querying and managing data within Snowflake is required.
- Strong SQL Skills: Ability to write complex queries to validate data across multiple tables and identify outliers.
- Analytical Mindset: Experience auditing unstructured data (PDFs, images, or web scrapes) and comparing it against structured outputs.
- Communication: Ability to clearly document data issues and explain technical discrepancies to both engineers and stakeholders.
Preferred Qualifications:
- Python Experience: Familiarity with Python for data manipulation (Pandas) or basic automation is a significant plus.
- Streamlit Familiarity: Understanding of how Streamlit apps function, to better troubleshoot how data is being captured.
- Governance Background: Prior experience working within a formal Data Governance framework or using data cataloging tools.
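For candidates unfamiliar with this kind of audit work, a minimal sketch of a "ground truth" comparison in Python with pandas (the dataset, document IDs, and column names here are purely hypothetical):

```python
import pandas as pd

# Hypothetical ground-truth values keyed by document ID, as verified
# manually against the source documents.
ground_truth = pd.DataFrame(
    {"doc_id": [1, 2, 3], "total": [100.0, 250.5, 99.9]}
)

# Output of the scraping pipeline for the same documents.
scraped = pd.DataFrame(
    {"doc_id": [1, 2, 3], "total": [100.0, 205.5, 99.9]}
)

# Join on the document key and flag rows where the extracted value
# disagrees with the verified value.
merged = ground_truth.merge(scraped, on="doc_id", suffixes=("_truth", "_scraped"))
discrepancies = merged[merged["total_truth"] != merged["total_scraped"]]
print(discrepancies)
```

In practice the same comparison would typically be expressed as a SQL join inside Snowflake rather than in-memory with pandas; the logic is the same.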