AI Data Engineer
Xterra · San Jose, US
Job description
With AI agents transforming the data engineering landscape, the role of the data engineer is being redefined. At Xterra, a data engineer in this new era is not just an ETL developer but an architect and orchestrator of autonomous data systems.
Job Title: AI Data Engineer
Overview:
Xterra is seeking an innovative Data Engineer to build and manage an AI-driven data platform that powers real-world impact across geospatial and geophysical intelligence. This is not a typical ETL role: we’re looking for someone excited to harness AI agents and automation to create a self-optimizing data infrastructure. You will be at the forefront of designing systems where data pipelines monitor, heal, and adapt themselves, and where data quality is safeguarded through intelligent algorithms rather than manual checks. If you are passionate about data engineering and AI/ML and thrive in a fast-paced environment built on cutting-edge technology, we want to talk to you!
Key Responsibilities:
- Design and Implement Autonomous Data Pipelines: Develop data ingestion and transformation workflows that leverage AI agents for schema inference, anomaly detection, and adaptive processing. You will set up a modular pipeline architecture where components (ingest, validation, enrichment, etc.) are orchestrated with minimal human intervention. Expect to integrate tools like streaming platforms, cloud data lakes, and ML models to achieve end-to-end automation.
- Build AI-Powered Data Preparation and Labeling Systems: Use large language models and other ML techniques to automate data preparation tasks. This includes deploying LLM-based labeling for text and documents, computer vision models for image annotation, and sensor data pattern recognition. You will ensure that raw data is quickly turned into high-quality, labeled datasets through a combination of AI and programmatic techniques – drastically reducing the need for manual data labeling.
- Implement Self-Healing and Monitoring Mechanisms: Create intelligent monitoring solutions that detect pipeline failures or data quality issues in real time. You will configure agents and tools to automatically triage and recover from errors (e.g., retrying failed jobs, backfilling missing data, adjusting to upstream schema changes). When human attention is required, design the alerting and notification processes so that issues are escalated with context (e.g., an agent-generated summary of the problem and suggested fixes). The goal is to achieve high pipeline reliability (99%+ uptime) with minimal manual firefighting, as the system handles most issues proactively.
- Data Infrastructure & Tooling Innovation: Stay at the cutting edge of data engineering and AI tooling. You will evaluate and integrate new technologies such as agent orchestration frameworks, prompt-based development tools, and data-centric ML libraries. You’ll work closely with our ML engineers to ensure the data platform serves model training and inference needs in a seamless, automated way.
- Human-in-the-Loop Workflow Orchestration: Develop mechanisms for human oversight and input in our AI-driven pipelines. For example, implement approval steps or review dashboards where data stewards can easily see what the AI agents are doing and intervene when necessary (especially for high-stakes data). You will ensure that governance, security, and compliance requirements are met by the autonomous system – including managing data access controls, privacy (for any sensitive data), and audit logs of automated actions. Designing this human+AI collaboration loop is key to scaling our platform safely.
- Cross-Functional Collaboration and Leadership: Work closely with data scientists, analysts, and domain experts (geoscientists, etc.) to understand data needs and pain points. You will act as a bridge between these teams and the data platform, translating business requirements into automated data solutions. Additionally, evangelize the capabilities of our AI-driven data engineering approach – help train and mentor other engineers on using AI tools, and guide stakeholders to trust and adopt the new workflows. As we onboard more autonomous features, you’ll be instrumental in change management and ensuring the team embraces the “new way” of working with data.
Preferred Qualifications:
- 5+ years of experience in data engineering or a related field, with expertise in building data pipelines (ETL/ELT) on modern cloud platforms.
- Strong programming skills (Python, SQL, etc.), familiarity with pipeline orchestration and transformation frameworks (e.g., Airflow, dbt), and enthusiasm for augmenting them with AI/LLM-based tools.
- Experience with machine learning or AI integration in data processes. This could include using APIs for NLP/computer vision, deploying ML models in production data flows, or working with LLMs (e.g., prompt engineering, fine-tuning).
- Knowledge of data quality and data governance practices. Experience implementing monitoring, testing (Great Expectations or similar), or anomaly detection in pipelines is a plus.
- Exposure to tools for automated data labeling, weak supervision, or data-centric AI (e.g., Snorkel, Cleanlab) is highly desirable.
- Domain familiarity with geospatial or sensor data is a bonus (e.g., handling GIS data, time-series sensor analysis), as Xterra’s use cases involve these modalities.
- Creative, forward-thinking mindset – eager to push the envelope by designing systems that challenge the status quo. Must be comfortable with rapid prototyping and learning new technologies.
- Excellent communication skills to articulate complex AI/data engineering concepts to non-technical stakeholders and to document systems clearly.
What Success Looks Like:
You will have guided Xterra’s data engineering into the future, delivering a robust platform where adding a new data source or pipeline is mostly a matter of configuration, not a from-scratch project. Our data pipelines will largely run autonomously, achieving a 70–90% reduction in manual intervention and cutting deployment time for new pipelines from weeks to hours. Data consumers across the company will be able to ask questions and get data or insights quickly via AI-powered tools. Overall, you’ll help transform the data team from spending most of its time on pipeline maintenance to focusing on higher-value analysis and innovation. This role is pivotal in making Xterra’s data infrastructure a competitive advantage through the smart application of AI and automation in data engineering.