Lead RAG & Data Infrastructure Engineer

Engineer secure, high-scale vector retrieval architectures and data ingestion pipelines to power private enterprise context injection engines.

Mission

Large language models are of little use to an enterprise without accurate, secure, real-time context. Your job is to build the pipelines that feed them. As a Lead RAG & Data Infrastructure Engineer at Stackgrid, you will design the storage and retrieval layers that let models query millions of private, unstructured enterprise documents in milliseconds. You will be directly responsible for data sovereignty, ensuring that no client data is retained outside ring-fenced client networks.

Responsibilities
  • Architect Vector Pipelines: Design and optimize high-throughput data ingestion pipelines that clean, chunk, embed, and index massive corpora of unstructured enterprise data.

  • Optimize Retrieval Performance: Implement and test advanced RAG strategies (hierarchical node parsing, hybrid keyword/vector search, re-ranking models) to bring context retrieval latency below 500 ms.

  • Enforce Data Sovereignty: Deploy ring-fenced vector databases (Qdrant, Pinecone, pgvector) inside secure cloud environments (AWS Nitro Enclaves or client-managed VPCs), ensuring isolation from the public internet, including web scrapers.

  • Manage Embedding Latency: Monitor and optimize embedding latency and compute costs, track model drift, and keep data indices continuously synced with clients' live production databases.

Requirements
  • Vector Database Mastery: Proven, hands-on experience deploying and scaling vector databases in production environments that handle millions of high-dimensional vectors.

  • Advanced Chunking Strategy: You understand that naive fixed-size token splitting breaks down on complex corporate data, and you have deep knowledge of semantic chunking, parent-child retrieval, and metadata filtering.

  • Infrastructure Pragmatism: Strong command of containerization (Docker) and enterprise cloud architecture (AWS/GCP). You care deeply about data privacy law and compliance frameworks (GDPR, SOC 2) and about encrypted storage pipelines.

Department

Data Architecture

Location

Remote

Compensation

$160,000 – $210,000 USD + Production Hand-off Bonuses