Lead RAG & Data Infrastructure Engineer
Engineer secure, high-scale vector retrieval architectures and data ingestion pipelines to power private enterprise context injection engines.
Mission
Large language models are completely useless to an enterprise without accurate, secure, real-time context. Your job is to build the pipelines that feed them. As a Lead RAG & Data Infrastructure Engineer at Stackgrid, you will design the storage and retrieval layers that allow models to query millions of private, unstructured enterprise documents in milliseconds. You will be directly responsible for data sovereignty, ensuring zero data retention outside of ring-fenced client networks.
Responsibilities
Architect Vector Pipelines: Design and optimize high-throughput data ingestion pipelines that clean, chunk, embed, and index massive corpora of unstructured enterprise data.
Optimize Retrieval Performance: Implement and test advanced RAG strategies (hierarchical node parsing, hybrid keyword/vector search, re-ranking models) to bring context retrieval latency below 500 ms.
Enforce Data Sovereignty: Deploy ring-fenced vector databases (Qdrant, Pinecone, pgvector) inside secure cloud environments (AWS Nitro Enclaves or client-managed VPCs), ensuring absolute isolation from public web scrapers.
Manage Embedding Latency: Monitor and optimize embedding latency and compute costs, watch for model drift, and keep data indices continuously synced with clients' live production databases.
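To give candidates a concrete sense of the hybrid keyword/vector search mentioned above, here is a minimal sketch of one common way to merge the two result lists: reciprocal rank fusion. It is an illustration only, not a description of Stackgrid's production stack. The function name, the document ids, and the smoothing constant k=60 are all assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Rank-based fusion avoids having to normalize keyword scores and vector similarity scores onto a shared scale, which is one reason it is a frequent first choice before a re-ranking model is added.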
Requirements
Vector Database Mastery: Demonstrated, hands-on experience deploying and scaling vector databases in production environments handling millions of high-dimensional vectors.
Advanced Chunking Strategy: You understand that basic token-splitting doesn't work for complex corporate data. You must possess deep knowledge of semantic chunking, parent-child retrieval, and metadata filtering.
Infrastructure Pragmatism: Strong command of containerization (Docker) and enterprise cloud architecture (AWS/GCP). You care deeply about data privacy laws (GDPR), compliance frameworks (SOC 2), and encrypted storage pipelines.
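As a rough illustration of the parent-child retrieval and metadata filtering named in the requirements above, the sketch below matches a query against small child chunks, filters them by a tenant tag, and returns the ids of the parent documents. Everything here is hypothetical: the names, the sample policies, and the word-overlap scoring, which stands in for a real embedding similarity purely to keep the example self-contained.

```python
import re
from dataclasses import dataclass


@dataclass
class Child:
    parent_id: str
    tenant: str  # metadata used for filtering
    text: str


def tokens(text):
    """Lowercased word set; a stand-in for an embedding in this sketch."""
    return set(re.findall(r"\w+", text.lower()))


def split_children(parent_id, tenant, text, size=8):
    """Cut a parent document into fixed-size word windows."""
    words = text.split()
    return [
        Child(parent_id, tenant, " ".join(words[i:i + size]))
        for i in range(0, len(words), size)
    ]


def retrieve_parents(query, children, tenant, top_k=1):
    """Score child chunks, but return the ids of their parent documents."""
    query_tokens = tokens(query)
    # Metadata filter first: only chunks belonging to this tenant are visible.
    allowed = [c for c in children if c.tenant == tenant]
    scored = [(len(query_tokens & tokens(c.text)), c) for c in allowed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    parent_ids = []
    for score, child in scored:
        if score == 0 or len(parent_ids) == top_k:
            break
        if child.parent_id not in parent_ids:
            parent_ids.append(child.parent_id)
    return parent_ids


# Hypothetical parent documents: id -> (tenant, full text).
parents = {
    "policy-1": ("acme", "Refunds are issued within 30 days of purchase. "
                         "Contact billing support to start a refund request."),
    "policy-2": ("globex", "Refunds are never issued after shipment. "
                           "All sales are final once the order ships."),
}
children = [
    chunk
    for pid, (tenant, text) in parents.items()
    for chunk in split_children(pid, tenant, text)
]

print(retrieve_parents("refund request", children, tenant="acme"))
```

Small chunks tend to match a query more precisely, while returning the parent gives the model the surrounding context; applying the tenant filter before scoring keeps one client's chunks out of another client's results.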
Department
Data Architecture
Location
Remote
Compensation
$160,000 – $210,000 USD + Production Hand-off Bonuses