AI Solution Architect
STACK Infrastructure | Colorado, United States | Yesterday
$171,190 – $189,846/yr | full-time | on-site | senior | 7+ years | bachelor's in Computer Science
skills: generative ai, llm engineering, rag, prompt engineering, databricks, mlflow, python, gitlab, ci/cd, api engineering, azure functions, vector databases, semantic search, oauth 2.0, enterprise system integration, agentic ai, multi-agent systems, rhel, lora, qlora, rlhf
THE COMPANY
STACK INFRASTRUCTURE (STACK) provides digital infrastructure to scale the world’s most innovative companies. We are an award-winning industry leader in building, owning, and operating highly efficient, cost-effective wholesale, colocation, and cloud data centers. Each of our national facilities meets or exceeds the highest industry standards in all operational categories of availability, security, connectivity, and physical resilience.
STACK offers the scale and geographic reach that rapidly growing hyperscale and enterprise companies need. The world runs on data. Data runs on STACK.
THE POSITION
STACK is seeking an AI Solution Architect to serve as the senior technical authority and delivery lead for enterprise AI solutions. This role owns the end-to-end design, engineering, and deployment of AI systems — spanning generative AI, LLM engineering, RAG pipelines, smart routing, agentic workflows, enterprise system integrations, intelligent layer development on Databricks, and production AI infrastructure — while driving technical adoption across the organization.
The AI Solution Architect is a hands-on builder who owns the data architecture, the Python code, the integrations, the GitLab repositories, the deployment pipelines, the entire intelligent layer, and the technical reliability of every AI solution in production. This role owns the AI system architecture and engineering layer that sits between the data foundation and the business outcome. The role reports directly to the Head of AI & Data Strategy and serves as the primary technical engineering authority for all AI solution delivery across the enterprise.
KEY RESPONSIBILITIES
Generative AI, LLM Engineering & Intelligent Routing
- Design, develop, and deploy production GenAI solutions including custom assistants, multi-agent systems, and LLM-powered workflow automation — with hands-on ownership of every layer from prompt design through inference endpoint.
- Architect and implement smart LLM routing logic — designing multi-model routing systems that dynamically select the right model based on query complexity, cost thresholds, latency requirements, and data residency constraints. Implement fallback chains, load balancing, and model arbitration patterns for production reliability.
- Build and optimize Retrieval-Augmented Generation (RAG) pipelines end to end — including document ingestion strategy, chunking and overlap logic, embedding model selection and tuning, vector store architecture, hybrid retrieval design, reranking layers, and context window management for enterprise knowledge applications.
- Engineer and maintain prompt libraries, system prompts, chain-of-thought patterns, and prompt chaining logic for all production LLM applications — maintaining a versioned prompt registry in GitLab with structured testing and evaluation before promotion.
- Build LLM fine-tuning and alignment pipelines on Databricks MLflow — including supervised fine-tuning (SFT), parameter-efficient fine-tuning (LoRA, QLoRA), and RLHF.
- Implement guardrails, content filtering, hallucination detection, and output validation layers to ensure AI responses meet safety, accuracy, and compliance standards established by the governance framework.
- Implement and deploy predictive AI solutions that surface forecasts and recommendations to business users — building on feature specifications and model evaluation criteria provided by the AI Data Scientist.
- Engineer the prescriptive AI layer — translating model outputs and recommendations into automated decisions, workflow triggers, and user-facing actions within business systems.
- Build model serving infrastructure: inference endpoints, prediction APIs, caching layers, and fallback logic that ensure production models meet latency, reliability, and cost requirements.
- Integrate AI model outputs into operational workflows, automated alerts, and decision support tools so insights reach end users in context and at the right moment.
- Design and build the full agentic AI architecture for Phase III of the enterprise AI roadmap — including agent orchestration layers, tool registries, memory systems (short-term, long-term, episodic), state management, and inter-agent communication protocols.
- Architect multi-agent workflows that span enterprise systems — designing how agents interact with NetSuite, Procore, SharePoint, Workday, and operational platforms to complete complex multi-step business processes autonomously.
- Build and maintain the agent tool library — the set of callable Python functions, REST API wrappers, and enterprise connectors that agentic systems invoke to read data, trigger actions, and write outputs to business systems.
- Implement intelligent routing within agentic systems — designing intent classification layers, task routing logic, and dynamic tool selection patterns that direct agent actions to the right model, tool, or human escalation path based on query type and confidence.
- Own the full agentic AI lifecycle from design through production scaling — this is not a handoff function; it is an end-to-end engineering ownership responsibility.
- Own the AI intelligence layer on Databricks — building, maintaining, and iterating all AI and ML workloads including model training notebooks, inference pipelines, feature transformation jobs, embedding generation logic, and LLM orchestration workflows running on Databricks ML Runtime.
- Write production-grade Python code across all AI solution engineering work — including Azure Functions for serverless AI triggers, custom API wrappers, LLM chain implementations, agent tool functions, data transformation scripts, and automation pipelines connecting AI systems to enterprise platforms.
- Build and maintain Databricks Jobs and Workflows orchestrating multi-step AI pipelines — coordinating data preparation, model inference, output transformation, prompt execution, and downstream delivery in a single governed execution context.
- Leverage Databricks Mosaic AI capabilities — including AI Playground, Model Serving, Vector Search, and LakeFlow Connect.
- Optimize Databricks AI workload performance — tuning notebook compute, cluster sizing for training jobs, and serving endpoint configuration — in coordination with the Sr. Data Platform Engineer who owns cluster policy and platform infrastructure.
- Own and maintain the GitLab repository structure for all AI initiatives — following the initiative-based repo architecture under the dedicated AI workspace, with consistent branching conventions, commit standards, and merge request workflows across all solution engineering work.
- Build and maintain CI/CD pipelines in GitLab for AI solution deployment — automating testing, validation, model promotion, and environment deployment for GenAI applications, inference APIs, Azure Functions, and Databricks Jobs.
- Maintain GitLab as the single source of truth for all AI solution code, configuration, model artifacts, and deployment history — ensuring every production AI system has a complete, auditable engineering record.
- Build and maintain integrations between AI systems and enterprise platforms including NetSuite ERP, Procore, Asana, Microsoft 365, and Workday — enabling AI-powered insights, automation triggers, and real-time data flows.
- Design and engineer the API and connector layer that AI agents and applications use to read from and write to enterprise systems — owning authentication (OAuth 2.0, API keys, managed identities), authorization, error handling, retry logic, and rate limiting.
- Build data ingestion scripts and connectors in Python for AI-specific data feeds — pulling targeted data from enterprise sources (NetSuite reports, Procore project data, SharePoint documents, Teams activity) into AI-ready formats for model consumption.
- Design and implement Azure Function-based ingestion triggers — event-driven Python functions that capture real-time data updates from enterprise systems and route them to AI processing pipelines without requiring full batch pipeline infrastructure.
- Establish and enforce engineering standards for AI solution development, testing, deployment, monitoring, and deprecation — covering Python code quality, GitLab workflow discipline, Databricks notebook standards, prompt versioning, and API documentation.
- Own technical solution monitoring: system uptime, API latency, error rates, token consumption, Databricks job health, and infrastructure reliability for all deployed AI solutions. Coordinate with the AI Data Scientist on model performance signals that indicate architectural review is needed.
- Ensure all AI solutions comply with security, privacy, data governance, and regulatory requirements in collaboration with IT, legal, and cybersecurity teams.
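The smart routing responsibility described above (multi-model arbitration, cost thresholds, fallback chains) can be sketched in a few lines. This is an illustrative toy, not STACK's implementation: every model name, threshold, and price below is hypothetical, and a production router would replace the word-count heuristic with a trained intent classifier.

```python
from dataclasses import dataclass

# Hypothetical model tiers, ordered cheapest-first; names and numbers are illustrative.
@dataclass
class Route:
    model: str
    max_complexity: float      # route here if query complexity <= this
    cost_per_1k_tokens: float

ROUTES = [
    Route("small-fast-model", max_complexity=0.3, cost_per_1k_tokens=0.0005),
    Route("mid-tier-model",   max_complexity=0.7, cost_per_1k_tokens=0.003),
    Route("frontier-model",   max_complexity=1.0, cost_per_1k_tokens=0.03),
]

def score_complexity(query: str) -> float:
    """Toy heuristic: longer, multi-clause queries score higher (capped at 1.0)."""
    words = len(query.split())
    clauses = query.count(",") + query.count(" and ") + 1
    return min(1.0, words / 100 + clauses / 10)

def select_model(query: str, budget_per_1k: float) -> str:
    """Pick the cheapest model whose tier covers the query's complexity and
    fits the budget; otherwise fall back to the most capable affordable model."""
    c = score_complexity(query)
    for route in ROUTES:
        if c <= route.max_complexity and route.cost_per_1k_tokens <= budget_per_1k:
            return route.model
    # Fallback chain: most capable model within budget, else the cheapest tier.
    affordable = [r for r in ROUTES if r.cost_per_1k_tokens <= budget_per_1k]
    return (affordable[-1] if affordable else ROUTES[0]).model
```

The same shape extends naturally to latency and data-residency constraints: each becomes another field on `Route` and another predicate in the selection loop.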
QUALIFICATIONS
- Bachelor’s degree in Computer Science, Software Engineering, or equivalent practical experience.
- 7+ years of experience in AI/ML engineering, solution architecture, or enterprise software engineering — with at least 3 years working with production AI or ML systems.
- Advanced RAG expertise: chunking strategies, embedding model selection, hybrid retrieval, reranking, multi-hop retrieval, query decomposition, and context window optimization.
- Demonstrated experience implementing LLM smart routing: multi-model arbitration, intent-based routing, fallback chains, and cost-optimized model selection logic.
- Hands-on agentic AI experience: orchestration frameworks (LangChain, LlamaIndex, AutoGen, or equivalent), tool registry design, memory architecture, and multi-agent coordination patterns.
- Databricks ML Runtime proficiency as a practitioner: building and running Python notebooks, Jobs, and Workflows for AI workloads; MLflow for experiment tracking, model registry, and deployment.
- GitLab proficiency: repository management, CI/CD pipeline construction, branching strategy, and code review workflows for AI solution codebases.
- Data ingestion experience: Python-based connectors, Azure Function triggers, and API-driven data feeds from enterprise systems (NetSuite, Procore, SharePoint, or equivalent).
- Strong understanding of vector databases and semantic search: Azure AI Search, Pinecone, Weaviate, Qdrant, Azure OpenAI, Azure ML, Copilot Studio or equivalent.
- Deep technical fluency with REST APIs, OAuth 2.0, webhooks, and enterprise integration patterns across ERP, ITSM, project management, and HR platforms.
- Strong communication skills — able to translate AI engineering decisions into clear language for executive and business audiences.
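The chunking-and-overlap expertise called for above reduces to a simple pattern: adjacent chunks share an overlap window so sentences straddling a boundary appear in both. A minimal sketch, assuming fixed character windows (production pipelines would typically chunk on token or semantic boundaries instead, with sizes tuned per embedding model):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks where each chunk repeats
    the last `overlap` characters of its predecessor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```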
- You have built RAG pipelines that went to production and you have the opinions and scars to prove it — you know when to use hybrid retrieval, when reranking matters, and what actually breaks at scale.
- You have implemented smart routing in production — not just switching between GPT-3.5 and GPT-4 on cost, but designing intent classifiers, confidence thresholds, and fallback chains that make routing decisions the model cannot make for itself.
- You have built agentic systems where agents are making real decisions in real enterprise workflows — not just demos, but production systems with tool libraries, memory, escalation logic, and audit trails.
- You are fluent in Databricks end to end especially as an ML practitioner — you build notebooks, run Jobs, track experiments in MLflow, and deploy models to serving endpoints without needing the platform team to hold your hand.
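A tool library of the kind described above boils down to a registry of callable functions that agents invoke by name. This minimal sketch shows the register-validate-dispatch pattern; the `lookup_vendor` connector is a hypothetical stub, and a production registry would add parameter schemas, auth scopes, and the audit trail mentioned earlier.

```python
import inspect
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a callable so an agent can invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def lookup_vendor(vendor_id: str) -> dict:
    """Hypothetical enterprise connector; returns a stubbed record."""
    return {"vendor_id": vendor_id, "status": "active"}

def invoke(name: str, **kwargs: Any) -> Any:
    """Dispatch an agent's tool call, checking that the tool exists and the
    arguments fit its signature before executing anything."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    inspect.signature(TOOLS[name]).bind(**kwargs)  # raises TypeError on bad args
    return TOOLS[name](**kwargs)
```

Validating arguments before execution is the important design choice here: an LLM-produced tool call should fail loudly at the registry boundary, not halfway through a write to a business system.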
Compensation Range:
$171,190.00 - $189,846.00
THIS MIGHT BE RIGHT FOR YOU IF:
- You are a strong communicator: persuasive and clear, blending analytics with experience in decision-making.
- You do not get flustered easily. You can juggle multiple priorities while balancing urgent requests with shifting timelines and deliverables.
- You are a team builder. You take the time to understand and develop the strengths of your resources while formulating long-term plans for the growth and success of the team.
- You are naturally curious and driven toward continual improvement. While you celebrate your successes, you take time to review and analyze past projects for future learning.
- We offer a competitive compensation package with strong benefits, including medical, dental, and vision insurance, a 401K program, flexible spending accounts – even a cell phone subsidy.
- We foster a culture of appreciation, including peer-to-peer recognition and rewards programs.
- Fun is part of our DNA, with events, game nights, happy hours, and barbecues.
- We’re growing – this is a great time to join and make an impact!
Note to external agencies: We are not accepting any blind submissions or resumes/cvs from recruitment agencies. Any candidates sent to STACK Infrastructure, Inc. will not be accepted or considered as a submission without a signed agreement in place. Fees will not be paid in the event a candidate submitted by a recruiter without an agreement in place is hired; such resumes will be deemed the sole property of STACK Infrastructure, Inc.
Benefits
medical · dental · vision · 401k · flexible spending accounts · cell phone subsidy