AI / MLOps Engineer
22-Month Contract
HYBRID ROLE: 2-4 DAYS/WEEK ONSITE IN ORLANDO, FL; GLENDALE, CA; ANAHEIM, CA; OR SEATTLE, WA
Description
Seeking an AI/ML Operations professional for the following role:
Overall Responsibilities
• Manage operational workflows for model deployments, updates, and versioning across GCP, Azure, and AWS.
• Monitor model performance metrics: latency, throughput, error rates, token usage, and inference quality.
• Track model drift, accuracy degradation, and performance anomalies, escalating to engineering as needed.
• Support knowledge base operations including vector embedding pipeline health, chunk quality, and refresh cycles in Vertex AI.
• Maintain model inventory and documentation across multi-cloud environments.
• Coordinate model evaluation cycles with Responsible AI and Core Engineering teams.
Agent & MCP Server Operations
• Monitor AI agent health, performance, and reliability (AutoGen-based agents, MCP servers).
• Track agent execution metrics: task completion rates, tool call success/failure, latency, and error patterns.
• Support agent deployment and configuration management workflows.
• Document agent behaviors, known issues, and operational runbooks.
• Coordinate with Core Engineering on agent updates, testing, and rollouts.
• Monitor MCP server availability, connection health, and integration status.
FinOps & Cost Management
• Track and analyze AI/ML cloud spend across GCP (Vertex AI), Azure (OpenAI), and AWS (Bedrock).
• Build cost dashboards with breakdowns by model, application team, use case, and environment.
• Monitor token consumption, inference costs, and embedding/storage costs.
• Identify cost optimization opportunities: model selection, caching, batching, and rightsizing.
• Provide cost allocation reporting for chargeback/showback to consuming application teams.
• Forecast spend trends and flag budget anomalies.
• Partner with Infrastructure and Finance teams on AI cost governance.
Monitoring, Dashboarding & Reporting
• Build and maintain dashboards for platform performance, model health, agent metrics, and operational KPIs.
• Create executive and stakeholder reports on platform adoption, usage trends, and cost allocation.
• Develop Responsible AI dashboards tracking hallucination rates, accuracy metrics, guardrail triggers, and safety incidents.
• Monitor APIGEE gateway traffic patterns and API consumption trends.
• Provide regular reporting to product management on use case performance.
Release Operations Support
• Support release management processes with pre/post-deployment validation checks.
• Track release health metrics for models, agents, and platform components.
• Maintain release documentation, runbooks, and operational playbooks.
• Coordinate with QA, Performance Engineering, and Infrastructure teams during releases.
AI Operations
• Monitor guardrail effectiveness and flag anomalies to the Responsible AI team.
• Track and report on hallucination detection, content safety triggers, and accuracy trends.
• Support LLM Red Teaming efforts by collecting and organizing evaluation data.
• Maintain audit logs and compliance documentation for AI governance.
Cross-Functional Coordination
• Serve as operational point of contact for application teams consuming DxT AI APIs.
• Coordinate with Corporate Security on audit requests and compliance reporting.
• Partner with Infrastructure team on capacity tracking and resource utilization.
• Support Performance Engineering with load test analysis and results documentation.
Basic Qualifications
• 2-4 years in an Ops, Analytics, or Technical Operations role (MLOps, AIOps, DataOps, Platform Ops, or similar).
• Understanding of AI/ML concepts: models, inference, embeddings, vector databases, LLMs, tokens, prompts.
• Experience with cloud cost management and FinOps: tracking, analyzing, and optimizing cloud spend.
• Strong proficiency with dashboarding and visualization tools (Looker, Tableau, Grafana, or similar).
• Working knowledge of GCP (required); familiarity with Azure and AWS a plus.
• Comfortable with SQL and basic Python for data analysis and scripting.
• Experience with monitoring and observability platforms (Datadog, Prometheus/Grafana, Cloud Monitoring, or similar).
• Understanding of APIs and API gateways, including the ability to read logs, trace requests, and analyze traffic.
• Strong analytical and problem-solving skills with attention to detail.
• Excellent communication skills, with the ability to translate technical metrics into stakeholder insights.
• College degree in Computer Science, BIS, MIS, EE, ME, or a similar field is required.
Preferred Qualifications
• Hands-on experience with LLM platforms: Vertex AI, Azure OpenAI, AWS Bedrock.
• Familiarity with AI agents and agentic architectures (AutoGen, LangChain, or similar).
• Exposure to MCP (Model Context Protocol) or agent-tool integration patterns.
• Experience with vector databases and RAG (Retrieval-Augmented Generation) operations.
• Understanding of MLOps lifecycle: model registry, versioning, deployment patterns, A/B testing.
• Experience with APIGEE or similar API management platforms.
• Familiarity with Responsible AI metrics: hallucination, bias, content safety, and guardrails.
• FinOps certification or formal cloud cost management experience.
• Experience supporting enterprise platform teams with multiple consuming applications.
• Familiarity with ML pipeline tools (Kubeflow, MLflow, Vertex AI Pipelines).
• Exposure to prompt management and evaluation frameworks.
• ITIL or operational process framework experience.
• Experience creating runbooks and operational documentation.
Education
BE/BS in Computer Science, Business Information Systems, Management Information Systems, Electrical Engineering, Mechanical Engineering, or a similar field.
The estimated pay range for this position is USD $70.00/hr to $77.50/hr in Florida and USD $85.00/hr to $91.00/hr in California and Washington. Exact compensation and offers of employment depend on job-related knowledge, skills, experience, licenses or certifications, and location. We also offer comprehensive benefits. The Talent Acquisition Partner can share more details about compensation and benefits for the role during the interview process.