Responsibilities
Operate and manage Kubernetes or OpenShift clusters for multi-node orchestration.
Deploy and manage LLMs and other AI models for inference using Triton Inference Server or custom endpoints.
Automate CI/CD pipelines for model packaging, serving, retraining, and rollback using GitLab CI or ArgoCD.
Set up model and infrastructure monitoring systems (Prometheus, Grafana, NVIDIA DCGM).
Implement model drift detection, performance alerting, and inference logging.
Manage model checkpoints, reproducibility controls, and rollback strategies.
Track deployed model versions using MLflow or an equivalent registry tool.
Implement secure access controls for model endpoints and data artifacts.
Collaborate with AI/Data Engineers to integrate and deploy fine-tuned models and their datasets.
Ensure high availability, performance, and observability of all AI services in production.
Requirements
3 years of experience in DevOps, MLOps, or AI/ML infrastructure roles.
10 years of overall experience in solution operations.
Proven experience with Kubernetes or OpenShift in production environments; certification preferred.
Familiarity with deploying and scaling PyTorch or TensorFlow models for inference.
Experience with CI/CD automation tools (e.g., GitLab CI, ArgoCD) on OpenShift/Kubernetes.
Hands-on experience with model registry systems (e.g., MLflow, Kubeflow).
Experience with monitoring tools (e.g., Prometheus, Grafana) and GPU workload optimization.
Strong scripting skills (Python, Bash) and Linux system administration knowledge.
Key Skills
Kubernetes, OpenShift, MLOps, CI/CD (GitLab CI, ArgoCD), Triton Inference Server, MLflow, Prometheus, Grafana, Python, Bash, Linux Administration, GPU Workload Optimization
Employment Details
Employment type: Full time
Vacancy: 1