Job Title:
Machine Learning Engineer/SRE
Location:
Chicago, IL or 100% Remote
Duration:
12 Months
Key Responsibilities:
• Azure Infrastructure Management: Configure, maintain, and optimize Azure infrastructure for AI model development and deployment, ensuring scalability and performance.
• Model Performance Monitoring: Implement and maintain monitoring systems to track model performance, proactively identifying and addressing issues as they arise.
• Incident Response: Collaborate with the SRE team to respond promptly to outages and incidents related to model operations, ensuring minimal downtime and rapid issue resolution.
Required Skills and Qualifications:
• Azure Infrastructure Experience: Proficiency in managing Azure infrastructure components, including virtual machines, storage, and networking, to support AI model development and deployment.
• CI/CD Pipeline Experience: Experience with Continuous Integration/Continuous Deployment (CI/CD) pipelines, including the automation of model deployment processes.
• Containerization in the Cloud: Strong knowledge of cloud containerization and orchestration technologies, such as Docker and Kubernetes, for efficient deployment and scaling of machine learning models.
• Machine Learning Expertise: Proficiency in building and optimizing machine learning models, with a deep understanding of various algorithms and frameworks.
• Programming Skills: Proficiency in programming languages commonly used in machine learning, such as Python, and in libraries such as TensorFlow and PyTorch.
• Data Management: Experience in data preprocessing, feature engineering, and data pipeline development for machine learning.
• Collaborative Team Player: Excellent communication skills and the ability to work collaboratively with cross-functional teams, including AI engineers and SREs.
• Documentation: Effective documentation skills to maintain clear and organized records of models, infrastructure configurations, and incident responses.
Preferred Qualifications:
• Cloud-Based ML Platforms: Familiarity with cloud-based machine learning platforms, such as Azure Machine Learning.
• CI/CD Tools: Experience using CI/CD tools to deploy Client services and applications on the Azure cloud platform.
• DevOps Practices and Tools: Familiarity with DevOps practices and tools for automating infrastructure and deployments.
• Model Versioning and Management: Knowledge of model versioning and model management tools.
• AI Deployment Security: Understanding of security best practices for AI model deployment.
• Certifications: Certifications in relevant areas, such as Azure or machine learning certifications.