Overview
• Ensure maximum possible service availability and performance
• Provide comprehensive support to our Engineering teams and other operational and technical teams
These objectives translate into a broad and dynamic scope of responsibilities for the team.
Engineers will be able to centrally manage OCI’s networks and implement automated solutions that address common operational challenges efficiently.
Technical Qualifications
Networking
• Expert knowledge of the following protocols: BGP/OSPF/IS-IS, TCP, IPv4, IPv6, DNS, DHCP, MPLS
• Broad experience with at least three of the following network technologies: Juniper, Cisco, Arista, InfiniBand, firewalls, switches, and circuit management
• Expert analytical skills and ability to collate and interpret data from various sources
• Ability to diagnose network alerts to assess and prioritize faults and respond or escalate accordingly
• Experience working in a large ISP or cloud provider environment
• Exposure to commodity Ethernet hardware (Broadcom/Mellanox)
• Cisco and Juniper certifications are desired
GPU/RDMA
• Experience in GPU/RDMA network environments is highly desired
• Experience with High Performance Compute
• Experience with InfiniBand
Design
• Participate in network lifecycle management through network build and/or upgrade projects
Soft Skills & Other Desired Experience
• Highly motivated self-starter
• Bachelor’s degree preferred, with 3-10 years of network-related experience
• Excellent time management and organization skills
• Comfortable dealing with a wide range of issues in a fast-paced environment
• Excellent organizational, verbal, and written communication skills
• Experience with Incident Response plans and strategies
• Experience with large enterprise network infrastructure and cloud computing environments, supporting 24/7 operations; willing to work rotational shifts in a network operations role
Automation/Scripting
• The role includes collaborating with networking automation services to integrate support tooling and frequently developing scripts to automate routine tasks
• Experience with scripting, network automation, and databases (Python, Puppet, SQL, and/or Ansible) is preferred
• You will use automation to complete work and develop scripts for routine tasks
Project Management
• Lead technical projects, such as developing and improving runbooks and methods of procedure, driving high-visibility technical initiatives, and onboarding and training new team members
• Assist in the implementation of short-, medium-, and long-term plans to achieve project objectives, and regularly interact with senior management or network leadership to ensure team objectives are met
Leadership
• Collaborate with shift leads and management to ensure the efficient and timely completion of daily responsibilities
• Take the initiative to lead, contribute to, and participate in identifying, developing, and evaluating projects and tools aimed at enhancing the overall effectiveness of the Restricted Regions teams
• Drive runbook audits and updates to ensure compliance, and collaborate with partner service teams to keep operational processes aligned between both teams
• Conduct interviews and participate in the hiring process for junior-level engineers
• Lead and/or represent the Restricted Regions teams in vendor meetings, reviews, or governance boards
Responsibilities
Networking Operations
• Using existing procedures and tooling, develop and safely complete network changes
• Mentor, onboard and train junior engineers
• Participate in operational rotations providing break-fix support
• Identify actionable incidents using monitoring systems, apply strong analytical and problem-solving skills to mitigate network events/incidents, and follow up on routine root cause analysis (RCA) in coordination with support teams and vendors
• Provide on-call support services as needed; job duties are varied and complex and require independent judgment
• Join major event/incident calls and use technical and analytical skills to resolve network issues that impact Oracle customers/services
• Fault handling and escalation: identify and respond to faults on OCI’s systems and networks, collaborating closely with third-party suppliers and handling escalations through to resolution