We are seeking a skilled and proactive System Engineer (Day 2 Operations) to manage, support, and maintain a complex suite of IT systems and hardware infrastructure.
The role requires strong troubleshooting, system administration, and preventive maintenance skills to ensure the stability and performance of mission-critical systems.
You will work closely with the Maintenance Team Leader, Infrastructure Specialists, and System Owners to handle incidents, perform maintenance, support upgrades, and sustain high system availability.
Key Responsibilities
1. Incident Management & Troubleshooting
• Provide on-site technical support for all hardware, system, and application issues within the maintenance scope.
• Assist in incident verification, isolation, and resolution, or provide approved workarounds.
• Troubleshoot and resolve Level 1 and Level 2 technical issues.
• Escalate complex or unresolved incidents to the Maintenance Team Leader and keep the Maintenance Manager informed.
• Respond promptly to service disruptions, system alarms, and performance anomalies.
2. Preventive & Corrective Maintenance
• Perform daily system health checks and review logs for early issue detection.
• Execute preventive maintenance activities and carry out corrective actions as required.
• Perform and verify scheduled backups (daily incremental, hot, and weekly full backups).
• Execute system recovery procedures during service restoration.
3. System Administration
• Manage user accounts (add/remove/update) in coordination with system owners.
• Reset passwords and manage access controls securely.
• Monitor and tune system or database performance based on advisories or logs.
4. Patch Management & Upgrades
• Test and deploy OS patches, firmware upgrades, and software updates.
• Stage and implement hardware upgrades and COTS software patches.
• Ensure compliance with change control and system hardening policies.
5. Hardware & Infrastructure Maintenance
• Support, troubleshoot, and maintain enterprise hardware including:
Servers: Dell PowerEdge R750
Firewalls: FortiGate 1101E
Storage Devices: Dell EMC XT380/XT480
Switches: Cisco C9300
UPS & Power Management: APC Smart-UPS, Rack PDU
Others: KVM consoles, HSMs, NTP servers, mobile computing devices
6. Software Platform Support
• Manage and monitor various platforms and applications:
Core Platforms: ArcGIS Server, IBM ACE + MQ, Kafka, MongoDB, MS SQL, WebSphere, Elastic Stack, Rocket.Chat
Security & Endpoint Tools: Symantec, Carbon Black EDR, CipherTrust, Fortify WebInspect, Keycloak
Monitoring & DevOps Tools: Grafana, Prometheus, GitLab Enterprise, Ansible, OpenShift, Red Hat Satellite
7. Documentation & Reporting
• Maintain and update documentation (SOPs, maintenance records, system diagrams, logs).
• Generate reports on system performance, incident handling, and preventive maintenance activities.
8. Advisory & Continuous Improvement
• Provide technical advice on infrastructure improvement and system performance tuning.
• Propose and implement automation to enhance monitoring and recovery processes.
Requirements
Essential Qualifications & Experience
• Diploma or Degree in Computer Science, Information Systems, or a related field.
• Minimum 3 years’ experience in IT system administration or infrastructure maintenance.
• Strong knowledge of Linux (RHEL) and Windows Server 2019 environments.
• Experience with backup systems (Dell EMC Data Domain, Avamar), firewalls, and enterprise-grade hardware.
• Familiarity with container platforms (OpenShift), middleware (IBM ACE, WebSphere), databases (MS SQL, MongoDB), and cloud/hybrid integration platforms.
Desirable Skills
• Working knowledge of DevOps tools (Ansible, GitLab, SonarQube).
• Familiarity with security technologies (Keycloak, CipherTrust, FortiGate, Fortify WebInspect).
• Knowledge of ITIL processes for incident, change, and problem management.
Soft Skills
• Strong analytical and troubleshooting abilities.
• Good written and verbal communication skills.
• Able to work both independently and as part of a team.
• Willing to work on-site, perform shift duties, and be on-call when required.