We are seeking a Junior Production Support Engineer to provide operational support for large-scale enterprise applications during standard U.S. business hours. This role is ideal for someone early in their career who is eager to learn, highly detail-oriented, and interested in building a strong foundation in production operations, incident management, and system reliability. The engineer will focus primarily on triaging and resolving production support tickets, investigating alerts, and performing system health checks across on-premises and cloud environments, with mentorship and guidance from senior engineers.
Key Responsibilities
· Serve as a first-line responder for incoming production support tickets, including:
o Customer-reported issues and complaints
o Support requests to determine and report whether incidents or anomalies occurred during specific time periods
o Questions regarding system availability, performance, and data freshness
· Triage, investigate, and resolve issues where possible; escalate to senior engineers or specialized teams with clear diagnostics and impact assessment.
· Monitor enterprise systems and dashboards covering:
o Microservices and APIs (latency, error rates, availability)
o Batch jobs, scheduled workloads, and ETL/data pipelines (success/failure, duration, SLA adherence)
o Server and container health (CPU, memory, disk, network, capacity trends)
o Database health and performance (availability, replication, query latency, resource utilization)
o Application and infrastructure logging, including centralized log ingestion, indexing, and search
· Respond to alerts and alarms, validate whether they represent real incidents, and follow runbooks to perform initial troubleshooting.
· Execute manual operational checks using defined checklists and document sign-offs to confirm system health.
· Communicate clearly and professionally with support teams, product and engineering teams, and infrastructure teams throughout the investigation and resolution process.
· Maintain accurate ticket updates, incident timelines, and shift handoff notes.
· Learn the existing runbooks and knowledge bases, and contribute to improving them and monitoring coverage over time.
Qualifications
· 1–3 years of experience in Production Support, NOC, Help Desk, Systems Operations, or a related technical support role.
· Basic understanding of enterprise applications and platforms (e.g., servers, web services and distributed applications, databases and batch/ETL workflows, cloud platforms).
· Experience working with ticketing systems (e.g., ServiceNow, Jira, Zendesk) and following incident management processes.
· Strong attention to detail and ability to follow runbooks, checklists, and escalation procedures accurately.
· Excellent written and verbal communication skills, with the ability to explain technical issues clearly to both technical and non-technical partners.
· Strong problem-solving mindset and curiosity to investigate issues end-to-end and learn from senior engineers.
· Familiarity with monitoring and observability tools (such as Grafana, Prometheus, Splunk, Datadog, AppDynamics, or similar).
Compensation
$25/hr to $30/hr
Exact compensation may vary based on several factors, including skills, experience, and education.
Benefit packages for this role start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.