JOB DESCRIPTION
The customer is building an edge (80%) and cloud (20%) application for safe automation and
optimization of well construction processes. Read about the product, DrillOps, on our website.
Teams are located in the US, China, France, and Georgia.
SRE mission
Building the foundation for modern ops. Using the available monitoring system, the SRE will
analyze the current design and propose ways to improve environment monitoring, including what
should and should not be monitored and why. The problem the SRE will need to solve for our team
is making the current state of each edge deployment (system health, SLIs, performance)
available in the cloud. The SRE should be able to identify product issues as they arise in
production and test environments and create solutions, automated as much as possible, for
fixing those issues so that incident management stays sustainable.
Responsibilities
- In charge of maintaining/improving product monitoring system
- Incident response management (troubleshooting, resolution, documentation, post-mortem analysis)
- Knowledge sharing on the lessons learnt
- Be a bridge between operations and development
Key requirement
Engineers with existing SRE experience. Most SREs have a cloud products background, and our
focus is Edge.
Experience required
- Building solutions from scratch
- Writing code to automate processes (log analysis, testing production environments, alert
automation)
- Expertise in cloud providers
Tools
Incident management/on-call: PagerDuty
Logging: ELK/Kibana, SEQ logging
Language: Python, C#, scripting
Database: SQL, MongoDB
Network: Basic network knowledge (inbound/outbound and firewall rules)
Monitoring: Prometheus, Grafana
Project management and issue tracking: Azure DevOps, Wiki
Source code management: Git
Infrastructure and orchestration: SaltStack, Docker, Zededa? No