Site Reliability Engineer
Job Responsibilities
- Install, upgrade and manage systems powering customer infrastructure running Circonus software
- Troubleshoot availability and performance issues
- Diagnose production issues and perform front-line remediation
- Communicate with management and customers regarding aberrant system behavior
- Influence software and architecture design based on observations of system performance and reliability
- Participate in an on-call schedule
Job Requirements
- Linux (RHEL, CentOS, Ubuntu)
- Experience working with cloud service providers such as AWS, Azure, or GCP
- Ansible, Chef or a similar configuration management system
- HAProxy, PostgreSQL, Apache or similar technologies
- Strong networking knowledge: firewalls, TCP & UDP, DNS, SSL/TLS
- Strong understanding of monitoring principles
- Familiarity with using REST and REST-like APIs for operations tasks
- UNIX troubleshooting skills: tcpdump, strace, bpftrace, etc.
- Fluency in one or more version control systems: Git, Subversion or Mercurial
Preferred Experience
- 7+ years’ experience in the technology industry
- Experience with, and/or senior-level technical knowledge of, monitoring and analytics solutions
- Experience with Docker, Kubernetes and containers
- The right person will be highly technical and analytical, much like the company itself