Sr. Manager: SRE – Remote work

  • Direct Placement
  • Anywhere
The ideal candidate for this role understands the rigors of working in a fast-paced, deeply technical environment. You will take ownership of and responsibility for all team activities, communicate and collaborate with stakeholders throughout the organization, and work with a sense of urgency to drive projects and team objectives to completion. You must be passionate about individual contribution, career development, and progression, and about providing guidance and mentoring.

Job Responsibilities:

  • Act as primary point-of-contact for all infrastructure projects and requests
  • Assume lead role in troubleshooting, service restoration, and root cause analysis of incidents and outages
  • Provide project management, planning, and road-mapping support
  • Be the driving force behind our automation, monitoring, and observability initiatives
  • Build and maintain operational tools for deployment, monitoring, and analysis of the infrastructure and systems
  • Work collaboratively with software engineering to define infrastructure and deployment requirements; be a sounding board and provide recommendations for engineering
  • Establish, document, publish, and communicate SRE standards, processes, and procedures
  • Plan, strategize, and assign team goals and objectives
  • Provide professional mentorship and career development for team members
  • Seek opportunities for continuous improvement in our tools, technologies, and processes
  • All other duties and responsibilities as assigned
  • Participate in a 24x7x365 on-call rotation

Skills & Competencies

  • Proven track record working in large-scale environments
  • Expert-level administration and operational support for various Linux operating systems
  • Deep knowledge of server and system hardware
  • Experience working with Linux systems from kernel to shell, including working with system libraries, file systems, and client-server protocols
  • Experience with networking (TCP/IP, UDP, ICMP, ARP, DNS, load balancing, etc.)
  • Experience with configuration management tools (Ansible)
  • Working knowledge of content management systems, source control systems, Git, Jira, Confluence, and ServiceNow
  • Must have excellent interpersonal skills and solid written and verbal communication skills
  • Must be organized, detail-oriented, and able to manage multiple tasks simultaneously with the ability to prioritize appropriately

Education & Experience
  • A Bachelor's degree in Computer Science or a related technical field, or equivalent industry experience
  • Minimum of 8 years of industry experience in engineering, with 4+ years of leadership experience
  • 5+ years of experience with major incident management, program management, or related incident command processes
  • Experience in managing, collaborating with, and influencing global teams
