The ideal candidate for this role understands the rigors of working in a fast-paced, deeply technical environment. In this role, you will take ownership of and responsibility for all team activities, communicate and collaborate with stakeholders throughout the organization, and work with a sense of urgency to drive projects and team objectives to completion. You must be passionate about individual contribution, career development, and progression, and about providing guidance and mentoring to others.
Job Responsibilities
- Act as primary point-of-contact for all infrastructure projects and requests
- Assume lead role in troubleshooting, service restoration, and root cause analysis of incidents and outages
- Provide project management, planning, and road-mapping support
- Be the driving force behind our automation, monitoring, and observability initiatives
- Build and maintain operational tools for deployment, monitoring, and analysis of the infrastructure and systems
- Work collaboratively with software engineering to define infrastructure and deployment requirements; be a sounding board and provide recommendations for engineering
- Establish, document, publish, and communicate ISRE standards, processes, and procedures
- Plan, strategize, and assign team goals and objectives
- Provide professional mentorship and career development for team members
- Seek opportunities for continuous improvement in our tools, technologies, and processes
- All other duties and responsibilities as assigned
- Participate in a 24x7x365 on-call rotation
Skills & Competencies
- Proven track record working in large-scale environments
- Expert-level administration and operational support for various Linux operating systems
- Deep knowledge of server and system hardware
- Experience working with Linux systems from kernel to shell, including working with system libraries, file systems, and client-server protocols
- Experience with networking (TCP/IP, UDP, ICMP, ARP, DNS, load balancing, etc.)
- Experience with configuration management tools (Ansible)
- Working knowledge of content management systems, source control systems, Git, Jira, Confluence, and ServiceNow
- Must have excellent interpersonal skills and solid written and verbal communication skills
- Must be organized, detail-oriented, and able to manage multiple tasks simultaneously with the ability to prioritize appropriately
Education & Experience
- A Bachelor's degree in Computer Science, a related SRE technical field, or relevant equivalent industry experience
- Minimum of 8 years of industry experience in engineering, with 4+ years of leadership experience
- 5+ years of experience with major incident management, program management, or related incident command processes
- Experience in managing, collaborating with, and influencing global teams
To apply for this job, please visit www2.jobdiva.com.