Senior Site Reliability Engineer

Wikimedia Foundation

June 7, 2026 | July 22, 2026 | Home Based Online
Job Description
Job Posting Organization:
The Wikimedia Foundation is a nonprofit organization that operates Wikipedia and other Wikimedia free knowledge projects. Established to promote free access to knowledge, the foundation envisions a world where every individual can freely share in the sum of all knowledge. It is a charitable organization that relies on donations from millions of individuals globally, with an average donation of about $1
The foundation is a 501(c)(3) tax-exempt organization based in the United States, with its headquarters located in San Francisco, California. The organization employs a diverse workforce and operates in over 40 countries, continuously striving to maintain an inclusive and equitable workplace.

Job Overview:
The Senior Site Reliability Engineer (SRE) at the Wikimedia Foundation plays a crucial role in supporting and developing the infrastructure that serves Wikipedia, one of the most visited websites globally. The SRE team is responsible for ensuring the health and reliability of the platform, which is essential for delivering knowledge to millions of users. This position involves performing operational and DevOps tasks, implementing configuration management tools, and leading continuous improvement initiatives. The role requires collaboration with product teams to design scalable services and participation in a 24/7 on-call rotation for incident response. The SRE team values exploration and experimentation with new technologies, and the position is remote-first, allowing for flexibility in work arrangements while requiring occasional travel for team meetings and events.

Duties and Responsibilities:
The duties and responsibilities of the Senior Site Reliability Engineer include performing day-to-day operational tasks on Wikimedia’s public-facing infrastructure, which encompasses deployment, maintenance, configuration, and troubleshooting. The engineer will implement and utilize configuration management and deployment tools such as Puppet and Kubernetes. They will lead efforts in automating the installation, configuration, and maintenance of services on the platform. Additionally, the engineer will work closely with product teams to assist in the architectural design of new services, ensuring they operate efficiently at scale. Participation in a 24/7 on-call rotation is required, which includes incident response and follow-up on system outages or alerts. The engineer will also collaborate with a global, cross-functional team and mentor peers in their areas of expertise.

Required Qualifications:
Candidates for the Senior Site Reliability Engineer position should have at least 6 years of experience in an SRE, Operations, or DevOps role as part of a team. Proficiency in shell scripting and in a language commonly used in SRE work (such as Python, Go, Bash, or Ruby) is essential, with Python as the primary focus. Experience with configuration management tools like Puppet and Ansible is required. Candidates should also have experience with distributed caching systems, Linux package management, and strong Linux system-level troubleshooting skills. A history of automating tasks and processes, along with experience in incident response and post-incident reviews, is highly valued. Strong English language skills, both verbal and written, are necessary for effective communication within a globally distributed team.

Educational Background:
While the job posting does not specify a particular educational background, candidates are typically expected to have a degree in a relevant field such as Computer Science, Information Technology, or a related discipline. Practical experience and demonstrated skills in site reliability engineering and operations may also be considered in lieu of formal education.

Experience:
The position requires a minimum of 6 years of experience in an SRE, Operations, or DevOps role. Candidates should have a proven track record of working as part of a team in a similar capacity, demonstrating their ability to handle operational tasks and contribute to the reliability of large-scale systems. Experience in leading incident response efforts and conducting root cause analysis is also essential.

Languages:
Strong proficiency in English is mandatory, both verbal and written, to facilitate effective communication within the team and with stakeholders. Additional language skills may be beneficial but are not explicitly required for this position.

Additional Notes:
The Wikimedia Foundation is a remote-first organization, allowing employees to work from various locations. The anticipated annual pay range for this position for applicants based in the United States is between $113,082 and $175,725, depending on factors such as location and experience. For applicants located outside the U.S., the pay range will be adjusted according to the country of hire. The organization is committed to equitable compensation practices and does not consider salary history in its hiring process. The foundation encourages applicants from diverse backgrounds to apply and provides accommodations for individuals with disabilities during the application process.