Site Reliability Engineer

Remote working until restrictions ease - You will be office based at any client hub thereafter

Job Number

44268

Posted

23rd Sep 2021 : 5:02 pm

Job Status

Live

Job Type

Temporary

Duration

6 Months

Pay Type

Inside IR35

Pay Rate

Up to £700.00

Payment Method

Daily

Contact

Surita Dadral

Contact details

0203 356 4949, admin@121.uk.com

Job Description

The public sector client is looking to recruit a Site Reliability Engineer on a 6 month temporary contract. You can work remotely until restrictions ease; thereafter, you will be required to work at any of the client's office sites listed: Peel Park; Manchester; Leeds QH; Kings Court, Sheffield; Newcastle BPV. Due to the nature of the assignment, the client is looking for candidates with a valid SC Clearance.

About you and your job purpose:

Site Reliability Engineering (SRE) is an enabling role for WCS, bringing a focus on application reliability and operational efficiency that will help DWP achieve its goals to ‘Provide Constant Availability’, ‘Deliver Service Excellence’, ‘Drive Down Operational Costs and Risks’, ‘Provide Actionable Insight and Visibility’, and ‘Streamline Change Across the Estate’. SRE combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SREs work with application, infrastructure and database developers to increase reliability, reduce downtime and meet uptime targets. They ensure that DWP’s services (both internally critical and externally visible systems) have reliability and uptime that meet the needs of the business and of citizens. They are also responsible for continuous improvement, while maintaining application capacity and performance. They devise and implement optimisations and automations that reduce technical debt, reduce operational costs and risks, automate toil, and improve the quality of the application code. They are involved from the initial application idea all the way through the application lifecycle to decommissioning.

 

Essential Skills & Experience

Technical Skills required: 
1. Technical Breadth – Has a deep understanding of the technical concepts required in their role and understands how these fit into the wider technical landscape. Understands the limitations of digital technology.
2. Release & Deployment – Leads the assessment, analysis, planning and design of release packages, including assessment of risk. Liaises with business and IT partners on release scheduling and communication of progress. Conducts post-release reviews. Ensures release processes and procedures are applied and that releases can be rolled back as needed. Identifies, evaluates and manages the adoption of appropriate release and deployment tools, techniques and processes (including automation).
3. Programming & Build – Collaborates with others when necessary to review specifications, and uses agreed standards and tools to design, code, test, correct and document programs or scripts of medium to high complexity.

Knowledge and Experience Required: 
e.g. Previous experience of working for a government dept.
• Knowledge/Experience of GitLab SaaS
• Knowledge/Experience of GitLab Runners (ideally AWS & Azure)
• Knowledge of Docker Management
• Passionate about improving internal processes
• Understanding of security engineering and security best practice
• Ability to architect and administer scalable, cloud-native and on-premises applications
• Strong time management and change management skills
• Strong communication skills across multiple stakeholder types
• Strong skills in setting, communicating, implementing, and achieving business objectives and goals through direct management
• Skilled knowledge and ability in modifying and maintaining systems and code developed by other engineers
• The ability to lead engineers in a complex, multi-disciplinary environment, delivering products within specific timescales.

Key Tasks & Deliverables

Summary of the Role and key responsibilities: 
As a Senior Site Reliability Engineer you will drive adoption of SRE best practice across the team within which you are embedded. You will coach and mentor application development and operations engineers in the practice and techniques of SRE. You will assure the development, testing, and operation of the business or citizen-facing applications for which you are responsible and you will be accountable for the reliability of those applications. You will actively manage the work backlog and develop reliability improvements as well as leading initiatives to develop the automation of low-value tasks balanced against project delivery demands. You will provide technical leadership to wider operational teams along with providing oversight to the products and services they support.

Key Tasks & Deliverables:
Responsible for contributing authoritative advice and guidance to others in the organisation and externally.
• Design and develop the techniques for improving application reliability, run books, knowledge transfer to the UXCC, and ongoing SRE strategy within your Functional and Professional Communities
• Manage the error budget agreed with the product owner for the application and ensure that work is balanced in alignment with it
• Act as the focal point for the investigation and resolution of major or complex incidents for the service, ensuring people with the right skills and expertise are proactively available to respond effectively
• Assess the impact of change requests in consultation with stakeholders, providing technical expertise and authorising the implementation of subsequent changes
• Manage on-call rotations such that all applications have out-of-hours SRE coverage.
• Undertake comprehensive analysis of performance trends to identify root causes, progressing opportunities to improve the reliability, security and capability of infrastructure, application and site services
• Actively engage with senior stakeholders and provide clear communication of incident resolution and service improvements.
• Assure critical changes to the applications and supporting infrastructure
• Develop and maintain relevant knowledge such that it can be easily annotated, updated, referenced, and consumed
• Conduct code assessments, with a view to correcting errors and providing recommendations for reliability improvements
• Manage the team backlog for the applications for which you are accountable
• Coach and mentor application development and operations engineers in the practice and techniques of SRE
• Conduct reflectives for all high priority and major incidents ensuring they are done quickly and published
• Routinely seek views and capture ideas from stakeholders and team members for improvements and encourage collaboration and innovation
• Take part in interdepartmental discussions and meetings with a wide variety of external bodies and organisations on a local, regional, national or international basis, leading community discussions about SRE best practice within Engineering.

Qualifications, Training & Certificates

About the Rates of Pay: Please note: Unless otherwise specified, the higher pay rate advertised in our job advert/s will always be the highest Ltd or Umbrella Company pay rate that the client is willing to pay, and the lower pay rate advertised will always be the highest PAYE pay rate that the client is willing to pay. If the PAYE rate is not indicated in the job advert, please contact us for confirmation of the PAYE daily pay rate.

Clearance

SC - Security Clearance

Apply for the job

Thank you for expressing an interest and applying for this job. When applying for our job/s, please do not send or add any financial details on your CV.
