The required skills we’re looking for are:

· Strong AWS services experience

· CloudWatch, CloudTrail, Lambda, Elastic Load Balancer, Auto Scaling, S3, Route 53, VPC

· CloudFormation – provision templates for alarms

· Splunk

· Create and manipulate dashboards / metrics

· Disaster Recovery / Failover Scenarios

· Understand how to make an environment resilient and highly available

· Influencing:

· Candidate should be able to convince internal customers (AppDev teams) of the benefits of shifting left and leveraging the SRE model for performance, availability, resiliency, etc.

· Ability to explain these benefits clearly

· Confidence and communication are must-have skills

· Familiar with KPIs:

· MTTR (Mean Time to Resolve) and MTTD (Mean Time to Detect)

Highly desired skills are:

· Disaster Recovery / Failover Experience – Load Balancing, Resiliency, High Availability

Description: Site Reliability Engineer (SRE) roles and responsibilities

The SRE role bridges the Development Engineer role and the Production Engineer role with a mixture of development, test, deploy, and support skills that contribute to application reliability and resiliency. The SRE approaches problems as an Engineer and looks to automate processes with code or tools to detect and prevent identified software reliability issues. The SRE role splits time between runtime support issues (toil) and development automation work (dev). The skills are organized by Development, Support, and Common areas.

Software Development and Configuration

The following SRE skills are used to improve reliability of an application/service while it is in development:

Required Skills:

• 10+ years overall experience

• Hands-On experience in at least one language – Java (must), Python (3-4 yrs)

• Hands-On experience with automated testing tools (JMeter, JUnit, Mockito, Postman)

• Hands-On experience with a source code management system like Git or SVN, including pull, push, branch, commit, and merge functions

• Hands-On experience creating, configuring and maintaining cloud-based applications and infrastructure for the rapid development and monitoring of applications and services:

AWS, EC2, Fargate, CloudFormation, RDS, ElastiCache, S3

• Experience with Cloud Migrations with reliability and availability as core focus

• Experience implementing SRE at the team/enterprise level, with hands-on implementation of SRE practices and improvement of the associated metrics

• Hands-On experience with monitoring tools (Splunk, Dynatrace), including developing and customizing dashboards

• Hands-On experience with the build, deploy, and packaging process and best practices. Familiarity with DevOps automation tools (UCD, Jenkins, Maven, SonarQube, Chef, Ansible, Puppet)

• Scripting skills for automation (Bash on Linux, and Windows scripting)

General Required Skills:

• Ability to diagnose and optimize software code for reliability and resiliency

• Knowledge of the incident management process and reporting tools (ServiceNow, Jira Service Desk)

• Good communication and documentation skills. An SRE must document their work, collect and document “tribal knowledge” (the good stuff in people’s heads), and make it accessible to others.

• Experience triaging incidents and conducting RCAs (Root Cause Analysis)

Nice to have skills:

• Ability to diagnose technical problems, isolate and debug issues, formulate creative solutions, analyze alternative approaches, and implement a timely solution.

• Experience providing alternatives and estimates for implementing a fix or automation to improve reliability.

• Ability to juggle several different tasks at a time, and able to frequently adjust for new tasks or higher priority tasks.

• Experience with a modern RDBMS or NoSQL database, such as Postgres, MySQL, DB2, Oracle, MongoDB, or Cloudant

Other Required Skills:

· Setting up / Creating alerts using monitoring tools (CloudWatch, Dynatrace, Splunk, etc.)

· Configuring CI/CD pipelines using Jenkins, Ansible, GitLab, etc.

· Deploy and provision pre-built Terraform or CloudFormation templates.

Job Category: Technology
Job Type: C2C
Job Location: USA
