SalemRecruiter Since 2001
the smart solution for Salem jobs

Senior Site Reliability Engineer (SRE) (2157)

Company: SMX Corporation
Location: Salem
Posted on: March 15, 2023

Job Description:

Senior Site Reliability Engineer (SRE) (2157) at SMX, United States
SMX is seeking a driven and talented Senior Site Reliability Engineer (SRE) to join our thriving Cloud Services business unit and work with some of the best technologists in the market. Senior Site Reliability Engineers provide senior-level implementation support services and subject-matter expertise to SMX clients on IT consulting engagements. This is a remote position supporting a team based in Herndon, Virginia.
Using knowledge and experience in technical architecture and systems integration, our Senior Site Reliability Engineers assist with Technology team deliverables, including building dashboards that monitor metrics on top-tier apps, continuously building and deploying automation scripts, and maintaining system configurations across multiple environments hosted on the AWS cloud tech stack. In addition, they work closely with the delivery teams and SMX clients to drive adoption of modern reliability practices such as SLOs, error budget policies, actionable alerts, incident retrospectives, chaos testing, and end-to-end ownership, and to prioritize the timely completion and delivery of these tasks.
This individual will bring a passion for technology, a strong technical skill set, and an ability to deploy, employ, operate, and sustain production-ready solutions, software, and tools for our customers. Our Site Reliability Engineers have working knowledge of continuous integration models, work directly with leads and program managers, and exhibit an overall willingness to contribute to the SMX team. This individual will also bring experience in infrastructure and operations automation, including hands-on experience implementing cloud-native, automation-centric solutions that drive operational efficiencies, with a strong focus on quality, communication, customer success, and results.
Essential Duties and Responsibilities:

  • Implement application/infrastructure observability solutions and perform maintenance to ensure desired application availability

  • Provide real-time service management, including building monitoring for the golden-signal SLIs, establishing and negotiating SLOs with the business, building alerting, and creating playbooks and runbooks for services in conjunction with development teams, product owners, and support

  • Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually.

  • Handle Cloud Operations (Events, Incidents, and Requests) based on a defined, ticket-driven service catalog.

  • Provide guidance and leadership to the SRE team

  • Perform internal team technical reviews

  • Work with customer and SRE team to identify, develop, deploy, and maintain solutions

  • Be a primary "face to the customer" during the Manage phase of the customer lifecycle - communicating clearly and concisely to identify, triage, remediate, and resolve infrastructure and solution issues when customer needs are greatest.

  • Take direction from, and provide clear and timely updates to, Project Lead or Project Manager

  • Proactively identify potential operations and reliability issues and work to resolve them

  • Identify system or performance issues, and develop resolutions using automation

  • Identify opportunities for automation and implement them to drive operational efficiency and cost reduction

  • Implement and maintain backup and disaster recovery solutions for customers' cloud computing resources

  • Optimize existing - and identify new opportunities for - monitoring, logging, and management metrics to improve operational effectiveness and customer knowledge

  • Participate in troubleshooting of infrastructure and/or application related issues

  • Produce well-written technical project documentation and operational runbooks

  • Participate in change management processes

  • Maintain core working hours but remain flexible to support after-hours maintenance and escalations (as necessary)

  • Participate as a team player capable of high performance and flexibility in a dynamic working environment

  • Take ownership of issues and act with a high sense of urgency when required

  • Improve CI/CD tools integration/operations and full automation of CI/testing

  • Identify and support Continuous Improvement opportunities to increase system reliability

  • Troubleshoot issues with CI/CD pipeline

  • Deploy and configure cloud services according to best practices (e.g., Virtual Machines, Virtual Network, AWS AD, CDN, serverless functions, DNS, Monitor, Key Vault, Blob storage)

  • Achieve and maintain AWS certifications

    Required Skills:

    • 7+ years of experience in DevOps or SRE

    • Proven ability to dissect a technical architecture into engineering plans and discrete tasks

    • Excellent customer facing skills and the calm professional demeanor necessary to bolster customer confidence when stress is highest

    • Ability to work collaboratively with customers

    • Scripting experience: Kusto Query Language, ARM Templates, PowerShell

    • Strong skillset with AWS Automation, DevOps Pipeline and related AWS tooling

    • Ability to collaborate with the internal dev team to support end-to-end testing

    • Solid command of standard CI/CD tools (Terraform, Ansible, Git, Jenkins, etc.)

    • Solid experience with container-based deployments using Docker, including working with Docker images, Docker Hub, and Docker registries, and with installing and configuring Kubernetes clusters.

    • Proficiency and proven hands-on experience with AWS IaaS and PaaS Services, AWS Active Directory, and SQL Server Infrastructure.

    • Experience with AWS Monitoring, Migrate, Log Analytics, AWS SSM, Load Balancer techniques

    • Ability to write scripts in JavaScript, Bash, Python, or similar

    • Experience in monitoring, metrics collection, and reporting using open-source tools

    • Depth of knowledge in security best practices, tools, and compliance frameworks (NIST, FedRAMP, HIPAA, etc.)

    • Strong written and verbal communication skills

    • Degree in a technical discipline or additional 6 years' experience in lieu of degree

      Desired Skills / Certs:

      • BS/BA in Computer Science, Computer Engineering or related field or equivalent technical experience

      • Current operations experience within a Cloud Managed Services Provider (MSP) delivery environment.

      • One or more of the following certifications is required:

        • AWS Certified Developer - Associate (DVA-C01)

        • AWS Certified SysOps Administrator - Associate (SOA-C02)

        • AWS Certified Solutions Architect - Associate (SAA-C03)

        • AWS Certified DevOps Engineer - Professional (DOP-C01)

        • AWS Certified Solutions Architect - Professional (SAP-C01)

        • DevOps Institute: Site Reliability Engineering Foundation (SREF)

        Our tradition of delivering innovative technical solutions dates back to 1995; however, you may know us better by one of our legacy company names: Trident Technologies, Smartronix, Datastrong, or C2S Consulting Group. With the support of OceanSound Partners, our private equity investment sponsor, we began operating as one business in 2019 and became SMX in 2021. We operate in close proximity to our clients around the globe and have core locations in Alabama, California, DC Metro, Florida, Hawaii, Maryland, and Massachusetts.
        Today, as SMX, we are one team and together empower government and commercial enterprises to become more effective, innovative, and resilient, no matter what challenges they face.
        SMX is committed to hiring and retaining a diverse workforce. All qualified candidates will receive consideration for employment without regard to disability status, protected veteran status, race, color, age, religion, national origin, citizenship, marital status, sex, sexual orientation, gender identity or expression, pregnancy or genetic information. SMX is an Equal Opportunity/Affirmative Action employer including disability and veterans.
        Vaccination within 60 days of hire, or an approved accommodation, is a requirement of the position per Executive Order 14042 (unless precluded by State law). A candidate who is not vaccinated may request an accommodation once offered the position, and the accommodation must be granted before the employee starts in the position. Otherwise, the candidate will have 60 days to get vaccinated.
        Selected applicant will be subject to a background investigation.
