Senior Site Reliability Engineer (SRE) (2157)
Company: SMX Corporation
Location: Salem
Posted on: March 15, 2023
Job Description:
United States
SMX is seeking a driven and talented Senior Site Reliability
Engineer (SRE) to join our thriving Cloud Services business unit
and work with some of the best technologists in the market. Senior
Site Reliability Engineers provide senior-level implementation
support services and subject-matter expertise to SMX clients on IT
consulting engagements. This is a remote position supporting a
Herndon, Virginia-based team. Using knowledge and experience in
technical architecture and systems integration, our Senior Site
Reliability Engineers assist with Technology team deliverables,
including building dashboards that monitor metrics for top-tier
applications, continuously building and deploying automation
scripts, and maintaining system configurations across multiple
environments hosted on the AWS cloud tech stack. In addition, our
Senior Site Reliability Engineers work
closely with the delivery teams and SMX clients to drive adoption
of modern reliability practices like SLOs, error budget policies,
actionable alerts, incident retrospectives, chaos testing, and
end-to-end ownership, and to prioritize the timely completion and
delivery of these tasks. This individual will bring a passion for
technology, a strong technical skill set, and the ability to deploy,
employ, operate, and sustain production-ready solutions, software,
and tools for our customers. Our Site Reliability Engineers have
working knowledge of continuous integration models, work directly
with leads and program managers, and show an overall willingness
to contribute to the SMX team. This individual will also bring
experience in infrastructure and operations automation, with
hands-on experience implementing cloud-native, automation-centric
solutions that drive operational efficiency with a strong focus on
quality, communication, customer success, and results.
Essential Duties and Responsibilities:
Implement application/infrastructure observability solutions and
perform maintenance to ensure desired application availability
Perform real-time service management, including building monitoring
for the golden-signal SLIs, establishing and negotiating SLOs with
the business, building alerting, and creating playbooks and runbooks
for services in conjunction with development teams, product owners,
and support
Apply automation and software to any task or part of the system
that is performed manually or would benefit from it.
Handle Cloud Operations (Events, Incidents, and Requests) based on
a defined, ticket-driven service catalog.
Provide guidance and leadership to the SRE team
Perform internal team technical reviews
Work with customer and SRE team to identify, develop, deploy, and
maintain solutions
Be a primary "face to the customer" during the Manage phase of the
customer lifecycle, communicating clearly and concisely to
identify, triage, remediate, and resolve infrastructure and
solution issues when customer needs are greatest.
Take direction from, and provide clear and timely updates to, the
Project Lead or Project Manager
Proactively identify potential operations and reliability issues
and work to resolve them
Identify system or performance issues, and develop resolutions
using automation
Identify opportunities for automation and implement them to drive
operational efficiency and cost reduction
Implement and maintain backup and disaster recovery solutions for
customers' cloud computing resources
Optimize existing monitoring, logging, and management metrics, and
identify new opportunities for them, to improve operational
effectiveness and customer knowledge
Participate in troubleshooting infrastructure- and/or
application-related issues
Produce well-written technical project documentation and
operational runbooks
Participate in change management processes
Maintain core working hours but remain flexible to support
after-hours maintenance and escalations (as necessary)
Participate as a team player capable of high performance and
flexibility in a dynamic working environment
Take ownership of issues and act with a high sense of urgency when
required
Improve CI/CD tool integration and operations, and work toward full
automation of CI and testing
Identify and support Continuous Improvement opportunities to
increase system reliability
Troubleshoot issues with CI/CD pipeline
Deploy and configure cloud services according to best practices
(e.g., Virtual Machines, Virtual Network, AWS AD, CDN, serverless
functions, DNS, Monitor, Key Vault, Blob storage)
Achieve and maintain AWS certifications
Required Skills:
7+ years of experience in DevOps or SRE
Proven ability to dissect a technical architecture into engineering
plans and discrete tasks
Excellent customer facing skills and the calm professional demeanor
necessary to bolster customer confidence when stress is highest
Ability to work collaboratively with customers
Experience with scripting, Kusto Query Language, ARM templates, and
PowerShell
Strong skill set with AWS automation, DevOps pipelines, and related
AWS tooling
Ability to collaborate with the internal development team to support
end-to-end testing
Solid command of standard CI/CD tools (Terraform, Ansible, Git,
Jenkins, etc.)
Solid experience with container-based deployments using Docker,
including working with Docker images, Docker Hub, and Docker registries.
Experience installing and configuring Kubernetes clusters.
Proficiency and proven hands-on experience with AWS IaaS and PaaS
Services, AWS Active Directory, and SQL Server Infrastructure.
Experience with AWS Monitoring, Migrate, Log Analytics, AWS SSM,
and load balancer techniques
Ability to write scripts in JavaScript, Bash, Python, or
similar
Experience in monitoring, metrics collection, and reporting using
open-source tools
Depth of knowledge in security best practices, tools, and
compliance frameworks (NIST, FedRAMP, HIPAA, etc.)
Strong written and verbal communication skills
Degree in a technical discipline, or an additional 6 years'
experience in lieu of a degree
Desired Skills / Certs:
BS/BA in Computer Science, Computer Engineering or related field or
equivalent technical experience
Current operations experience within a Cloud Managed Services
Provider (MSP) delivery environment.
One or more of the following certifications is required:
--- AWS Certified Developer - Associate (DVA-C01)
--- AWS Certified SysOps Administrator - Associate (SOA-C02)
--- AWS Certified Solutions Architect - Associate (SAA-C03)
--- AWS Certified DevOps Engineer - Professional (DOP-C01)
--- AWS Certified Solutions Architect - Professional (SAP-C01)
--- DevOps Institute: Site Reliability Engineering Foundation (SREF)
Our tradition of delivering innovative technical solutions dates
back to 1995; however, you may know us better by one of our legacy
company names: Trident Technologies, Smartronix, Datastrong or C2S
Consulting Group. With the support of OceanSound Partners, our
private equity investment sponsor, we began operating as one
business starting in 2019 and became SMX in 2021. We operate in
close proximity to our clients around the globe and have core
locations in Alabama, California, DC Metro, Florida, Hawaii,
Maryland, and Massachusetts.
Today, as SMX, we are one team and together empower government and
commercial enterprises to become more effective, innovative, and
resilient, no matter what challenges they face.
SMX is committed to hiring and retaining a diverse workforce. All
qualified candidates will receive consideration for employment
without regard to disability status, protected veteran status,
race, color, age, religion, national origin, citizenship, marital
status, sex, sexual orientation, gender identity or expression,
pregnancy or genetic information. SMX is an Equal
Opportunity/Affirmative Action employer including disability and
veterans.
Vaccination within 60 days of hire, or an approved accommodation,
is a requirement of the position per Executive Order 14042 (unless
precluded by state law). If a candidate is not vaccinated, they may
request an accommodation once offered the position, and the
accommodation must be granted before the employee starts in the
position. The candidate will have 60 days to get vaccinated.
Selected applicant will be subject to a background
investigation.
Keywords: SMX Corporation, Salem, Senior Site Reliability Engineer (SRE) (2157), Professions, Salem, Oregon