SRE Engineer

Job Description

The Senior Site Reliability Engineer (SRE) has a strong technical background in multiple engineering disciplines. This position will work closely with a tight-knit team of engineers who provide managed services to internal global customers across a broad range of technical areas. The ideal candidate will have a balance of breadth across engineering disciplines, depth and expertise in select areas, and practical real-world experience solving very complex problems while remaining patient and professional in the most critical moments. The incumbent operates independently on complex assignments that involve analyzing both business and regulatory requirements and the individual technical implementations, in order to maximize benefit to the business. This role requires in-depth knowledge of Active Directory, Federation, Linux, Storage, VMware ESX, Windows Server, and related system technologies. As a technical subject matter expert, you will mentor system engineers, review existing solutions, and implement new ones to meet business objectives. The incumbent will act as an internal team escalation point for systems requests and issues. Travel of up to 15% may be required based on project needs.

Job Responsibilities

• Design and deploy a robust monitoring and alerting strategy; define and implement self-healing capabilities; create and update automated runbooks and playbooks; and triage and resolve production incidents.
• Act as subject matter expert on SolarWinds NPM, SAM, and NCM, and grow and maintain the SolarWinds application infrastructure, working with the Network, Systems, and Applications teams.
• Help troubleshoot and understand system faults and application performance issues in order to design monitoring capabilities that can detect and auto-correct them.
• Build monitoring, alerting, and dashboarding solutions that improve visibility into our applications' performance and infrastructure metrics and keep the operational workload stable.
• Use scripting and tooling to automate and streamline the monitoring of applications and services.
• Good knowledge of Splunk, New Relic, Datadog, Pingdom, AppDynamics, and other monitoring tools.
• Track issues and business requests, and research broad-based solutions and new features that meet customer needs.
• Monitor application performance and usage with APM and other monitoring tools to isolate the fault domain and identify the root cause of performance issues.
• Facilitate blameless incident retrospectives to understand root causes, communicate learnings, determine remediation, and continuously improve monitoring.
• Identify, evaluate, and recommend monitoring tools and diagnostic techniques; assess gaps in current monitoring tool capabilities and recommend tools to augment or replace them.
• Monitor the support incident queue, and investigate and resolve logged third-level technical support incidents.
• Mentor and assist eNOC support staff with incident diagnosis.

Qualifications

Required Skills/Experience
• 8+ years of related enterprise IT experience
• Linux and/or Windows system administration experience
• Solid working knowledge of networking: LAN/WAN, DNS, DHCP, load balancing, and firewalls
• Experience configuring, deploying, and maintaining large SolarWinds environments (14+ polling engines) supporting the following software modules:
    NPM – Network Performance Monitor
    NCM – Network Configuration Manager
    NTA – NetFlow Traffic Analyzer
    SRM – Storage Resource Monitor
    APE – Additional Polling Engine
    VMAN – Virtualization Manager
• Hands-on experience with performance monitoring and diagnostic tools (Splunk, New Relic, Datadog, Pingdom, AppDynamics, etc.)
• Knowledge of at least one, and preferably several, scripting languages (PowerShell, Bash, Perl)
• Experience with Microsoft Azure, VMware vCenter, DHCP, DNS, and GPOs
• Ability to participate in a 24/7/365 on-call rotation
• Experience with OpsGenie, ServiceNow, and Jira
• Experience with Windows Server 2012, 2016, and 2019, Office 365, and cloud solutions (Azure/AWS)

Desired Skills/Experience

• Entry-level knowledge of building and supporting CI/CD pipelines
• Entry-level knowledge of building proactive, end-to-end business service monitoring solutions
• Azure certification
• SolarWinds certification
• Experience with cloud system fundamentals (Kubernetes, containers, virtualization, automation) and observability techniques for these platforms
• Ability to demonstrate strategic, data-driven thinking combined with efficient implementation
• Hands-on self-starter with a positive attitude, a strong work ethic, and the ability to influence people at, above, or below their level
• Previous consulting experience
• Experience calculating system reliability metrics, including RPO, RTO, SLO and SLA, mean time to restore (MTTR), and mean time between incidents (MTBI)

Education

• Industry-specific training and designation is a plus

About The Specialty Insurance Company

The company is a global leader in providing specialty insurance, reinsurance, and mortgage insurance solutions.

Specialty Insurance Company

Quezon City

Job Level

Entry level

Job Type

Full Time

Hiring Until

07/23/2022