Platform Systems Engineer
We are looking for passionate Platform Systems Engineers who will apply their expertise and resourcefulness to identifying, troubleshooting, and reporting platform problems to developers and stakeholders, ensuring that applications receive a stable and reliable service.
Identify, troubleshoot, resolve, and escalate incidents quickly and effectively.
Be responsible for the operational monitoring of the platform health.
Be responsible for resolving problems reported by platform end users.
Work with application developers to sustain application performance, and implement fixes, changes, and improvements as necessary.
Use application logs to debug reported issues and provide analysis for improvement or resolution.
Develop tools, operational enhancements, and automated solutions.
Perform root cause analysis. Identify and resolve underlying problem patterns while driving the development of automated, self-healing solutions.
Participate in outage conference calls.
Write clear and consumable documentation of the environment and operational procedures.
Be a member of a 24x7 shift rotation.
Bachelor’s degree or higher education.
Strong sense of ownership, customer service, and integrity, demonstrated through excellent written and verbal communication.
Ability to work through complex engineering obstacles using debugging and problem-solving skills.
Solid grasp of the Linux networking and security stack.
Fluency in scripting (e.g., Bash, Python, Regex, Ruby on Rails).
Experience with, or a good understanding of, containerization (Docker) and container scheduling platforms running in distributed systems (e.g., Apache Mesos, Mesosphere, Kubernetes, OpenStack).
Strong debugging and troubleshooting skills that span applications, systems, and networking (TCP/IP).
Experience in the usage and administration of monitoring solutions such as Grafana, Sensu, Nagios, and Zabbix.
A knack for using analytics and observability applications such as Splunk, Elasticsearch, and Prometheus to find issues across different applications and platforms.
PLUS POINTS IF YOU HAVE:
Familiarity with distributed systems concepts, including the CAP theorem, microservices, and the Twelve-Factor App.
Understanding of Linux kernel space, memory, processes, threads, static and shared libraries, interprocess communication, and signals.
Experience handling large numbers of diverse systems with configuration management systems like Puppet, Chef, Ansible or Salt.
Experience with CI/CD solutions and related tooling such as Jenkins, Bamboo, Spinnaker, Artifactory, and Git.
Experience working and interfacing with APIs and serialization formats such as JSON and YAML.
About The IT Solutions Provider
IT Solutions Provider specializes in making data center and cloud operation teams thrive. Our global team of service architects, infrastructure admins, and software engineers has built and operated some of the world's largest, most scalable environments over the past two decades. Our philosophy is about making technology "werk" for our customers by tailoring solutions to their exact needs. We believe that products, services, tools, and processes should serve the needs of people, NOT the other way around. From 24/7 monitoring to infrastructure design to application development, we've got you covered.