***'s Data and Runtime Stability SRE team is trusted to administer the end-to-end environment for ***'s installation of the numerous services that support the applications making up ***'s line of products. On any given day we're inventing, engineering, developing, building, coding, troubleshooting and maintaining a wide range of tools, monitors, frameworks, interfaces, protocols, solutions and best practices. These components stitch together a robust suite of automated, self-healing systems that manage the services Data and Runtime provides to the rest of the firm. We improve uptime, provision and balance resources, architect and coordinate operational procedures, administer backup and recovery processes, coordinate maintenance windows, manage replication and oversee workflows.
The Role:
In addition to managing the overall Data and Runtime environment, you'll work directly on installations of technologies that use services such as RabbitMQ, Comdb2, Kafka, Redis and many more. You'll collaborate every day with the developers who create these applications, integrating the services they provide into the *** operational environment as well as *** products. So, not only will you have high-level ownership and "the classic SRE responsibilities" such as system tuning, performance analysis and the management of patches, installations, and upgrades; you'll also have immediate access to the experts who design and code the ***-specific components, APIs and methods. This gives you insight into, and access to, the lowest levels of how *** applications interact with each other and the Runtime environment, for both in-depth troubleshooting and enhancing stability, reliability, performance and feature set.
We're open to trying new ideas, processes, and technologies. The right applicant will be imaginative, creative, self-motivated, and highly curious, as innovation and initiative are highly valued here. Problem-solving, programming, logical frameworks, and Unix systems should all be second nature. We are looking for someone who will continually strive to improve our environment, regularly asking "why?" and saying "we can make this better!"
You'll need to have:
- 4+ years of programming experience with Python
- A degree in Computer Science, Engineering or similar field of study or equivalent work experience
- 5+ years of experience with Unix, Unix tools and shell scripting
- Deep understanding of TCP/IP networking and the OSI model
- Experience designing and automating repeatable processes in a client/server modeled environment
- Experience supporting highly available production systems
- Ability to build and maintain sophisticated, performant, scalable, and critically important systems
- Experience building monitors and alarms for system performance, status and stability
- Experience with CI/CD systems and writing robust unit and system tests
We'd love to see:
- Experience with the Rapid framework
- Experience analyzing existing systems and identifying shortcomings with concrete ideas for improvement
- C programming skills
- Experience designing stable, long-lasting APIs
- Experience with Splunk/Humio and Grafana
- Experience with GitHub and JIRA
*** provides reasonable adjustment/accommodation to qualified individuals with disabilities. Please tell us if you require a reasonable adjustment/accommodation to apply for a job or to perform your job. Examples of reasonable adjustment/accommodation include but are not limited to making a change to the application process or work procedures, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment. If you would prefer to discuss this confidentially, please email [Please reply using Staff Me Up] (Americas), [Please reply using Staff Me Up] (Europe, the Middle East and Africa), or [Please reply using Staff Me Up] (Asia-Pacific), based on the region you are submitting an application for.