JPMorgan Chase & Co. (NYSE: JPM) is a leading global financial services firm with assets of $2.6 trillion and operations worldwide. The firm is a leader in investment banking, financial services for consumers and small businesses, commercial banking, financial transaction processing, and asset management. A component of the Dow Jones Industrial Average, JPMorgan Chase & Co. serves millions of consumers in the United States and many of the world's most prominent corporate, institutional and government clients under its J.P. Morgan and Chase brands. Information about JPMorgan Chase & Co. is available at http://www.jpmorganchase.com/.
Global Technology Infrastructure (GTI) is the technology infrastructure organization for the firm. It delivers a wide range of products and services and partners with all lines of business to provide high-quality service delivery, exceptional project execution, and financially disciplined approaches and processes in the most cost-effective manner.
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. This position is for a Site Reliability Engineer responsible for developing and implementing the processes needed to improve application and system reliability, along with providing operational support. The role divides its focus approximately equally between software development and infrastructure operations. The engineer will also develop software to automate operational processes and contribute code to the shared engineering backlog.
Engage with the development team throughout the life cycle to help build for reliability
Develop software to automate manual operational work
The workload for the position is multifaceted and includes application support (Puppet/CFE), operating system support (Linux), and related software development
Run, maintain and improve the service against established Service Level Objectives by applying software engineering principles
Take responsibility for the availability, performance, change management, monitoring, and capacity management of assigned services
Troubleshoot priority incidents, conduct post-mortems and ensure permanent closure of the incidents
Analyze patterns of production incidents, develop permanent remediation plans, and implement automation through software engineering to prevent future incidents
Manage the split of effort between manual operational work and engineering work
Work with partner organizations and vendors to provide solutions to current business issues
Participate in a shift model covering 24x7x365 support
Bachelor of Science degree or equivalent experience
7+ years of experience on Linux platforms (Red Hat preferred)
3+ years working with configuration management and automation tools (CFE/Chef/Puppet)
5+ years of scripting/software experience (Bash, Python, Java and Perl)
Basic knowledge of database technologies (Oracle, MySQL, etc.)
Strong understanding of Linux security best practices
RHCE and Puppet certifications are preferred
Extensive experience in application/system/network performance and availability monitoring (Tivoli, Nagios, Splunk, etc.)
Solid knowledge of Apache/WebLogic and MQ
Working knowledge of cloud engineering in both private and public cloud environments
Proven technical leadership experience, including the ability to quickly understand an issue, troubleshoot it efficiently at a detailed level, and direct swift resolution
Ability to adapt to a dynamic work environment
Strong ability to take ownership of issues and drive resolution across teams
Assertive personality and the drive to push improvements across the environment
Effective written and verbal communication skills
Ability to develop strong client relationships and partner with technology engineering teams