Sr. Systems Engineer
Universal Music Group, Woodland Hills, CA
Full Time Job
We are currently seeking an exceptional Sr. Systems Engineer to play a key role in the transition of the UMG infrastructure from a traditional data-center-driven model to a cloud-based one. You will need to be comfortable with ambiguity and with the realities of an enterprise that still runs many systems on the traditional architecture. You will work with the design and build teams to create an operational environment that is consistently nimble in serving the needs of the business, using an ITIL-based service management approach.
The responsibilities of this position include the following functions in the Technical Services organization:
Support and Maintenance
• Resolve Support Tickets
• Serve as the primary escalation point for the UMG Application communities and Sourcing Provider(s) for operational issues, ensuring timely resolution and communication. This includes using discretion and judgement to ensure the correct escalations and resources are engaged to solve the problems at hand.
• Ensure Root Cause Analysis for any critical service issues is completed as appropriate. Provide feedback and ensure agreed recommendations are implemented.
• Work with internal customers, teammates and 3rd-party Providers to ensure operational delivery of the cloud and server infrastructure.
• Participate in on-call rotation, and as such, work out of standard business hours will occasionally be required.
• Responsible for keeping the services running 24/7 and putting processes in place to meet or exceed service SLAs
• Monitor infrastructure operational alerts e.g. environmental, server performance and devices. Escalate as necessary within the ITIL framework.
• Ensure that all Sourcing Provider(s) comply with incident, problem, and change management procedures in line with ITIL standards
• Utilizing an ITIL based approach for service management, produce management reports and statistics (weekly/monthly/quarterly), evaluating and making suggestions for improvements in the infrastructure. This includes operational reports such as performance, capacity, break/fix, etc.
• Participate in UMG's scheduled Disaster Recovery activities as required
• Document processes and critical information
• Attend technical project review meetings as needed to ensure project technical designs meet UMG's operational requirements and are in compliance with UMG infrastructure and security standards
• Mentor and provide knowledge transfer to team members
• Ensure controls and compliance process management for all services under their remit, including validation of SAS 70
• At least 5 years' experience in technical server management, MCSE & Unix certification preferred.
• At least 2 years' experience in enterprise Cloud systems (AWS, Azure, Google)
• Must possess strong people skills and the ability to be both diplomatic and firm.
• Experience in highly available 24x7 production environment.
• Fluency in operating system administration and tools including: Microsoft, Mac OS X, Linux, Python, PowerShell, etc.
• Real-world experience with scripting (*nix Shell, PowerShell, Python, PHP, etc.) to develop automation and reporting
• Practical (hands-on) grasp of LAMP (Linux, Apache, MySQL, PHP) and Microsoft (Windows Server, IIS, MS SQL) technologies
• Full understanding of hypervisors and virtual machines (VMware, Hyper-V, others)
• Be an excellent and creative problem solver. You do not need to know everything but you do need to know how to find solutions.
• Proven experience with Amazon AWS and Microsoft Azure (Google Compute a plus) in an enterprise setting (personal use of these services will not be considered as equivalent)
• Solid understanding of key cloud design concepts such as "High Availability" (HA) and "Elastic Load Balancing" (ELB), especially in relation to modern disaster recovery (DR) methodologies and cloud-based solution design
• Experience managing large cloud server infrastructure on AWS and/or Azure
• Clear understanding of AWS-specific services and terms (EC2, S3, RDS, AZ, CloudWatch, CloudFormation, CloudTrail)
• Ability to mentor technical operations staff in appropriate technologies, as needed
• Exposure to version control and release management concepts and tools, particularly Git (GitHub / Bitbucket) for code branching and merging, a plus
• Familiarity with Cloud Management Platforms (RightScale, Nagios, CliQr, Scalr) a plus
• Experience in service management
• Communicate clearly, both in writing and verbally, with technical and non-technical audiences
• Manage time well in a high-interrupt operational environment. Handle the details of several technical tasks simultaneously.
• Customer and service delivery oriented
• Monitor and tune systems to identify and eliminate issues and potential bottlenecks
• Research system vulnerabilities, document findings for future use, and implement corrective actions as required
• Deploying, managing, and operating scalable, highly available, and fault tolerant systems on AWS
• Migrating an existing on-premises application to AWS
• Implementing and controlling the flow of data to and from AWS
• Selecting the appropriate AWS service based on compute, data, or security requirements
• Identifying appropriate use of AWS operational best practices
• Estimating AWS usage costs and identifying operational cost control mechanisms
• International experience is beneficial. Additional languages a plus.
• BA or BS degree in computer science or engineering (or equivalent field)
• AWS, MCSE, other professional certifications
• ITIL v3 Foundation Certification strongly desired.
Universal Music Group is an Equal Opportunity Employer.
This job description only provides an overview of job responsibilities that are subject to change.