Senior Cloud Engineer
Universal Music Group, Nashville, TN
Full Time Job
How we LEAD:
We are currently seeking an exceptional senior system engineer to be a key member of our Operations Center. You will apply strong server and cloud management skills in a global enterprise to ensure the smooth and efficient ongoing operation of the UMG Global infrastructure. The role requires a high level of diverse technical skills for troubleshooting and problem analysis, along with the ability to clearly communicate the results of that analysis to business stakeholders, IT support teams, and network providers so that operational issues are resolved quickly and effectively. This position requires proven experience in a global environment. In addition to having strong technical skills, you must be comfortable communicating effectively with business end users, technical IT teams, business partners, network providers, and business process outsourcing vendors, all while being sensitive to a wide diversity of cultural and technical backgrounds in a global business environment.
How you’ll CREATE:
• Diagnose and solve problems, frequently under time constraints and business pressure
• Provide support and configuration assistance to application teams deploying at public Cloud providers (AWS, Azure, & GCP)
• Provide detailed capacity and performance analysis
• Work with vendors and global/regional IT groups to resolve problems in a timely manner
• Serve as the primary escalation point for the UMG Application communities and Sourcing Provider(s) on operational issues, ensuring timely resolution and communication
• Ensure Root Cause Analysis for any critical service issues is completed as appropriate
• Work with internal customers, teammates, and third-party providers to ensure operational delivery of the cloud and server infrastructure
• Monitor infrastructure operational alerts (e.g., environmental, server performance, and device alerts) and escalate as necessary
• Maintain high quality process and procedure documentation
• Maintain and implement high quality operational standards, processes and procedures
• Maintain redundancy, backup and recovery processes, and disaster recovery procedures
• Maintain & enhance knowledge of key technologies
• Mentor and provide knowledge transfer to team members
• Monitor and tune systems to identify and eliminate issues and potential bottlenecks
• Research system vulnerabilities, document them for future reference, and implement corrective actions as required
• Participate in the on-call rotation; work outside standard business hours will occasionally be required
Bring your VIBE:
• At least six years’ experience split between traditional server management and enterprise Cloud systems (AWS, Azure, Google)
• Knowledge of Chef and Terraform
• Strong knowledge of virtualisation technology (VMware, Hyper-V)
• Must possess strong people skills and the ability to be both diplomatic and firm
• Experience in highly available 24x7 production environment
• Fluency in operating system administration and tools, including Microsoft, Mac OS X, Linux, Python, PowerShell, etc.
• Real-world experience with scripting (*nix Shell, PowerShell, Python, PHP, etc.) to develop automation and reporting
• Practical (hands-on) grasp of LAMP (Linux, Apache, MySQL, PHP) and Microsoft (Windows Server, IIS, MS SQL) technologies
• Full understanding of hypervisors and virtual machines (VMware, Hyper-V, others)
• Proven experience with AWS and Microsoft Azure (Google Compute a plus) in an enterprise setting
• Solid understanding of key cloud design concepts such as “High Availability” (HA), “Elastic Load Balancing” (ELB), especially in relation to modern disaster recovery (DR) methodologies and cloud-based solution design
• Experience managing large cloud server infrastructure on AWS and/or Azure
• Clear understanding of AWS-specific services and terms (EC2, S3, RDS, AZ, CloudWatch, CloudFormation, CloudTrail)
• Deploying, managing, and operating scalable, highly available, and fault tolerant systems on AWS
• Migrating an existing on-premises application to AWS
• Implementing and controlling the flow of data to and from AWS
• Selecting the appropriate AWS service based on compute, data, or security requirements
• Identifying appropriate use of AWS operational best practices
• Estimating AWS usage costs and identifying operational cost control mechanisms
• Communicate clearly, both in writing and verbally, with technical and non-technical audiences
• Manage time well in a high-interrupt operational environment. Handle the details of several technical tasks simultaneously.
• Customer and service delivery oriented
• Ability to mentor technical operations staff in appropriate technologies, as needed
• Be an excellent and creative problem solver
• MCSE & Unix certifications preferred.
• Exposure to version control and release management concepts and tools, particularly Git (GitHub / Bitbucket) for code branching and merging
• Familiarity with Cloud Management Platforms (RightScale, Nagios, CliQr, Scalr)
• Experience in service management
• International experience is beneficial. Additional languages a plus.
• Bachelor’s Degree in Computer Science or Engineering or closely related field or comparable education and experience.
• AWS, MCSE, other professional certifications
• ITIL Foundation Certification strongly desired.
• Competitive Compensation Package including Salary, Benefits and Generous 401k Savings Plan
• Paid Time Off – Paid Holidays, “Gift Week”, Summer Fridays
• Student Loan Repayment Assistance
• Employee Developmental Support
• Annual Gym Reimbursement Package
• Pet Insurance, plus much more!
Universal Music Group is an Equal Opportunity Employer.