
Lead DevOps Engineer - Online Inference
Paramount
New York, NY
This is a Full Time Job
#WeAreParamount on a mission to unleash the power of content… you in?
We’ve got the brands, we’ve got the stars, we’ve got the power to achieve our mission to entertain the planet – now all we’re missing is… YOU! Becoming a part of Paramount means joining a team of passionate people who not only recognize the power of content but also enjoy a touch of fun and uniqueness. Together, we co-create moments that matter – both for our audiences and our employees – and aim to leave a positive mark on culture.
We are looking for a Lead DevOps Engineer - Online Inference to join our Applied Intelligence Personalization Team. This role will focus on building and maintaining scalable, low-latency infrastructure to support real-time machine learning inference for engagement and personalized messaging. The ideal candidate will have 2 years of experience working with Kubernetes, CI/CD pipelines, and cloud-based infrastructure to optimize and deploy real-time ML models.
Your Day-to-Day:
Design, implement, and manage scalable and reliable infrastructure for online inference services.
Optimize Kubernetes-based deployments for low-latency model serving and real-time personalization.
Automate CI/CD pipelines to streamline the deployment of ML models and services.
Develop observability and monitoring solutions using tools like Prometheus, New Relic, and OpenTelemetry.
Ensure high availability, security, and performance of real-time inference APIs.
Work with ML engineers and backend teams to integrate inference models efficiently into production.
Implement autoscaling strategies for inference workloads based on traffic patterns and model demand.
Manage Pub/Sub and event-driven architectures to enable real-time messaging and engagement analytics.
Optimize model-serving infrastructure using Redis, Memcached, and other caching strategies.
Debug and resolve production issues related to latency, scaling, and reliability.
Key Projects:
Build and optimize real-time inference infrastructure for collaboration and personalization use cases.
Develop scalable and secure CI/CD pipelines for deploying ML models in production.
Implement log aggregation and monitoring solutions for observability and performance tracking.
Optimize Kubernetes-based model serving for minimal latency and efficient resource utilization.
Improve A/B testing infrastructure to track the impact of personalized messaging.
Enhance streaming data pipelines to support real-time inference updates.
Basic Qualifications:
4 years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure Engineering.
Solid experience with Kubernetes and container orchestration.
Hands-on experience with CI/CD tools such as GitHub Actions, Jenkins, and ArgoCD.
Experience working with real-time inference and ML model deployment.
Deep knowledge of Google Cloud Platform (GCP), AWS, or Azure.
Expertise in infrastructure as code (IaC) using Terraform or Helm.
Experience with message queues and event-driven architectures (Pub/Sub, Kafka, etc.).
Proficiency in monitoring and logging solutions (New Relic, Prometheus, OpenTelemetry, etc.).
Deep scripting skills in Python, Bash, or Go for automation.
Additional Qualifications:
Hands-on experience with ML model serving frameworks (TensorFlow Serving, Triton, TorchServe, etc.).
Familiarity with load balancing, API gateways, and caching strategies.
Understanding of A/B testing frameworks and experimentation analysis.
Experience optimizing low-latency microservices for ML-based personalization.
Passion for building and maintaining high-performance infrastructure for real-time applications.