EntertainmentCareers.Net

Lead DevOps Engineer - Online Inference

Paramount

New York, NY

Posted: April 2, 2026

This position has been filled. It was removed by the employer on 5/26/2026 3:04:00 PM PST.

This is a Full Time Job

#WeAreParamount on a mission to unleash the power of content… you in?
We’ve got the brands, we’ve got the stars, we’ve got the power to achieve our mission to entertain the planet – now all we’re missing is… YOU! Becoming a part of Paramount means joining a team of passionate people who not only recognize the power of content but also enjoy a touch of fun and uniqueness. Together, we co-create moments that matter – both for our audiences and our employees – and aim to leave a positive mark on culture.

We are looking for a Lead DevOps Engineer - Online Inference to join our Applied Intelligence Personalization Team. This role will focus on building and maintaining scalable, low-latency infrastructure to support real-time machine learning inference for engagement and personalized messaging. The ideal candidate will have 2 years of experience working with Kubernetes, CI/CD pipelines, and cloud-based infrastructure to optimize and deploy real-time ML models.

Your Day-to-Day:

Design, implement, and manage scalable and reliable infrastructure for online inference services.

Optimize Kubernetes-based deployments for low-latency model serving and real-time personalization.

Automate CI/CD pipelines to streamline the deployment of ML models and services.

Develop observability and monitoring solutions using tools like Prometheus, New Relic, and OpenTelemetry.

Ensure high availability, security, and performance of real-time inference APIs.

Work with ML engineers and backend teams to integrate inference models efficiently into production.

Implement autoscaling strategies for inference workloads based on traffic patterns and model demand.

Manage Pub/Sub and event-driven architectures to enable real-time messaging and engagement analytics.

Optimize model-serving infrastructure using Redis, Memcached, and other caching strategies.

Debug and tackle production issues related to latency, scaling, and reliability.

Key Projects:

Build and optimize real-time inference infrastructure for collaboration and personalization use cases.

Develop scalable and secure CI/CD pipelines for deploying ML models in production.

Implement log aggregation and monitoring solutions for observability and performance tracking.

Optimize Kubernetes-based model serving for minimal latency and efficient resource utilization.

Improve A/B testing infrastructure to track the impact of personalized messaging.

Enhance streaming data pipelines to support real-time inference updates.

Basic Qualifications
4 years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure Engineering.

Solid experience with Kubernetes and container orchestration.

Hands-on experience with CI/CD tools such as GitHub Actions, Jenkins, and ArgoCD.

Experience working with real-time inference and ML model deployment.

Deep knowledge of Google Cloud Platform (GCP), AWS, or Azure.

Expertise in infrastructure as code (IaC) using Terraform or Helm.

Experience with message queues and event-driven architectures (Pub/Sub, Kafka, etc.).

Proficiency in monitoring and logging solutions (New Relic, Prometheus, OpenTelemetry, etc.).

Deep scripting skills in Python, Bash, or Go for automation.

Additional Qualifications
Hands-on experience with ML model serving frameworks (TensorFlow Serving, Triton, TorchServe, etc.).
Familiarity with load balancing, API gateways, and caching strategies.
Understanding of A/B testing frameworks and experimentation analysis.
Experience optimizing low-latency microservices for ML-based personalization.
Passion for building and maintaining high-performance infrastructure for real-time applications.
