
Senior Lead Data Engineer, Content Engineering
Paramount
New York, NY
This is a Full-Time Job
#WeAreParamount on a mission to unleash the power of content… you in?
We’ve got the brands, we’ve got the stars, we’ve got the power to achieve our mission to entertain the planet – now all we’re missing is… YOU! Becoming a part of Paramount means joining a team of passionate people who not only recognize the power of content but also enjoy a touch of fun and uniqueness. Together, we co-create moments that matter – both for our audiences and our employees – and aim to leave a positive mark on culture.
Overview
We are hiring a Senior Lead Data Engineer to build and scale the data foundations that power Paramount’s next-generation personalization systems across Home, Search/Browse, Notifications, and Artwork. This role sits at the core of the Content Engineering vertical, partnering closely with Applied ML, ML Platform, and Causal Science teams to deliver highly reliable, ML-ready data at global scale. You will design and operate pipelines that process billions of events daily, along with the petabyte-scale feature stores and real-time engagement streams that support ranking and recommendations. This is a high-impact role for an engineer who thrives in distributed systems and large-scale ETL/streaming, and who delivers production-grade infrastructure for cutting-edge personalization.
Why This Role Matters
Paramount is investing heavily in a unified personalization operating model. In this role, you will directly shape:
• The Data Backbone: Building the core of our personalization ecosystem.
• The User Experience: Defining the feature sets that shape what millions of users see.
• Innovation Velocity: Enabling ML teams to innovate quickly and safely through high-quality experimentation data.
Key Responsibilities
• Build & Operate Large-Scale Feature Pipelines: Design and maintain batch/streaming pipelines (Spark, Flink, Databricks, Airflow) producing ML features for ranking models.
• Ensure Point-in-Time Correctness: Develop feature sets that use only data available at each prediction’s timestamp, enabling leakage-free offline training that stays consistent with online inference.
• Develop Embedding & Content Pipelines: Build scalable workflows for metadata, imagery, and multimodal representations; partner with Science teams to operationalize new models.
• Architect Data Foundations: Design Delta/Parquet data models and medallion layers, optimizing storage layout and partitioning for latency and cost.
• Real-Time Engineering: Build Kafka-based systems for real-time features and user-activity aggregations, ensuring robust handling of out-of-order events and exactly-once semantics.
• Governance & Leadership: Define data quality rules and schema evolution processes while collaborating across ML pods to translate model needs into infrastructure.
Basic Qualifications
• 7 years of experience in large-scale data or software engineering.
• Hands-on Expertise: Deep experience with Spark (PySpark/Scala), Databricks, Airflow, and Kafka.
• ML Data Modeling: Proficiency in feature pipelines, temporal joins, and mitigating training-serving skew.
• Cloud Ecosystems: Experience with AWS/Azure/GCP and high-performance cloud data warehouses such as Snowflake or Redshift.
• Technical Foundations: Strong programming skills in Python and SQL, with a focus on performance optimization.
Additional Qualifications
• Experience in personalization domains (search, ranking, or recommender systems).
• Experience supporting petabyte-scale data lakehouses or feature stores.
• Familiarity with GenAI/RAG systems, multimodal content, or Delta Live Tables.
• Knowledge of causal inference, experimentation signals, or ML evaluation workflows.
• Experience with Terraform for governed, repeatable deployments.
What Success Looks Like
In your first 6–12 months, you will:
• Take Ownership: Manage critical feature and content pipelines powering personalization across multiple surfaces.
• Drive Efficiency: Improve feature freshness and reliability while reducing pipeline latency and cost.
• Set Standards: Introduce new monitoring and governance practices that elevate engineering across the AMLG.
• Technical Leadership: Become the go-to expert for distributed systems and ML data infrastructure within Content Engineering.