Data Engineer, ML
About the role...
Cosmos is hiring a data engineer in New York City to build and own the data foundations behind our recommendation systems. You'll sit close to our ML team and own the pipelines, schemas, and tooling that turn billions of product events into the training data, features, and analytics that everything else depends on.
This is a high-leverage role on a small team. The systems you build will be used by every ML engineer, data scientist, and product engineer at Cosmos. If the data is wrong, slow, or hard to work with, nothing downstream works. We want someone who takes that seriously and treats data infrastructure as a product.
What you'll do:
Design and own the ETL pipelines that move product events from our application into the warehouse, feature systems, and downstream analytics.
Build data models and schemas that serve three different audiences: ML training, online serving, and business analytics. Each has different latency, freshness, and shape requirements.
Own data quality end to end. Monitoring, alerting, contracts with upstream services, backfills when things go wrong.
Partner with ML and recsys engineers on feature pipelines, training datasets, and the infrastructure that connects offline experimentation to online serving.
Build internal tooling that makes data easier to find, trust, and use. Documentation, lineage, discovery.
Help shape how we think about data at Cosmos as the team grows.
What we're looking for:
Data engineering experience in production environments.
Strong Python and SQL. You can write performant queries against large datasets and you know why your joins are slow.
Production experience with modern data tooling. Some mix of Airflow or Dagster, dbt, Snowflake or BigQuery or Redshift.
Experience with streaming or near-real-time pipelines. Kafka, Kinesis, Flink, or similar.
Comfort with AWS and infrastructure as code.
You've built data systems specifically for ML use cases, not just BI. You understand training/serving skew, feature freshness, point-in-time correctness.
Nice to have:
Experience at a visual discovery, social, or content platform (Pinterest, Instagram, TikTok, etc.)
Background in balancing organic and paid/promoted content distribution
Experience with contextual bandits, reinforcement learning, or online learning systems
Benefits & perks:
Premium health, dental, and vision
20 days PTO, plus company holidays, plus a 2-week winter break
Top-spec MacBook Pro, Apple Studio Display
Monthly stipend for software and tools
Monthly team events
Fully covered commute in NYC
Daily lunch & dinner stipend at the office
Optional Superpower membership
We're always looking for curious minds to join our team.
