Data Engineer, ML

About Cosmos

Cosmos is building the most inspiring place on the internet: a visual discovery platform where people save the images, products, and ideas they care about, search by color or AI caption, and share collections with attribution built in. We're a Series A startup ($22M raised) building a new home for artists and creators, growing from 2M users toward 10M and beyond. Every save, search, and scroll is an event, so the data per person runs deep, and the recommendation and search systems that power discovery run on what this team builds.

About the role

This is the senior-most data engineering role at Cosmos. You own the data foundation for ML, recommendation, analytics, and product, and operate as the DRI for the data stack end to end. This is a hands-on individual-contributor role.

Your job is to make Cosmos's data reliable, trusted, and easy to use. That means building scalable pipelines for product events, ML training data, features, and analytics, owning what keeps data dependable as volume grows (quality, monitoring, contracts, backfills), and standing up the internal tooling the rest of engineering depends on.

You work closely with the data side of our ML infrastructure: feature generation, training data, and the online recommendation systems. As the company grows, you'll also help define the long-term data architecture.

Our stack: BigQuery, Pub/Sub, Dataflow, and Dataform on GCP; Chalk for features; Artie for replication; Postgres, RabbitMQ, and Kubernetes in the application layer; Terraform for infrastructure; Python throughout.

What you'll achieve

Own and build Cosmos's data stack end to end as its DRI, from event ingestion through to the tables and features the models and dashboards run on, across batch and streaming (Pub/Sub, Dataflow, Artie). Shape the long-term data architecture as the company grows.
Design and run the feature and label pipelines the recommendation and search models train and serve on, with consistency between offline training and online serving.
Build the warehouse models and transformations (Dataform) that analytics, experimentation, and product rely on.
Own data quality: monitoring, data contracts, and reliable backfills, so bad data never reaches a model or a dashboard.
Partner closely with ML, backend, product, and mobile engineering. You'll often be the one defining the approach in ambiguous spaces.

Minimum qualifications

7+ years building and operating production data pipelines at scale, batch and streaming.
You have owned a data platform or a major part of one and run it as the responsible engineer.
You have deep experience with a modern data stack: a cloud warehouse (BigQuery, Snowflake, or similar), a stream and batch processing engine (Dataflow, Spark, or similar), and a transformation framework (Dataform, dbt, or similar).
You have set up data contracts, monitoring, and backfill processes that held up as data volume grew.
You are fluent in Python and SQL, and you hold a high bar for code quality and review.
You have built the data layer that ML or analytics systems run on and understand what those systems need from it.
You define the approach in ambiguous problems and drive outcomes without waiting for direction.

Nice to have

Built feature stores, or feature and label pipelines for recommendation, ranking, or search at scale (Chalk, Tecton, or similar).
Deep experience with the GCP data stack (BigQuery, Pub/Sub, Dataflow, Dataform).
Worked on event instrumentation or product analytics (Mixpanel or similar) and understand how event design shapes everything downstream.
Been an early or founding data engineer before.
Comfortable with the infrastructure around the data: Postgres, Kubernetes, Terraform.

How we work

This role is based in one of our two engineering hubs, New York City or Warsaw. It is not remote and requires 3-4 days per week in person.

The wider engineering team also spans Copenhagen and Boston, so clear written communication and real time-zone overlap matter.

Benefits

Premium health, dental, and vision
20 days PTO, plus company holidays, plus a 2-week winter break
Top-spec MacBook Pro, Apple Studio Display
Monthly stipend for software and tools
Monthly team events
Fully covered commute in NYC
Daily lunch & dinner stipend at the office
Optional Superpower membership
