Over the last six years, I’ve made a living from modeling, primarily in New York City but also in Seattle and various countries worldwide. While I enjoy dressing up in NYC’s trendiest streetwear, by “modeling,” I mean predictive modeling, a central element of the applied data science I’ve done throughout my career.
I started my data science journey at Custora, a New York-based company that built predictive marketing software for eCommerce businesses. After Custora was acquired by Amperity, I joined their data science team, where I’m now a lead data scientist. Amperity is partially remote, which has allowed me to work from anywhere with solid Wi-Fi and even more solid espresso.
My experience in data science has been focused on building machine-learning pipelines as part of enterprise software. Such work relies heavily on engineering prowess, infrastructural considerations, and high performance, both computationally and efficaciously (I had to look that one up). It also requires regular coordination between data scientists, machine learning engineers, back-end engineers, and product managers.
Building machine learning pipelines is a large undertaking – I’ve worked on projects that took anywhere from two months to two years from beginning to end. Still, each project follows a similar pattern, from ideation to prototyping to delivery.
The rhythms of data science projects
There is no “typical” day in data science. Yet there are common stages to a project that require me to flex different muscles, take on different responsibilities, and interact with different stakeholders.
Since our work takes place on such a large scale, in the interest of productivity, we must break that work down into more digestible pieces. If we have a two-month goal, we'll split it into four two-week sprints, each with its own milestones. We then use tickets to break those milestones down into specific code contributions.
3 phases of a data science project
Any given project usually has three main phases (the numbers here reflect the maturity of the project and also roughly indicate the number of customers involved in development):
1. The 0-1 phase
This consists of building, in an ad hoc environment, a model that can predict a quantity of interest for one of our customers. This phase requires interacting with customer-facing teams, product management, and back-end engineers to acquire the right data assets for modeling, perform data QA, understand the problem space, and ensure we're experimenting in the right direction. At this stage, we also do a lot of brainstorming, literature review, and algorithm prototyping to explore the landscape of the solution space.
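To make the 0-1 phase concrete, here's a hedged sketch of what a quick ad hoc prototype can look like: a simple model predicting a hypothetical quantity of interest (a customer's future spend) from invented behavioral features. The column names, synthetic data, and model choice are illustrative assumptions, not any company's actual stack.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-in for the customer data we'd acquire and QA.
df = pd.DataFrame({
    "orders_last_year": rng.poisson(3, n),
    "avg_order_value": rng.gamma(2.0, 30.0, n),
    "days_since_last_order": rng.integers(1, 365, n),
})
# Invented target: future spend loosely driven by past behavior.
df["future_spend"] = (
    0.5 * df["orders_last_year"] * df["avg_order_value"]
    + rng.normal(0, 20, n)
)

X = df.drop(columns="future_spend")
y = df["future_spend"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ad hoc first model: good enough to learn whether the problem
# is tractable, not yet built to scale across customers.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"holdout MAE: {mae:.2f}")
```

The point of a prototype like this isn't the model itself; it's a fast read on whether the data assets support the prediction at all before any infrastructure investment.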
2. The 1-5 phase
This phase involves taking a working ML prototype and generalizing it across multiple customers. To do this, we must think infrastructurally about how to train and evaluate our model efficiently and accurately. We must also consider what types of feature engineering and algorithmic techniques will allow us to scale.
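One way to picture the 1-5 generalization step is a single shared training routine that's parameterized per customer by a small config, rather than forked per customer. The customer names, config knobs, and model here are all hypothetical placeholders:

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_for_customer(X, y, config):
    """One shared code path; per-customer knobs arrive via config."""
    model = Ridge(alpha=config.get("alpha", 1.0))
    model.fit(X, y)
    return model

# Customer-specific behavior lives in small configs, not in forked code.
CUSTOMER_CONFIGS = {
    "retailer_a": {"alpha": 0.5},
    "retailer_b": {"alpha": 2.0},
}

rng = np.random.default_rng(1)
models = {}
for name, config in CUSTOMER_CONFIGS.items():
    # Stand-in for each customer's feature matrix and target.
    X = rng.normal(size=(200, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.1, 200)
    models[name] = train_for_customer(X, y, config)

print(sorted(models))
```

Keeping the per-customer surface this thin is what makes it plausible to go from one customer to five without rewriting the pipeline each time.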
3. The 5-100 phase
This is when we take a promising ML pipeline and prepare it for global release in our software. This requires extensive unit and integration testing, prediction stability monitoring, and documentation. We also evaluate and adjust our development choices to optimize cost of goods sold (COGS) and operational reliability. During this period, we'll interface with teams outside of engineering, including product marketing and sales, to educate the rest of the organization on our new capabilities.
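One common flavor of prediction stability monitoring is the population stability index (PSI), a standard drift check between two score distributions. This sketch uses quantile bins and the usual ~0.2 rule of thumb; it's an illustrative assumption, not a description of any particular team's monitoring setup.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population stability index between two score distributions."""
    # Bin edges come from the baseline's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e = np.histogram(expected, edges)[0] / len(expected)
    # Clip new scores into the baseline's range so every value is binned.
    a = np.histogram(np.clip(actual, edges[0], edges[-1]), edges)[0] / len(actual)
    e = np.clip(e, 1e-6, None)  # avoid log(0) on empty bins
    a = np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(2)
baseline = rng.normal(0.0, 1.0, 5000)  # last release's predictions
drifted = rng.normal(0.5, 1.0, 5000)   # this release's predictions

# Illustrative rule of thumb: PSI above ~0.2 flags a meaningful shift.
print("baseline vs itself:", psi(baseline, baseline))
print("baseline vs drifted:", psi(baseline, drifted))
```

A check like this can run after every scheduled pipeline execution, paging the on-call when a new release's score distribution drifts too far from the last known-good one.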
While no two days are alike, our team has implemented some rituals to make sure we get to see each other and make each other laugh and talk about how good the TV show Severance is. While it’s almost blasphemous for an engineer to say this, I enjoy meetings because of the structure and change of pace they bring to the day, so I often schedule colleague check-ins throughout the week. We also have on-call rotations to address sudden issues with our predictive pipelines – operational reliability is a shared responsibility across the engineers and data scientists on our team.
Each data scientist on our team covers an incredibly large surface area: data acquisition and QA, exploratory data analysis, feature engineering, algorithm development, model evaluation, and delivery. In this sense, we often say we're looking for "unicorns" to join our team: applied data scientists with the skills and interest to excel at, and bring innovation to, every component across that surface area.
When we look for entry-level data scientists, more than anything else, we’re looking for that curiosity and the ability to be a self-starter. If you’re willing to soak up as much information as you can over your first few months, work consistently on your coding skills, and then dive into complicated, interdisciplinary problems, you might be a good fit for a data science team.