Editor’s note: In this ongoing series for IT job hunters, we'll explore in-demand roles, necessary skills, and how to stand out in an interview. Here, Jayaprakash Nair, head of analytics for Altimetrik, shares insights on getting a job as a data engineer.
Data engineer salary range:
$65,000 - $132,000 per year. Source: PayScale.
In a nutshell: What is a data engineer?
A data engineer is responsible for ingesting data from different sources into a central repository, such as a data lake or data warehouse. They are also responsible for setting up the automated pipeline that brings data into the lake on a regular schedule, with as few impediments and issues, and as little data loss, as possible.
Data engineers are also responsible for cleaning and organizing the data (data quality and data transformations) to ensure that it becomes the single source of truth. Other responsibilities include adding a layer of accelerators on top of the data, especially if it is big data, so that downstream consumers can use it more easily, and, in certain cases, cataloging the data.
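To make the ingest, clean, and load steps described above concrete, here is a minimal sketch in Python. It is an illustration only, not a method described by Nair: the sample data, the orders table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the data lake or warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (illustrative data only).
RAW_CSV = """order_id,customer,amount
1001, Alice ,25.50
1002,bob,
1001, Alice ,25.50
1003,Carol,40.00
"""


def extract(text):
    """Ingest: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean: trim and normalize names, drop rows with no amount, de-duplicate by order_id."""
    seen = set()
    clean = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # data quality rule: an order must have an amount
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # keep only the first copy of a duplicated order
        seen.add(order_id)
        clean.append((order_id, row["customer"].strip().title(), float(amount)))
    return clean


def load(conn, rows):
    """Load: write cleaned rows into the central table (SQLite stands in for the warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent if the pipeline is re-run.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn, text):
    """Run all three steps and return the number of rows loaded."""
    rows = transform(extract(text))
    load(conn, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = run_pipeline(conn, RAW_CSV)
```

In a real pipeline, a scheduler would trigger run_pipeline at regular intervals, and the rows rejected during cleaning would typically be logged rather than silently dropped.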
This role is becoming increasingly critical, not only because of the exponential increase in data (plus related data outside of a company), but also because of an exponential increase in the understanding at the executive level that significant, critical business insights can be mined from this data.
This role is not the same as a data scientist. While a data engineer ingests data from various sources and ensures that it is clean and secure, a data scientist is a consumer of this data. The main responsibility of a data scientist is to unearth valuable nuggets of information from this accumulated data, or perform more advanced predictive analytics.
[ Looking for a data scientist job? See: IT careers: How to get a job as a data scientist ]
What skills are needed?
The skills expected of a data engineer have evolved over time. A few years ago, the main skills were SQL, OLTP, OLAP, data warehousing, etc. Then came the era of big data and Hadoop, and the expected skills became HDFS, Hive, Pig, and other members of the Apache stack. Now that cloud providers are beginning to get a good grip on the market, knowledge of their managed data services is taking center stage. These skills are desirable because they aid the data engineer’s ability to quickly collect relevant data.
How to stand out in a data engineer interview
Everyone can talk about their past job experience. To stand out, consider building a distinguished portfolio of solutions and code outside of your work, for example on GitHub.
Also, consider participating in technical communities, as many serious headhunters scout around in these areas for prospects.
In an interview, data engineer job seekers should expect intermediate and advanced SQL questions. Beyond that, they will likely be asked about the pitfalls to keep in mind while designing a modern-day data lake, how to ensure good data quality in the lake, AWS/Azure/GCP managed services for data and analytics, data governance, security, and building semantic data marts from the lake.
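As a rough idea of what an "advanced" SQL question might look like, the sketch below asks for the two highest-paid employees in each department and answers it with a window function. The question, the salaries table, and its data are invented for illustration and do not come from Nair. The query runs through Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data for the exercise.
conn.executescript("""
CREATE TABLE salaries (employee TEXT, department TEXT, salary INTEGER);
INSERT INTO salaries VALUES
  ('Ana', 'Data', 120),
  ('Ben', 'Data', 100),
  ('Cy',  'Data', 90),
  ('Dee', 'Web',  95),
  ('Eli', 'Web',  80);
""")

# ROW_NUMBER() ranks employees within each department by salary, highest first;
# the outer query keeps only the top two ranks per department.
TOP_TWO_PER_DEPARTMENT = """
SELECT department, employee, salary
FROM (
    SELECT department, employee, salary,
           ROW_NUMBER() OVER (
               PARTITION BY department
               ORDER BY salary DESC
           ) AS rn
    FROM salaries
)
WHERE rn <= 2
ORDER BY department, salary DESC;
"""

top_two = conn.execute(TOP_TWO_PER_DEPARTMENT).fetchall()
for department, employee, salary in top_two:
    print(department, employee, salary)
```

An interviewer might follow up by asking how the result changes if two employees tie on salary, which is where the difference between ROW_NUMBER, RANK, and DENSE_RANK comes in.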
[ Read also: IT careers: How to job hunt during a pandemic. ]