So, you want to become a Data Engineer.
You graduated with your Computer Science or Information Technology degree, and the prospect of working with numbers, building data systems, and organizing information draws you to the world of Data Engineering. If this is the career you are considering, it’s important to know the basics, especially the main skills you will need to become a successful Data Engineer.
In this guide:
- What is a Data Engineer?
- What does a Data Engineer do?
- Skills you need as a Data Engineer
What is a Data Engineer?
“A scientist can discover a new star, but he cannot make one. He would have to ask an engineer to do it for him.” - Gordon Lindsay Glegg
A Data Engineer is something of a cross between a data analyst and a data scientist. They are in charge of managing data workflows, pipelines, and ETL (Extract, Transform, Load) processes. The role is valuable in many organizations because nearly every business handles some form of data. Essentially, Data Engineers provide the reliable data infrastructure a business needs to operate, grow, and improve.
What does a Data Engineer do?
Data Engineers build, develop, test, and maintain architectures such as databases and large-scale processing systems. They ensure the architecture supports business requirements and that the data can be easily obtained and analyzed when needed.
In general, a data engineer collects data relevant to a specific business, then moves and transforms that data through “pipelines”. They handle the technical side of data: designing, building, and arranging those pipelines.
What a Data Engineer will end up doing exactly depends on the specific requirements of a company.
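To make the idea of a pipeline more concrete, here is a minimal sketch of the extract, transform, and load steps in Python, using only the standard library. Everything in it is an assumption made for illustration: the sample CSV data, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse. A production pipeline would look quite different depending on the company's tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this might come from a file, an API, or a queue.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,carol,
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, cast types, and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


# An in-memory SQLite database stands in for the real destination (an assumption).
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
```

Keeping each stage in its own function makes it easier to test the stages separately and to swap the source or destination later.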
Skills you need as a Data Engineer
- SQL (Structured Query Language) - you need to know SQL to construct queries that extract data from a database. Most modern data warehouses and relational databases support SQL, including Amazon Redshift, Oracle, and SQL Server.
- Data warehousing and ETL tools - to construct and use a data warehouse. Data warehouses help you aggregate data from one or multiple sources into a structured form that is ready for analysis.
- Apache Hadoop-based analytics - Hadoop is an open-source platform for distributed storage and processing of data sets. Knowledge of HBase, Hive, and MapReduce is a common requirement for a data engineer.
- Programming languages - Python, Java, and Scala are the most commonly used languages for data engineering, partly because many tools for storing and processing data are written in them. Some examples are Kafka and Spark with Scala; Hadoop, HDFS, Cassandra, HBase, and Hive with Java. C/C++, Perl, and Golang are quite popular too.
- Operating systems knowledge - UNIX, Linux, and Solaris are some of the operating systems you need experience with. Many tools you will use run on these systems because they need low-level access to hardware and operating system functionality.
- Experience with cloud platforms - knowledge of at least one cloud platform is important for the data engineer of today. Most employers prefer Amazon Web Services, Google Cloud Platform, or Microsoft Azure. It’s a good idea to familiarize yourself with Amazon EC2, AWS Lambda, Amazon S3, and DynamoDB.
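If you have not written SQL before, the short Python sketch below shows the kind of query a data engineer writes all the time: grouping rows and computing totals. The `orders` table and its sample rows are hypothetical, and SQLite is used here only because it ships with Python. The same `SELECT ... GROUP BY` pattern should carry over to warehouse products, though each has its own dialect details.

```python
import sqlite3

# In-memory database with a small, made-up orders table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 20.0), (2, "bob", 5.0), (3, "alice", 10.0)],
)

# Count orders and sum spending per customer, largest spender first.
query = """
    SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer
    ORDER BY total_spent DESC
"""
summary = conn.execute(query).fetchall()

for customer, order_count, total_spent in summary:
    print(customer, order_count, total_spent)
```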
You may also need to know or have experience with:
- Big data tools - for businesses that have big data. These include tools like Spark, Kafka, Hadoop (HDFS, HBase, Hive), and Cassandra.
- Machine learning familiarity - machine learning is mostly the domain of Data Scientists, but familiarity with it can be valuable to a data engineer. Statistical analysis and basic data modeling are good skills to develop if you want to become a well-rounded data engineer.
What about other skills?
- A logical mind, to know what data to extract and to arrange data in a way that is clear and easy to understand
- Organizational skills, since a big part of a data engineer's job is to organize data
- Ability to work with cross-functional teams
View more skills required for a Data Engineer here or visit our Career Guide for information about other career paths you can take as a Computer/IT graduate.