My Journey into Data Engineering


Big Data, data-driven, data science, and machine learning are some of the terms that have changed the rules of the game in today’s society.

Today I decided to start creating content to share my experience. It’s a good way to learn together.

One of my first questions was: how do I start? What should my first post be? Well, sharing a bit of my experience in the data world feels like a good starting point.

How did I get into data engineering?

Today Big Data is a familiar term for both technical people and the general public, but a few years ago it was newly coined and only a few people were getting into it. In schools and universities it wasn’t a topic people talked about much. It was seen as the evolution of areas like data mining, BI, autonomous systems, and neural networks, which were more research-focused than production-focused. As hardware kept improving along the lines of Moore’s law, the ability to process, store, and manage more data moved these areas from research into daily practice.

2013, university days. In one of our courses we had to search for papers to present, which led me to look for a topic related to distributed systems and NoSQL databases. That’s how I found Google’s paper “MapReduce: Simplified Data Processing on Large Clusters.”

After university came professional life and the need to gain experience. Big Data was only mentioned in a few communities, as something niche, so it wasn’t a role companies or consultancies were asking for yet.

My first job was in my hometown, Arequipa, while I was still an undergrad. I worked as a web developer doing everything from databases to frontend design, essentially a full-stack role for that time.

When I graduated, I moved to Lima and started working in Business Analytics, mostly building dashboards with QlikView and Tableau, plus small ETLs from sources like CSV, XLS, SAP, SQL Server, and others. I also did some research to build small predictive models and ran into Hadoop again, along with a framework that promised to be up to 100x faster than Hadoop MapReduce: Apache Spark. Little by little I explored more Big Data topics and learned about roles like Data Engineer, Data Scientist, and Data Architect.

I later worked as a Database Analyst for over a year, focused on Oracle, SQL, and PL/SQL, while keeping the dream of entering Big Data as a Data Engineer. One problem with working on a single technology is that you lose touch with the rest of the ecosystem and how it evolves. It’s important to keep studying, stay current, and build personal projects.

Another difficulty: most jobs require a certain amount of experience. So where do you get that experience? Time passed, but the dream remained. I took courses on Big Data fundamentals, Hadoop infrastructure, and more. To go all-in, I completed a remote master’s degree in Big Data and Data Science from a university in Spain. At the same time I attended workshops and met people I added on LinkedIn, expanding my network. By chance I met a colleague from the same consultancy who led a Big Data project. I told him about my desire to become a Data Engineer and my willingness to learn. Fortunately, his team was starting a trainee program and invited me to participate, and that’s how I started in this exciting world of Big Data.


My first project was for Banco de Crédito del Perú (BCP) as part of the Data Lake team, working on data ingestion.

After BCP, I joined the Digital Architecture team for a Telefónica del Perú project focused on Fast Data (data streaming). Working with a continuous flow of data is a big challenge, and a fun one. In that project I used technologies like Spark, Hadoop, Kafka, NiFi, Azure, Elasticsearch, Kibana, and Hortonworks.

Later I joined the Internet para Todos (IPT) team, which is 100% cloud. In this new challenge I focused my data engineering skills on cloud and real-time scenarios, building data pipelines in Apache Flink with time windows and integrating them with Google Cloud services. In that project I used tools like Apache Flink, Apache NiFi, Apache Kafka, and Apache Superset, along with Google Cloud services such as BigQuery, Composer, Pub/Sub, Cloud Functions, and more.

Another of my passions is sharing knowledge and helping people who are just starting in this world. I know how hard it is at the beginning and how important it is to have guidance. I started forming small communities, study groups, and knowledge groups around proofs of concept (POCs), mentoring, and any opportunity that came up.

I don’t want to finish without saying that every challenge is a new opportunity to learn and improve our skills.

From my experience, I can say we should never lose the desire to learn, experiment, set goals, and take risks by leaving our comfort zone.