Every time an object is written, the system inserts a message into an Apache Kafka message queue (https://kafka.apache.org/). This way, further processing can happen asynchronously on the freshly written data. This pattern is called the "Data Pipeline" and is a key building block of modern big-data architectures. The goal of the internship is to set up a Kafka cluster and a stream-processing framework and to build a suite of interesting post-processing applications (a minimal sketch of the idea follows the list below). Depending on your interests and prior experience, we can choose from the following areas:
- Create a demo that we can show to our customers to demonstrate the Data Pipeline

You will become familiar with cloud industry protocols such as the Amazon S3 API and with open-source projects (Apache Kafka, stream processing), and you will build valuable coding, prototyping, and debugging experience with distributed and cloud-based applications.
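To make the pipeline concrete, here is a minimal sketch in Python using the kafka-python client. The broker address, the topic name `object-writes`, the event schema, and the `on_object_write` hook are illustrative assumptions for this sketch, not the actual system.

```python
# Sketch of the Data Pipeline idea with kafka-python.
# Broker address, topic name, and event fields are assumptions.
import json
import time

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_object_write(bucket: str, key: str, size: int) -> None:
    """Hypothetical hook invoked after an object write (e.g. an S3 PUT)."""
    event = {"bucket": bucket, "key": key, "size": size, "ts": time.time()}
    # Fire-and-forget: the write path is not blocked by downstream processing.
    producer.send("object-writes", value=event)

on_object_write("demo-bucket", "photos/cat.jpg", 123456)
producer.flush()

# Elsewhere, a stream-processing job consumes the events asynchronously.
consumer = KafkaConsumer(
    "object-writes",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,  # stop iterating if no new events arrive
)
for msg in consumer:
    print(f"post-processing {msg.value['bucket']}/{msg.value['key']}")
```

The write path only emits an event; the consumer picks it up on its own schedule. That decoupling is exactly what lets post-processing happen asynchronously without slowing down object writes.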
Upload your CV or send an e-mail to email@example.com!
"De grootste uitdaging is de schaal. Tijdens mijn studies werden we voorbereid op het werken in groep aan grotere projecten, maar al dit verbleekt bij wat we doen bij Western Digital."