
Implement an ActiveScale “Data Pipeline”

  • Internship
  • Data Pipeline
  • Python, C++, Java

Description

Every time an object is written to the object store, the system inserts a message into an Apache Kafka message queue (https://kafka.apache.org/). This way, further processing can happen asynchronously on the freshly written data. This is called the “Data Pipeline” and is a key pattern in modern big data architectures. The goal of the internship is to set up a Kafka cluster and a stream processing framework, and to build a suite of interesting post-processing applications. Depending on your interests and prior experience, we can choose from the following areas:

Images

  • Uploaded images could be resized, auto-enhanced, filtered, ... The resulting artifacts would be re-uploaded to the object store as auxiliary objects
  • Feed the images to an image-recognition algorithm (self-written or in the cloud) to categorize, tag, … the content, and push the results to an external database/tool

Video

  • Transcode, post-process, … uploaded videos and re-upload them as additional objects
  • Feed the audio to a speech-recognition algorithm (self-written or in the cloud) to auto-generate subtitles/transcripts

Metrics

  • Compute & visualize system statistics (average, histograms, percentiles, …) and metrics on object name and data size, object lifetime, capacity use per bucket, …
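A minimal sketch of such statistics in Python, using only the standard library (the object sizes below are made-up sample data, not real system output):

```python
import statistics
from collections import Counter

def size_metrics(sizes):
    """Summary statistics over a list of object sizes (in bytes)."""
    cuts = statistics.quantiles(sizes, n=100)  # 99 percentile cut points
    return {
        "count": len(sizes),
        "mean": statistics.mean(sizes),
        "p50": cuts[49],
        "p95": cuts[94],
    }

def size_histogram(sizes, bucket=1024):
    """Map each object to a 1 KiB bucket and count objects per bucket."""
    return Counter(s // bucket for s in sizes)

# Made-up sample sizes; in the pipeline these would come from the
# Kafka events emitted on every object write.
sizes = [120, 950, 2048, 4096, 512, 3000, 128]
m = size_metrics(sizes)
h = size_histogram(sizes)
```

From here, visualization is a matter of feeding `m` and `h` into whatever dashboard or plotting tool the demo uses.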

Blockchain

  • The object name + MD5 sum could be fed into a blockchain or Merkle tree for some sort of ‘digital notarization’
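A minimal sketch of the notarization idea, assuming each leaf is the MD5 of name + content as the bullet suggests (a real deployment would likely prefer a stronger hash):

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def merkle_root(leaf_hashes):
    """Fold a list of hex leaf hashes up to a single Merkle root."""
    if not leaf_hashes:
        return md5_hex(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:              # odd level: duplicate the last node
            level.append(level[-1])
        level = [md5_hex((left + right).encode())
                 for left, right in zip(level[::2], level[1::2])]
    return level[0]

# Hypothetical objects: leaf = MD5 of name + content
objects = [("a.txt", b"hello"), ("b.txt", b"world")]
leaves = [md5_hex(name.encode() + body) for name, body in objects]
root = merkle_root(leaves)
```

Publishing only the root (e.g. to a public blockchain) is enough to later prove that any individual object existed unmodified at that time.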

Other

  • If you’re passionate about an interesting application, that’s even better.
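Whatever area you pick, each worker starts from the same place: a Kafka message describing the freshly written object. A minimal sketch of the per-message routing, assuming a hypothetical JSON event format (not the actual ActiveScale notification schema):

```python
import json

# In the real pipeline this message would be consumed from the Kafka
# topic (e.g. with a KafkaConsumer); here we only sketch the
# per-message routing step.
def route_event(raw: bytes) -> str:
    """Pick a post-processing area based on the written object's key."""
    event = json.loads(raw)
    key = event["key"].lower()
    if key.endswith((".jpg", ".png", ".gif")):
        return "images"
    if key.endswith((".mp4", ".mov", ".mkv")):
        return "video"
    return "metrics"

msg = json.dumps({"bucket": "demo", "key": "cat.jpg", "size": 52134}).encode()
area = route_event(msg)
```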

You will become familiar with cloud industry protocols such as the Amazon S3 API and with open-source projects (Apache Kafka, stream processing), and you will build valuable coding, prototyping and debugging experience with distributed and cloud-based applications.

Technology

  • Programming language of your choice: Python, Java, C++, Go, …
  • AWS S3 API
  • Apache Kafka

Goal

Create a demo that we can show to our customers to demonstrate the Data Pipeline.

Practical

  • 6-week internship
  • Between July and September; you can choose when.
  • Degree: Master of Science in Computer Engineering.

Interested?

Upload your CV or send an e-mail to recruiter@amplidata.com!

"The biggest challenge is the scale. During my studies we were prepared to work in teams on larger projects, but all of that pales in comparison to what we do at Western Digital."

Read more about Brecht's internship at Western Digital here