Apache Flink

https://flink.apache.org/

''Apache Flink® is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.''

=Getting started=

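==Installation==
Download and unpack the Flink 1.1.2 binary release (Hadoop 2.7, Scala 2.11), then list the bundled scripts and examples. The commands assume the archive is unpacked in your home directory. FLINK_HOME is a plain shell variable, so set it again in every new terminal.
 wget https://archive.apache.org/dist/flink/flink-1.1.2/flink-1.1.2-bin-hadoop27-scala_2.11.tgz
 tar xf flink-1.1.2-bin-hadoop27-scala_2.11.tgz
 FLINK_HOME=~/flink-1.1.2
 cd $FLINK_HOME
 ls bin
 ls examples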
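A download URL that points at a mirror-selection page can leave you with an HTML file that carries the archive's name. The following is a minimal sketch of a sanity check to run before unpacking; the helper name is_valid_tgz is hypothetical and not part of Flink.

```shell
# Hypothetical helper: succeeds only if the file passes gzip's integrity test.
is_valid_tgz() {
    gzip -t "$1" 2>/dev/null
}

archive=flink-1.1.2-bin-hadoop27-scala_2.11.tgz

if is_valid_tgz "$archive"; then
    tar xf "$archive"
else
    # A saved HTML page (or a missing file) ends up here.
    echo "$archive is not a valid gzip archive; check the download URL" >&2
fi
```

If the check fails, open the downloaded file in a text editor: an HTML page there usually means the URL was a mirror chooser rather than a direct link.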

==Local Execution==
Terminal 1: start Flink
 cd $FLINK_HOME
 bin/start-local.sh

Open the web UI: http://localhost:8081/#/overview

Run the SocketWindowWordCount example, which counts the words it reads from a socket over time windows.

Terminal 2: start netcat
 nc -l 9000

Terminal 3: submit the Flink program
 cd $FLINK_HOME
 bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000

Terminal 2: type some words into the netcat session
 lorem ipsum ipsum ipsum ipsum bye
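To get a feel for the totals to expect, here is a plain shell sketch that counts the same words. It is not Flink code and it ignores windowing: it only shows the per-word counts you would see if all the words landed in a single window. The function name count_words and the output format are assumptions made for illustration.

```shell
# Hypothetical helper: split stdin on whitespace and print "word : count" per word.
count_words() {
    tr -s '[:space:]' '\n' |   # one word per line
        sed '/^$/d' |          # drop empty lines
        sort |                 # group identical words together
        uniq -c |              # prefix each word with its count
        awk '{ print $2 " : " $1 }'
}

printf 'lorem ipsum ipsum ipsum ipsum bye\n' | count_words
# bye : 1
# ipsum : 4
# lorem : 1
```

The Flink job emits results per window, so its log may show several partial counts for the same word if the input is typed slowly.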

Terminal 4: follow the job output
 cd $FLINK_HOME
 tail -f log/flink-*-jobmanager-*.out

Terminal 1: stop Flink
 cd $FLINK_HOME
 bin/stop-local.sh

==Shell==
Start the Scala shell against a local Flink instance:
 cd $FLINK_HOME
 bin/start-scala-shell.sh local

TBC

==Cluster execution==
See the cluster setup section of the Flink 1.1 quickstart: https://ci.apache.org/projects/flink/flink-docs-release-1.1/quickstart/setup_quickstart.html#cluster-setup

==Amazon AWS EMR==
Install the AWS CLI
 sudo apt-get install awscli
 aws help

Configure the CLI with your AWS credentials
 aws configure

NB: the credentials file is ~/.aws/credentials and the config file is ~/.aws/config.
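As a rough illustration of what these two files look like, the sketch below writes sample copies into a scratch directory instead of ~/.aws, so nothing real is overwritten. The key values are placeholders, and the region eu-west-1 and output format json are assumptions chosen for the example. AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE are the environment variables the AWS CLI reads to locate alternative files.

```shell
# Scratch directory, so the real ~/.aws files are left untouched.
demo_dir="$(mktemp -d)"

# Layout of the credentials file (placeholder values).
cat > "$demo_dir/credentials" <<'EOF'
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
EOF

# Layout of the config file (assumed region and output format).
cat > "$demo_dir/config" <<'EOF'
[default]
region = eu-west-1
output = json
EOF

# Point the AWS CLI at the scratch files for this shell session only.
export AWS_SHARED_CREDENTIALS_FILE="$demo_dir/credentials"
export AWS_CONFIG_FILE="$demo_dir/config"
```

Running aws configure produces the same layout in ~/.aws from the answers you type at its prompts.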

Create a cluster on AWS EMR (Elastic MapReduce) in your AWS console.

The nodes of the EMR cluster are listed in the AWS EC2 panel of your AWS console.

Connect to the master node (replace the host name with the public DNS name of your own master node)
 ssh -i ~/.ssh/awskey.pem hadoop@ec2-52-12-35-67.eu-west-1.compute.amazonaws.com
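ssh refuses a private key file that other users can read, which is a common reason for this connection to fail on a freshly downloaded key. A minimal sketch of the fix follows; the helper name secure_key is hypothetical.

```shell
# Hypothetical helper: make a private key readable by its owner only.
secure_key() {
    chmod 400 "$1"
}

# Usage, with the key path and host name from the command above:
#   secure_key ~/.ssh/awskey.pem
#   ssh -i ~/.ssh/awskey.pem hadoop@ec2-52-12-35-67.eu-west-1.compute.amazonaws.com
```

Afterwards, ls -l on the key should show the permissions -r--------.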