Apache Flink® is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

Getting started

Installation

cd ~
wget https://archive.apache.org/dist/flink/flink-1.1.2/flink-1.1.2-bin-hadoop27-scala_2.11.tgz
tar xf flink-1.1.2-bin-hadoop27-scala_2.11.tgz
FLINK_HOME=~/flink-1.1.2
cd $FLINK_HOME
ls bin
ls examples
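The archive name encodes three versions: Flink (1.1.2), Hadoop (27, i.e. 2.7) and Scala (2.11). A minimal sketch of how the file name and a download URL could be assembled for another combination. The function names are hypothetical, and the archive.apache.org path is an assumption about where the release is hosted.

```shell
# Hypothetical helpers (not part of Flink): derive the archive name and a
# download URL from the Flink, Hadoop and Scala versions.
flink_archive() {
  # $1 = Flink version, $2 = Hadoop version without the dot, $3 = Scala version
  printf 'flink-%s-bin-hadoop%s-scala_%s.tgz\n' "$1" "$2" "$3"
}

flink_url() {
  # Assumption: the release is available under archive.apache.org/dist/flink/
  printf 'https://archive.apache.org/dist/flink/flink-%s/%s\n' \
    "$1" "$(flink_archive "$1" "$2" "$3")"
}

# Print the name and URL for the version used on this page.
flink_archive 1.1.2 27 2.11
flink_url 1.1.2 27 2.11
```

The URL can then be passed to wget, for example wget "$(flink_url 1.1.2 27 2.11)".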

Local Execution

Terminal 1: start Flink

cd $FLINK_HOME
bin/start-local.sh

Open the web UI at http://localhost:8081/#/overview

Flink UI
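The UI may take a few seconds to answer after bin/start-local.sh returns. A small sketch of a retry loop that can be used to wait for it. wait_until is a hypothetical helper, not a Flink command, and probing with curl is only one possible choice.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  wu_attempts=$1
  wu_delay=$2
  shift 2
  wu_i=0
  while [ "$wu_i" -lt "$wu_attempts" ]; do
    # Stop as soon as the probe command succeeds.
    if "$@"; then
      return 0
    fi
    wu_i=$((wu_i + 1))
    sleep "$wu_delay"
  done
  # The probe never succeeded.
  return 1
}

# Example (not run here): try for up to 10 seconds to reach the web UI.
# wait_until 10 1 curl -sf -o /dev/null http://localhost:8081/
```

The probe is passed as a command so that the same loop can be reused for other checks.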

Run the SocketWindowWordCount example (source).

Terminal 2: Start netcat

nc -l 9000

Terminal 3: Submit the Flink program:

cd $FLINK_HOME
bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000

Terminal 2: type some words into netcat

lorem ipsum
ipsum ipsum ipsum
bye

Terminal 4: follow the job output

cd $FLINK_HOME
tail -f log/flink-*-jobmanager-*.out
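To sanity-check the counts, the totals for the three lines typed into netcat can be computed offline. This is only a plain shell sketch of a word count: it ignores Flink's time windows and does not reproduce the job's output format. count_words is a hypothetical helper.

```shell
# Hypothetical helper: count whitespace-separated words read from stdin
# and print "word count" pairs sorted by word.
count_words() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w, count[w] }' | LC_ALL=C sort
}

# The same three lines that were typed into netcat.
printf 'lorem ipsum\nipsum ipsum ipsum\nbye\n' | count_words
# prints:
# bye 1
# ipsum 4
# lorem 1
```

If the lines are typed slowly they may fall into different windows, in which case the job reports partial counts rather than these totals.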

Terminal 1: stop Flink

cd $FLINK_HOME
bin/stop-local.sh

Shell

cd $FLINK_HOME
bin/start-scala-shell.sh local

TBC

Cluster execution

See the cluster setup section of the Flink 1.1 quickstart: https://ci.apache.org/projects/flink/flink-docs-release-1.1/quickstart/setup_quickstart.html#cluster-setup

Amazon AWS EMR

Install the AWS CLI

sudo apt-get install awscli
aws help

Configure the CLI with your AWS credentials (link)

aws configure

NB: the credentials file is ~/.aws/credentials and the config file is ~/.aws/config
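For reference, a sketch of what these two files typically contain. It writes sample files into a scratch directory instead of ~/.aws. The key values are the placeholder examples used in the AWS documentation, the region is assumed from the host name used on this page, and write_aws_demo_files is a hypothetical helper.

```shell
# Hypothetical helper: write sample AWS CLI files into the directory given as $1.
write_aws_demo_files() {
  # Credentials file: one profile with an access key pair (placeholder values).
  cat > "$1/credentials" <<'EOF'
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
EOF
  # Config file: default region and output format for the same profile.
  cat > "$1/config" <<'EOF'
[default]
region = eu-west-1
output = json
EOF
  # Keep the credentials readable by the owner only.
  chmod 600 "$1/credentials"
}

demo_dir=$(mktemp -d)
write_aws_demo_files "$demo_dir"
cat "$demo_dir/credentials" "$demo_dir/config"
```

The CLI can be pointed at such files with the AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE environment variables, which is handy for trying a configuration without touching ~/.aws.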

Create a cluster on AWS EMR (Elastic MapReduce) in your AWS console (link).

EMR Dashboard

The nodes of the EMR cluster are listed in the AWS EC2 panel of your AWS console.

Connect to the master node

ssh -i ~/.ssh/awskey.pem hadoop@ec2-52-12-35-67.eu-west-1.compute.amazonaws.com
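OpenSSH refuses a private key that is accessible to group or others. A small sketch of a check that can be run before connecting. key_is_private is a hypothetical helper, and it reads the permission string from ls rather than calling a platform-specific stat.

```shell
# Hypothetical helper: succeed only if the file has no group/other permissions.
key_is_private() {
  # Characters 5-10 of the "ls -l" mode string are the group and other bits.
  kp_perms=$(ls -ld "$1" | cut -c5-10)
  [ "$kp_perms" = "------" ]
}

# Example (not run here): tighten the key permissions if needed, then connect.
# key_is_private ~/.ssh/awskey.pem || chmod 400 ~/.ssh/awskey.pem
```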
