Apache Flink


https://flink.apache.org/

Apache Flink® is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

Getting started

Installation

cd ~
wget https://archive.apache.org/dist/flink/flink-1.1.2/flink-1.1.2-bin-hadoop27-scala_2.11.tgz
tar xf flink-1.1.2-bin-hadoop27-scala_2.11.tgz
export FLINK_HOME=~/flink-1.1.2
cd $FLINK_HOME
ls bin
ls examples
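
A download made through a mirror-selection page can end up saving an HTML page under the archive's name, in which case tar fails to extract it. The sketch below is one way to check the file before extracting; the helper name is_tarball is hypothetical and not part of Flink.

```shell
# Hypothetical helper: succeed only if the file is a readable gzip'd tar archive.
is_tarball() {
    tar tzf "$1" > /dev/null 2>&1
}

# Usage with the archive name from the installation steps
archive=flink-1.1.2-bin-hadoop27-scala_2.11.tgz
if is_tarball "$archive"; then
    echo "archive looks valid"
else
    echo "not a tar archive (possibly an HTML mirror page): $archive"
fi
```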


Local Execution

Terminal 1: start Flink

cd $FLINK_HOME
bin/start-local.sh

Open the UI: http://localhost:8081/#/overview


Run the SocketWindowWordCount example (source).

Terminal 2: Start netcat

nc -l 9000


Terminal 3: Submit the Flink program:

cd $FLINK_HOME
bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000


Terminal 2: type some words into the netcat input

lorem ipsum
ipsum ipsum ipsum
bye
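
As a rough illustration of what a word count over this input looks like, the sketch below totals the words with standard shell tools. It is not how Flink computes the result: it assumes all three lines fall into one window, whereas the streaming job emits counts per time window, so the actual output may be split or repeated. The function name count_words is hypothetical.

```shell
# Count occurrences of each space-separated word read from stdin.
# Prints "word count" pairs, sorted by word.
count_words() {
    tr -s ' ' '\n' | awk 'NF { c[$1]++ } END { for (w in c) print w, c[w] }' | sort
}

printf 'lorem ipsum\nipsum ipsum ipsum\nbye\n' | count_words
# prints:
# bye 1
# ipsum 4
# lorem 1
```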


Terminal 4: follow the job output

cd $FLINK_HOME
tail -f log/flink-*-jobmanager-*.out


Terminal 1: stop Flink

cd $FLINK_HOME
bin/stop-local.sh


Shell

cd $FLINK_HOME
bin/start-scala-shell.sh local


TBC

Cluster execution

https://ci.apache.org/projects/flink/flink-docs-release-1.1/quickstart/setup_quickstart.html#cluster-setup


Amazon AWS EMR

Install AWS CLI

sudo apt-get install awscli
aws help

Configure the CLI with your AWS credentials (link)

aws configure

NB: the credentials file is ~/.aws/credentials and the config file is ~/.aws/config
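
For orientation, the two files have roughly the layout shown below. The sketch writes placeholder copies into a scratch directory rather than touching ~/.aws. The helper name write_sample_aws_files, the key values, and the choice of eu-west-1 and json are assumptions for illustration only.

```shell
# Hypothetical helper: write placeholder copies of the two AWS CLI files into
# the given directory to show their layout. It does not touch ~/.aws.
write_sample_aws_files() {
    mkdir -p "$1"
    cat > "$1/credentials" <<'EOF'
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
EOF
    cat > "$1/config" <<'EOF'
[default]
region = eu-west-1
output = json
EOF
}

# Write the samples to a temporary directory and display them
demo_dir=$(mktemp -d)
write_sample_aws_files "$demo_dir"
cat "$demo_dir/credentials" "$demo_dir/config"
```

Running aws configure list afterwards against the real files shows which values the CLI actually picked up.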



Create a cluster on AWS EMR (Elastic MapReduce) in your AWS console (link).

The nodes of the EMR cluster are listed in the AWS EC2 panel of your AWS console.


Connect to the master node

ssh -i ~/.ssh/awskey.pem hadoop@ec2-52-12-35-67.eu-west-1.compute.amazonaws.com
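
OpenSSH refuses to use a private key file that other users can read, so the connection fails if the downloaded .pem file still has default permissions. A minimal sketch of tightening them first, assuming the key path used above; the helper name lock_down_key is hypothetical.

```shell
# Hypothetical helper: make a private key readable by its owner only.
lock_down_key() {
    chmod 400 "$1"
}

# Key path taken from the ssh command above; skipped if the file is absent
key=~/.ssh/awskey.pem
if [ -f "$key" ]; then
    lock_down_key "$key"
    ls -l "$key"
fi
```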