Revision as of 14:50, 3 October 2015

http://spark.apache.org/

Apache Spark™ is a fast and general engine for large-scale data processing. Write applications quickly in Java, Scala, Python, or R. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, and R shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Links

http://spark-packages.org/

Books

Advanced Analytics with Spark: Patterns for Learning from Data at Scale, http://shop.oreilly.com/product/0636920035091.do

Code for the book: https://github.com/sryza/aas

Installation (on a Mac)

wget http://apache.crihan.fr/dist/spark/spark-1.5.1/spark-1.5.1.tgz
tar xvf spark-1.5.1.tgz
cd spark-1.5.1
more README.md
build/mvn -DskipTests clean package

Interactive programming in Scala

./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()

Interactive programming in Python

./bin/pyspark
>>> sc.parallelize(range(1000)).count()
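
To make the parallelize/count one-liner above less opaque, here is a minimal sketch in plain Python (no Spark needed) of the idea behind it: the data is cut into partitions, each partition is counted on its own, and the partial counts are summed. This is only a conceptual model, not Spark's implementation; the helper names split_into_partitions and distributed_count and the default of 4 partitions are assumptions made for illustration.

```python
# Conceptual model only: plain Python, no Spark required.
# The function names and the partition count are hypothetical.


def split_into_partitions(data, num_partitions):
    """Slice a sequence into num_partitions contiguous chunks of near-equal size."""
    items = list(data)
    size, extra = divmod(len(items), num_partitions)
    partitions = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions receive one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(items[start:end])
        start = end
    return partitions


def distributed_count(data, num_partitions=4):
    """Count elements by counting each partition, then summing the partial counts."""
    partitions = split_into_partitions(data, num_partitions)
    partial_counts = [len(p) for p in partitions]  # one count per partition
    return sum(partial_counts)                     # combined in a single place


total = distributed_count(range(1000))
print(total)
```

In Spark the per-partition step runs on the workers and only the small partial counts travel back to the driver, which is why count() stays cheap even when the dataset is large.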


./bin/run-example SparkPi

MASTER=spark://host:7077 ./bin/run-example SparkPi
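
The SparkPi example launched above estimates π by Monte Carlo sampling: draw random points in the square [-1, 1] × [-1, 1] and take four times the fraction that falls inside the unit circle. The sketch below reproduces that idea in plain Python on a single machine; it is not the code shipped with Spark, and the function names, the seed, and the sample count are assumptions chosen for illustration.

```python
# Single-machine sketch of the Monte Carlo estimate of pi.
# Not the Spark example itself; names and parameters are hypothetical.
import random


def inside_unit_circle(rng):
    """Draw one point in [-1, 1] x [-1, 1]; return 1 if it lies in the unit circle."""
    x = rng.uniform(-1.0, 1.0)
    y = rng.uniform(-1.0, 1.0)
    return 1 if x * x + y * y <= 1.0 else 0


def estimate_pi(num_samples, seed=0):
    """Estimate pi as 4 * (points inside the circle) / (points drawn)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = sum(inside_unit_circle(rng) for _ in range(num_samples))
    return 4.0 * hits / num_samples


print(estimate_pi(100000))
```

In a Spark job the sampling step would be mapped over a parallelized range and the hits combined with a reduce, so the work spreads across the cluster selected by MASTER.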

./dev/run-tests
