Spark

Revision as of 14:47, 3 October 2015 by Donsez (talk | contribs) (Created page with "http://spark.apache.org/ ''Apache Spark™ is a fast and general engine for large-scale data processing. Write applications quickly in Java, Scala, Python, R. Spark offers...")

http://spark.apache.org/

Apache Spark™ is a fast and general engine for large-scale data processing. Write applications quickly in Java, Scala, Python, R. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python and R shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.


Installation (from a Mac)

wget http://apache.crihan.fr/dist/spark/spark-1.5.1/spark-1.5.1.tgz
tar xvf spark-1.5.1.tgz
cd spark-1.5.1
more README.md
build/mvn -DskipTests clean package

Interactive programming in Scala

./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()

Interactive programming in Python

./bin/pyspark
>>> sc.parallelize(range(1000)).count()

Running the examples and tests



./bin/run-example SparkPi

MASTER=spark://host:7077 ./bin/run-example SparkPi
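
For reference, the SparkPi example run above estimates π by Monte Carlo sampling: it draws random points in the square [-1, 1] × [-1, 1] and counts how many fall inside the unit circle. The sketch below reproduces that calculation in plain Python, without Spark, to show what the job computes. The function name estimate_pi and its parameters are illustrative and are not part of Spark; in Spark the per-sample work is distributed across partitions and the counts are summed with a reduce.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling points uniformly in [-1, 1] x [-1, 1].

    The fraction of points inside the unit circle approximates pi / 4.
    (Hypothetical helper for illustration; not a Spark API.)
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Count the point if it falls inside the unit circle.
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi(100000, seed=42))
```

The estimate converges slowly (the error shrinks roughly as one over the square root of the sample count), which is why the computation benefits from being spread over many workers.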

./dev/run-tests