Spark


Revision as of 14:50, 3 October 2015

http://spark.apache.org/

Apache Spark™ is a fast and general engine for large-scale data processing. Write applications quickly in Java, Scala, Python, R. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python and R shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Links

Books

code https://github.com/sryza/aas


Installation (from a Mac)

wget http://apache.crihan.fr/dist/spark/spark-1.5.1/spark-1.5.1.tgz
tar xvf spark-1.5.1.tgz
cd spark-1.5.1
more README.md
build/mvn -DskipTests clean package

Interactive programming in Scala

./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()

Interactive programming in Python

./bin/pyspark
>>> sc.parallelize(range(1000)).count()



./bin/run-example SparkPi

MASTER=spark://host:7077 ./bin/run-example SparkPi
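
The SparkPi example run above estimates π by Monte Carlo sampling: it throws random points at a square and counts how many land inside the inscribed circle. The sketch below reproduces that idea in plain Python, without Spark, so the logic can be read on its own. The names inside and estimate_pi, the seed parameter and the sample count are illustrative assumptions, not taken from the Spark sources.

```python
import random
from functools import reduce
from operator import add


def inside(_, rng=random):
    """Return 1 if a random point of the square [-1, 1] x [-1, 1]
    falls inside the unit circle, else 0. The first argument is ignored;
    it only lets the function be used with map over a range."""
    x = rng.uniform(-1.0, 1.0)
    y = rng.uniform(-1.0, 1.0)
    return 1 if x * x + y * y <= 1.0 else 0


def estimate_pi(n, seed=None):
    """Estimate pi from n random samples (hypothetical helper)."""
    rng = random.Random(seed)
    # Local map + reduce. In the pyspark shell the analogous pipeline
    # would be: sc.parallelize(range(n)).map(inside).reduce(add)
    hits = reduce(add, (inside(i, rng) for i in range(n)), 0)
    # Area ratio circle/square is pi/4, hence the factor 4.
    return 4.0 * hits / n


if __name__ == "__main__":
    print(estimate_pi(100000, seed=42))
```

The estimate gets closer to π as n grows. Spark's contribution is to split the samples across partitions and workers, which is what the MASTER=spark://host:7077 variant does on a cluster.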

./dev/run-tests