Spark
Apache Spark™ is a fast, general-purpose engine for large-scale data processing. Applications can be written in Java, Scala, Python, or R. Spark offers over 80 high-level operators that make it easy to build parallel applications, and it can be used interactively from the Scala, Python, and R shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. These libraries can be combined in the same application.
Installation (on a Mac)
wget http://apache.crihan.fr/dist/spark/spark-1.5.1/spark-1.5.1.tgz
tar xvf spark-1.5.1.tgz
cd spark-1.5.1
more README.md
build/mvn -DskipTests clean package
Interactive programming in Scala
./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()
Interactive programming in Python
./bin/pyspark
>>> sc.parallelize(range(1000)).count()
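To make the one-liner above less opaque, here is a conceptual sketch in plain Python (no Spark needed) of what parallelize followed by count amounts to: the collection is cut into partitions, each partition is counted on its own, and the partial counts are summed. The helper names split_into_partitions and distributed_count, and the default of 4 partitions, are assumptions made for illustration. This is not Spark's implementation, and everything runs in one local process.

```python
# Conceptual sketch only: plain Python, no Spark required.
# split_into_partitions and distributed_count are hypothetical helper names.

def split_into_partitions(data, num_partitions):
    """Cut data into num_partitions contiguous slices of near-equal size."""
    items = list(data)
    n = len(items)
    return [
        items[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


def distributed_count(data, num_partitions=4):
    """Count each partition separately, then add up the partial counts."""
    partitions = split_into_partitions(data, num_partitions)
    partial_counts = [len(p) for p in partitions]  # one count per partition
    return sum(partial_counts)                     # combined into one total


total = distributed_count(range(1000))
print(total)  # 1000
```

In real Spark the per-partition counts would be computed by tasks on the executors and summed on the driver; here both steps are simulated locally.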
Running the examples and tests
./bin/run-example SparkPi
MASTER=spark://host:7077 ./bin/run-example SparkPi
./dev/run-tests
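The SparkPi example estimates π by Monte Carlo sampling: it draws random points in the square [-1, 1] × [-1, 1] and counts how many fall inside the unit circle. The sketch below reproduces that computation in plain Python, without Spark, to show what the example calculates. The function names, the sample count, and the fixed seed are assumptions chosen for illustration, and the map and reduce steps run sequentially here rather than across a cluster.

```python
# Plain-Python sketch of the Monte Carlo estimate computed by SparkPi.
# Not Spark code: names, sample count, and seed are illustrative assumptions.
import random
from functools import reduce


def inside_unit_circle(rng):
    """Draw one point in [-1, 1] x [-1, 1]; return 1 if it lies in the unit circle."""
    x = rng.random() * 2 - 1
    y = rng.random() * 2 - 1
    return 1 if x * x + y * y < 1 else 0


def estimate_pi(num_samples, seed=0):
    """Map each sample to 0 or 1, reduce by addition, and scale by 4."""
    rng = random.Random(seed)
    hits = reduce(
        lambda a, b: a + b,
        (inside_unit_circle(rng) for _ in range(num_samples)),
        0,
    )
    # circle area / square area = pi / 4, so pi is about 4 * hits / samples
    return 4.0 * hits / num_samples


print("Pi is roughly", estimate_pi(100_000))
```

In the Spark version, the samples are distributed over partitions, so the map step runs in parallel and only the partial sums are combined.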