Spark: Difference between revisions
Revision as of 14:50, 3 October 2015
Apache Spark™ is a fast and general engine for large-scale data processing. Write applications quickly in Java, Scala, Python, or R. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, and R shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
Links
Books
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale, http://shop.oreilly.com/product/0636920035091.do
Code: https://github.com/sryza/aas
Installation (from a Mac)
wget http://apache.crihan.fr/dist/spark/spark-1.5.1/spark-1.5.1.tgz
tar xvf spark-1.5.1.tgz
cd spark-1.5.1
more README.md
build/mvn -DskipTests clean package
Interactive programming in Scala
./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()
Interactive programming in Python
./bin/pyspark
>>> sc.parallelize(range(1000)).count()
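To make the `sc.parallelize(...).count()` one-liner more concrete, here is a rough sketch in plain Python (no Spark needed) of what a distributed count amounts to: the data is split into partitions, each partition is counted on its own, and the partial counts are summed. The helper names `split_into_partitions` and `distributed_count` and the default of 4 partitions are assumptions made for illustration only; Spark's real partitioning and scheduling are considerably more involved.

```python
def split_into_partitions(data, num_partitions):
    """Split a sequence into num_partitions contiguous slices (hypothetical helper)."""
    items = list(data)
    n = len(items)
    # Slice boundaries chosen so partition sizes differ by at most one element.
    bounds = [i * n // num_partitions for i in range(num_partitions + 1)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]


def distributed_count(data, num_partitions=4):
    """Count elements by counting each partition separately, then summing."""
    partitions = split_into_partitions(data, num_partitions)
    # In a real cluster each of these counts would run as a separate task.
    partial_counts = [len(p) for p in partitions]
    return sum(partial_counts)


if __name__ == "__main__":
    print(distributed_count(range(1000)))
```

The result does not depend on the number of partitions, which is why the same expression gives the same answer on a laptop and on a cluster.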
Running the examples and tests
./bin/run-example SparkPi
MASTER=spark://host:7077 ./bin/run-example SparkPi
./dev/run-tests
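The bundled SparkPi example estimates π with a Monte Carlo method: it draws random points in a square and counts how many land inside the inscribed circle. The sketch below reproduces that idea in plain Python so it can be tried without a Spark build. It is not the SparkPi source; the function names, the fixed seed, and the sample counts are assumptions chosen for illustration, and the "partitions" are simply run one after another in a single process.

```python
import random


def count_hits(num_samples, rng):
    """Count random points in the square [-1, 1] x [-1, 1] that fall inside the unit circle."""
    hits = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_partitions=4, samples_per_partition=25000, seed=42):
    """Estimate pi from the fraction of points inside the circle (area pi / area 4)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    # Each entry stands in for the work one Spark task would do on its partition.
    partial_hits = [count_hits(samples_per_partition, rng) for _ in range(num_partitions)]
    total_samples = num_partitions * samples_per_partition
    return 4.0 * sum(partial_hits) / total_samples


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi())
```

Because each partition only returns a hit count, the per-partition work is independent and the final step is a plain sum, which is the property that lets this kind of job spread across a cluster.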