Spark
Apache Spark™ is a fast, general-purpose engine for large-scale data processing. Applications can be written in Java, Scala, Python, or R. Spark offers over 80 high-level operators that make it easy to build parallel applications, and it can be used interactively from the Scala, Python, and R shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. These libraries can be combined in the same application.
Links
Books
- Advanced Analytics with Spark: Patterns for Learning from Data at Scale, http://shop.oreilly.com/product/0636920035091.do
  Code: https://github.com/sryza/aas
Installation (from a Mac)
wget http://apache.crihan.fr/dist/spark/spark-1.5.1/spark-1.5.1.tgz
tar xvf spark-1.5.1.tgz
cd spark-1.5.1
more README.md
build/mvn -DskipTests clean package
Interactive programming in Scala
./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()
Interactive programming in Python
./bin/pyspark
>>> sc.parallelize(range(1000)).count()
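Both shell one-liners above distribute 1000 numbers and count them. As a rough illustration of the idea behind that call, the plain-Python sketch below splits the data into partitions, counts each partition separately, and sums the partial counts. It does not use Spark and is not Spark's implementation; the helper names `split_into_partitions` and `parallel_count`, the default of 4 partitions, and the use of a thread pool are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split data into num_partitions contiguous slices of near-equal size."""
    items = list(data)
    size, extra = divmod(len(items), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(items[start:end])
        start = end
    return partitions


def parallel_count(data, num_partitions=4):
    """Count elements by counting each partition and summing the results."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(len, partitions))
    return sum(partial_counts)


total = parallel_count(range(1000))
print(total)
```

In this sketch the per-partition counts play the role of the work done on each worker, and the final `sum` plays the role of the result returned to the driver shell.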
Running the examples and tests
./bin/run-example SparkPi
MASTER=spark://host:7077 ./bin/run-example SparkPi
./dev/run-tests
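The SparkPi example estimates π by Monte Carlo sampling: it draws random points in a square and counts how many fall inside the inscribed circle. The sketch below shows the same calculation in plain Python on a single machine, without Spark. The function name `estimate_pi`, the sample count, and the fixed seed are choices made for this illustration and are not taken from the Spark sources.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate pi from the fraction of random points inside the unit circle."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same value
    inside = 0
    for _ in range(num_samples):
        # Draw a point uniformly from the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of circle / area of square = pi / 4.
    return 4.0 * inside / num_samples


pi_estimate = estimate_pi(100_000)
print("Pi is roughly", pi_estimate)
```

In a distributed run, the sampling loop would be split across workers and only the per-worker counts of points inside the circle would be combined, which is why this example is a common smoke test for a cluster.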