Spark

From air

Revision as of 14:49, 3 October 2015

http://spark.apache.org/

Apache Spark™ is a fast and general engine for large-scale data processing. Write applications quickly in Java, Scala, Python, R. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python and R shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Links

* http://spark-packages.org/

Books

* Advanced Analytics with Spark: Patterns for Learning from Data at Scale, http://shop.oreilly.com/product/0636920035091.do
Code: https://github.com/sryza/aas


Installation (on a Mac)

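# Download the Spark 1.5.1 source release (macOS ships curl rather than wget; curl -O <URL> also works)
wget http://apache.crihan.fr/dist/spark/spark-1.5.1/spark-1.5.1.tgz
# Unpack the archive and enter the source tree
tar xvf spark-1.5.1.tgz
cd spark-1.5.1
# Read the build instructions
more README.md
# Build Spark with the bundled Maven wrapper, skipping the test suite
build/mvn -DskipTests clean package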

Interactive programming in Scala

./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()
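What the one-liner above does: `parallelize` splits a local collection into partitions (slices), and `count()` is an action that counts the elements of each partition in a separate task and then sums the partial counts on the driver. The plain-Python sketch below models that flow without Spark. The slicing formula is assumed to mirror the even-slicing rule Spark uses for parallelized collections, and the helpers `slice_positions`, `parallelize` and `count` are hypothetical stand-ins, not Spark APIs.

```python
from typing import List, Sequence, Tuple


def slice_positions(length: int, num_slices: int) -> List[Tuple[int, int]]:
    """Split indices 0..length-1 into num_slices contiguous (start, end) ranges.

    Assumption: modelled on Spark's even slicing of parallelized collections,
    so partition sizes differ by at most one element.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be positive")
    return [(i * length // num_slices, (i + 1) * length // num_slices)
            for i in range(num_slices)]


def parallelize(data: Sequence[int], num_slices: int) -> List[Sequence[int]]:
    """Hypothetical stand-in for sc.parallelize: returns the list of partitions."""
    return [data[start:end] for start, end in slice_positions(len(data), num_slices)]


def count(partitions: List[Sequence[int]]) -> int:
    """Stand-in for RDD.count(): one partial count per partition, summed on the driver."""
    partial_counts = [len(part) for part in partitions]  # one "task" per partition
    return sum(partial_counts)


# Scala's `1 to 1000` is the inclusive range 1..1000.
partitions = parallelize(range(1, 1001), num_slices=4)
print([len(p) for p in partitions])  # [250, 250, 250, 250]
print(count(partitions))             # 1000
```

With `num_slices=3` the partitions would hold 333, 333 and 334 elements: the split is as even as integer division allows.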

Interactive programming in Python

./bin/pyspark
>>> sc.parallelize(range(1000)).count()
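The Scala and Python one-liners both return 1000, but they do not parallelize the same numbers: Scala's `1 to 1000` is inclusive (1 through 1000), whereas Python's `range(1000)` runs from 0 to 999. The snippet below checks this in plain Python, no Spark needed; `range(1, 1001)` is the Python spelling of Scala's `1 to 1000`.

```python
# Scala's `1 to 1000` includes both ends: 1, 2, ..., 1000.
scala_1_to_1000 = range(1, 1001)
# Python's `range(1000)` starts at 0 and stops before 1000: 0, 1, ..., 999.
python_range_1000 = range(1000)

# Same number of elements, so count() returns 1000 in both shells...
print(len(scala_1_to_1000), len(python_range_1000))  # 1000 1000
# ...but different contents, which shows up as soon as you aggregate them.
print(sum(scala_1_to_1000), sum(python_range_1000))  # 500500 499500
```

In the PySpark shell, `sc.parallelize(range(1, 1001)).sum()` therefore matches the Scala range, while `sc.parallelize(range(1000)).sum()` does not.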

Running the examples

./bin/run-example SparkPi
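`SparkPi` estimates π by Monte Carlo sampling: it scatters random points over the square [-1, 1] × [-1, 1], counts how many land inside the unit circle, and multiplies that fraction by 4, since the circle covers π/4 of the square's area. In the Spark version the sampling is a `map` over a parallelized range and the hits are combined with `reduce`. The sketch below reproduces that map/reduce structure in plain Python with `functools.reduce`; the function name `estimate_pi`, the seed and the sample count are assumptions for illustration, not the example's actual parameters.

```python
import random
from functools import reduce


def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi in the map/reduce style of SparkPi (plain Python)."""
    rng = random.Random(seed)  # seeded so the result is reproducible

    def hit(_: int) -> int:
        # Draw a point uniformly in [-1, 1] x [-1, 1]; 1 if it lies inside the unit circle.
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        return 1 if x * x + y * y <= 1 else 0

    # "map" each sample index to 0/1, then "reduce" the hits by addition.
    hits = reduce(lambda a, b: a + b, map(hit, range(num_samples)), 0)
    # The circle covers pi/4 of the square, so pi is about 4 * hits / samples.
    return 4.0 * hits / num_samples


print(estimate_pi(200_000))  # close to 3.14; the exact digits depend on the seed
```

The Spark example also accepts the number of slices (partitions) as an optional argument, e.g. `./bin/run-example SparkPi 10`; more slices spread the sampling over more tasks.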

MASTER=spark://host:7077 ./bin/run-example SparkPi
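Prefixing the command with `MASTER=...` sets that environment variable for this one invocation only; `bin/run-example` reads it and submits the example to the given master instead of running it locally. Per the Spark README, the value can be a `spark://` or `mesos://` URL, `yarn-client`/`yarn-cluster`, or `local`/`local[N]`. When the examples are driven from a Python script, the same effect comes from adding `MASTER` to the child process environment. The helpers `run_example_command` and `run_example` below are hypothetical (not part of Spark) and assume the script runs from the Spark source directory.

```python
import os
import subprocess
from typing import Dict, List, Optional, Sequence, Tuple


def run_example_command(example: str,
                        master: Optional[str] = None,
                        args: Sequence[object] = (),
                        spark_home: str = ".") -> Tuple[List[str], Dict[str, str]]:
    """Build the argv and environment for bin/run-example (hypothetical helper)."""
    argv = [os.path.join(spark_home, "bin", "run-example"), example]
    argv += [str(a) for a in args]
    env = dict(os.environ)  # copy, so the caller's own environment is left untouched
    if master is not None:
        env["MASTER"] = master  # same effect as the MASTER=... prefix in the shell
    return argv, env


def run_example(example: str, master: Optional[str] = None,
                args: Sequence[object] = (),
                spark_home: str = ".") -> subprocess.CompletedProcess:
    """Run the example; check=True raises CalledProcessError on a non-zero exit."""
    argv, env = run_example_command(example, master, args, spark_home)
    return subprocess.run(argv, env=env, check=True)


# Equivalent of: MASTER=spark://host:7077 ./bin/run-example SparkPi
# run_example("SparkPi", master="spark://host:7077")
```

Building the command separately from running it keeps the environment handling easy to inspect without launching Spark.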

Running the tests

./dev/run-tests