Transforming data in Spark

Posted on 2 February 2016 by bartek

Spark's core ability is operating on data that is distributed across a cluster as RDDs (Resilient Distributed Datasets). In this post I give a reference of the available transformations you can use, along with some examples.
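Transformations such as `map`, `filter`, `flatMap` and `reduceByKey` are lazy: each one only describes a new RDD, and nothing is computed until an action such as `collect()` runs. In PySpark a word count is typically written as `sc.parallelize(lines).flatMap(...).map(...).reduceByKey(...).collect()`. The sketch below does not use Spark. It is a hypothetical in-memory `ToyRDD` class, assumed purely for illustration, that borrows those method names so the lazy behaviour can be tried with the Python standard library alone.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional

Step = Callable[[Iterator[Any]], Iterator[Any]]


class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not part of Spark).

    Each transformation only records a step; nothing runs until collect().
    """

    def __init__(self, source: Iterable[Any], steps: Optional[List[Step]] = None):
        self._source = list(source)
        self._steps: List[Step] = steps or []

    def _then(self, step: Step) -> "ToyRDD":
        # Return a new ToyRDD with one extra step; self is left unchanged,
        # loosely mirroring the immutability of real RDDs.
        return ToyRDD(self._source, self._steps + [step])

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return self._then(lambda it: (f(x) for x in it))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return self._then(lambda it: (x for x in it if pred(x)))

    def flatMap(self, f: Callable[[Any], Iterable[Any]]) -> "ToyRDD":
        return self._then(lambda it: (y for x in it for y in f(x)))

    def reduceByKey(self, f: Callable[[Any, Any], Any]) -> "ToyRDD":
        def step(it: Iterator[Any]) -> Iterator[Any]:
            acc = {}
            for key, value in it:
                acc[key] = f(acc[key], value) if key in acc else value
            return iter(acc.items())

        return self._then(step)

    def collect(self) -> List[Any]:
        # Action: only here are the recorded steps actually executed.
        it: Iterator[Any] = iter(self._source)
        for step in self._steps:
            it = step(it)
        return list(it)


calls: List[str] = []


def tokenize(line: str) -> List[str]:
    calls.append(line)  # records when tokenize really runs
    return line.split(" ")


lines = ToyRDD(["spark is fast", "spark is lazy"])
counts = (
    lines.flatMap(tokenize)
    .map(lambda w: (w, 1))
    .reduceByKey(lambda a, b: a + b)
)
calls_before_action = list(calls)  # [] : only transformations so far
result = dict(counts.collect())  # {'spark': 2, 'is': 2, 'fast': 1, 'lazy': 1}
long_words = (
    lines.flatMap(lambda s: s.split(" "))
    .filter(lambda w: len(w) > 2)
    .collect()
)  # ['spark', 'fast', 'spark', 'lazy']
```

Unlike real Spark, `ToyRDD` works on one list in one process, so its `reduceByKey` is a plain dictionary fold rather than a shuffle of key/value pairs across partitions.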

bamatosi

Research notes on text/data mining, software development and other stuff that attracts my geeky attention
