April 2015 – Garren's [Big] Data Blog

Split file by keys

Posted by Garren on 2015/04/02

Files sometimes come in (whether via Hadoop or other processes) as big globs of data with inter-related parts. Many times I want to process these globs concurrently, but I quickly run into a dilemma. I could a) write the code to process it serially and be done with it in 1 hour or b) write code…
