A handful of GoDataDriven posts
In the past few weeks I’ve written a few posts on the GoDataDriven blog:
- Convert chararray user IDs to integers with Pig: here I explain how to use Pig’s RANK function;
- Merge Mahout output from different algorithms: in this post I lay out a possible strategy to merge the output of different Mahout item-based collaborative filtering algorithms;
- Last but not least, the performance impact of vectorization post covers how NumPy can efficiently compute operations on arrays.
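
To give a flavour of the vectorization idea, here is a minimal sketch (not the code from the post itself) that computes a sum of squares twice: once with a plain Python loop and once with a single NumPy call. The function names and the sample data are made up for illustration.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Add up the squares one element at a time in the Python interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Let NumPy do the whole computation in compiled code with one call."""
    return float(np.dot(values, values))


# Hypothetical sample data: the numbers 0.0, 1.0, ..., 999.0
data = np.arange(1_000, dtype=np.float64)

loop_result = sum_of_squares_loop(data)
vec_result = sum_of_squares_vectorized(data)

# Both approaches should agree on the answer
print(loop_result, vec_result)
```

Both functions give the same answer; the vectorized one avoids the per-element interpreter overhead, which is where the speed-up on large arrays usually comes from. How big the difference is depends on the array size and the machine, so it is worth timing it yourself (for example with `timeit`).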