Provide examples for optimized data shuffling

It should be more performant to group data by sharding key before batching, so that every batch is shard-local. We should provide some examples of implementing such shuffling on the Spark side.
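
As a starting point for discussion, here is a minimal plain-Python sketch of the grouping logic itself, independent of Spark. The names `shard_of` and `shard_local_batches` are hypothetical, and the CRC32-modulo sharding function is only an assumption for illustration: a real example would have to use whatever function the target storage uses to map a sharding key to a shard.

```python
import zlib
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Tuple


def shard_of(key: Hashable, num_shards: int) -> int:
    """Hypothetical sharding function (assumption): CRC32 of the key's
    string form, modulo the number of shards. Replace it with the function
    the target storage actually uses."""
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def shard_local_batches(
    rows: Iterable[Any],
    key_fn: Callable[[Any], Hashable],
    num_shards: int,
    batch_size: int,
) -> Iterator[Tuple[int, List[Any]]]:
    """Yield (shard_id, batch) pairs in which every row of a batch
    belongs to the same shard."""
    if num_shards <= 0 or batch_size <= 0:
        raise ValueError("num_shards and batch_size must be positive")

    # One buffer per shard; rows are routed to a buffer by their sharding key.
    buffers: Dict[int, List[Any]] = defaultdict(list)
    for row in rows:
        shard = shard_of(key_fn(row), num_shards)
        buffers[shard].append(row)
        # Emit a batch as soon as the buffer for this shard is full.
        if len(buffers[shard]) == batch_size:
            yield shard, buffers[shard]
            buffers[shard] = []

    # Flush the remaining partial batches, one per shard at most.
    for shard in sorted(buffers):
        if buffers[shard]:
            yield shard, buffers[shard]


if __name__ == "__main__":
    sample = [{"id": i, "value": f"v{i}"} for i in range(10)]
    for shard_id, batch in shard_local_batches(sample, lambda r: r["id"], 3, 4):
        print(shard_id, [r["id"] for r in batch])
```

On the Spark side, a function of this shape could presumably be applied per partition (for example through `mapPartitions`) after repartitioning the data by a column that holds the computed shard id. Repartitioning by the raw key alone would not necessarily be enough, since Spark's own hash partitioning is unlikely to match the storage's sharding function.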