sbroad-core/src/ir/distribution.rs · ed9808c218dc9270b9673c606c2d0dee1900277d · core / picodata

fix: bucket calculation for duplicated columns · ed9808c2

Denis Smirnov authored 5 months ago


The query `select * from t where sk = 1 and sk = 2` discovered
the bucket for the constant 1, rather than an empty set. The reason
was that the tuple merge transformed `sk = 1 and sk = 2` to
`(sk, sk) = (1, 2)`, while the distribution took into account only
the first position (constant 1).

To compute all keys we now take the cartesian product of all
groups of columns of a tuple, where each group consists of the columns
corresponding to a single column of the sharding key.

Suppose the tuple is (a, b, a) and the sharding key is (a, b). Then
we have two groups (column -> positions in the tuple):
a -> {0, 2}
b -> {1}

And the distribution keys are:
{0, 2} x {1} = {(0, 1), (2, 1)}
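
The grouping and the product above can be sketched roughly as follows.
This is a minimal illustration, not the code from distribution.rs: the
names `group_positions` and `distribution_keys` are hypothetical, and
columns are modelled as plain strings instead of IR nodes.

```rust
use std::collections::HashMap;

/// Group tuple positions by the sharding column they reference.
/// `tuple` holds the column name at each tuple position and
/// `sharding_key` lists the sharding columns in key order.
/// Returns `None` if some sharding column is missing from the tuple.
fn group_positions(tuple: &[&str], sharding_key: &[&str]) -> Option<Vec<Vec<usize>>> {
    let mut by_name: HashMap<&str, Vec<usize>> = HashMap::new();
    for (pos, name) in tuple.iter().enumerate() {
        by_name.entry(*name).or_default().push(pos);
    }
    // One group per sharding column, in sharding key order.
    sharding_key
        .iter()
        .map(|col| by_name.get(col).cloned())
        .collect()
}

/// Cartesian product of the groups: every result picks exactly one
/// tuple position per sharding column.
fn distribution_keys(groups: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let mut keys: Vec<Vec<usize>> = vec![Vec::new()];
    for group in groups {
        let mut next = Vec::with_capacity(keys.len() * group.len());
        for key in &keys {
            for &pos in group {
                let mut extended = key.clone();
                extended.push(pos);
                next.push(extended);
            }
        }
        keys = next;
    }
    keys
}
```

For the tuple (a, b, a) with sharding key (a, b), `group_positions`
yields the groups {0, 2} and {1}, and `distribution_keys` yields
(0, 1) and (2, 1), matching the example above.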

Co-authored-by: Arseniy Volynets <a.volynets@picodata.io>
