fix: bucket calculation for duplicated columns
The query `select * from t where sk = 1 and sk = 2` discovered
the bucket for the constant 1 rather than an empty set. The reason
was that the tuple merge transformed `sk = 1 and sk = 2` into
`(sk, sk) = (1, 2)`, while the distribution took into account only
the first position (constant 1).
To compute all keys we now take the Cartesian product of all
groups of columns of a tuple, where each group consists of the
columns corresponding to a single column of the sharding key.
Suppose the tuple is (a, b, a) and the sharding key is (a, b). Then
we have two groups, each mapping a sharding column to its positions
in the tuple:
a -> {0, 2}
b -> {1}
And the distribution keys are:
{0, 2} x {1} = {(0, 1), (2, 1)}
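The grouping and product steps can be sketched in a few lines of Rust. This is only an illustration of the idea, not the code in `sbroad-core/src/ir/distribution.rs`: the function names `group_positions` and `cartesian_product` are hypothetical, and columns are modelled as plain strings rather than IR nodes.

```rust
/// For each sharding-key column, collect the tuple positions that refer to it.
/// Returns `None` if some sharding column is absent from the tuple.
/// (Hypothetical helper; columns are plain names here for simplicity.)
fn group_positions(tuple: &[&str], sharding_key: &[&str]) -> Option<Vec<Vec<usize>>> {
    sharding_key
        .iter()
        .map(|key_col| {
            let positions: Vec<usize> = tuple
                .iter()
                .enumerate()
                .filter(|(_, col)| *col == key_col)
                .map(|(pos, _)| pos)
                .collect();
            if positions.is_empty() {
                None
            } else {
                Some(positions)
            }
        })
        .collect()
}

/// Cartesian product of the groups: every result picks one position per group,
/// in sharding-key order.
fn cartesian_product(groups: &[Vec<usize>]) -> Vec<Vec<usize>> {
    // Start from a single empty key and extend it group by group.
    let mut keys: Vec<Vec<usize>> = vec![Vec::new()];
    for group in groups {
        let mut next = Vec::with_capacity(keys.len() * group.len());
        for prefix in &keys {
            for &pos in group {
                let mut key = prefix.clone();
                key.push(pos);
                next.push(key);
            }
        }
        keys = next;
    }
    keys
}
```

With the tuple `["a", "b", "a"]` and the sharding key `["a", "b"]`, `group_positions` yields the groups `[[0, 2], [1]]`, and `cartesian_product` turns them into the keys `[[0, 1], [2, 1]]`, matching the example above.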
Co-authored-by: Arseniy Volynets <a.volynets@picodata.io>
Changed files:
- sbroad-core/src/executor/bucket/tests.rs: 26 additions, 0 deletions
- sbroad-core/src/frontend/sql/ir/tests/insert.rs: 1 addition, 1 deletion
- sbroad-core/src/ir/distribution.rs: 29 additions, 15 deletions