fix: buckets resolving bugs
Summarize the changes
- fix: buckets resolving bugs
- For query
select * from t where sk = 1 and sk = 2
we failed to resolve empty buckets. The problem was that we computed the distribution of the (sk, sk) tuple incorrectly: it has two keys, one on the first column and one on the second column, but we always computed only a single key.
To compute all keys, we now take the Cartesian product of the column groups of a tuple, where each group consists of the tuple columns that correspond to a single column of the sharding key.
Suppose the tuple is (a, b, a), where a and b refer to sharding columns. Then we have two groups:
a -> {0, 2}
b -> {1}
and the distribution keys are {0, 2} x {1} = {(0, 1), (2, 1)}.
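The grouping and product described above can be sketched roughly as follows. This is an illustration only, not the code from this MR: the function name `distribution_keys` and the use of plain column names in place of sbroad's expression nodes are assumptions made for the example.

```rust
/// Hypothetical helper (not the sbroad API): returns every distribution key
/// of `tuple` as a list of tuple positions, one position per sharding-key column.
fn distribution_keys(tuple: &[&str], sharding_key: &[&str]) -> Vec<Vec<usize>> {
    // One group per sharding-key column: the tuple positions that refer to it.
    let groups: Vec<Vec<usize>> = sharding_key
        .iter()
        .map(|sk_col| {
            tuple
                .iter()
                .enumerate()
                .filter(|(_, col)| *col == sk_col)
                .map(|(pos, _)| pos)
                .collect()
        })
        .collect();

    // Cartesian product of the groups, built one group at a time.
    // Starting from a single empty key, each group extends every prefix.
    let mut keys: Vec<Vec<usize>> = vec![Vec::new()];
    for group in &groups {
        let mut next = Vec::with_capacity(keys.len() * group.len());
        for prefix in &keys {
            for &pos in group {
                let mut key = prefix.clone();
                key.push(pos);
                next.push(key);
            }
        }
        keys = next;
    }
    keys
}
```

In this sketch, the tuple (a, b, a) with sharding key (a, b) yields the keys (0, 1) and (2, 1), and the tuple (sk, sk) with sharding key (sk) yields two single-column keys. If some sharding column is absent from the tuple, its group is empty and the product contains no keys.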
- Another bug was that bucket discovery used Buckets::Any for the Motion(Local) node. This led to DML queries executing on all nodes instead of on the child's buckets. Currently we use Motion(Local) in the following cases:
- Update/Delete: we materialize the reading subtree and then perform the DML operation. It is unclear why this required Buckets::Any.
- UnionAll between sharded and local tables: here we materialize the global subtree on only a single storage to avoid duplicates. So the subtree with the motion should have the same buckets as its child (the child will always have Buckets::Any).
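The change for Motion(Local) can be pictured with the simplified sketch below. The `Buckets` enum here is a stand-in with only two variants, and both function names are hypothetical. The real sbroad types and bucket discovery code may look different.

```rust
use std::collections::HashSet;

// Simplified stand-in for illustration; not the real sbroad definition.
#[derive(Clone, Debug, PartialEq)]
enum Buckets {
    // No constraint on where the subtree is executed.
    Any,
    // The subtree is executed only on the listed buckets.
    Filtered(HashSet<u64>),
}

/// Behaviour described above as the bug: the child's buckets are ignored.
fn local_motion_buckets_old(_child: &Buckets) -> Buckets {
    Buckets::Any
}

/// Behaviour described above as the fix: Motion(Local) inherits the child's buckets.
fn local_motion_buckets_new(child: &Buckets) -> Buckets {
    child.clone()
}
```

With the fixed variant, an Update/Delete whose reading subtree resolves to a filtered set keeps that set, while the UnionAll case, where the child has Buckets::Any, still resolves to Buckets::Any.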
Ensure that
- New code is covered by unit and integration tests.
- Related issues would be automatically closed with gitlab's closing pattern (Closes #issue_number).
- Public modules are documented (check the rendered version with cargo doc --open).
- (if PEST grammar has changed) EBNF grammar reflects these changes (check the result with railroad diagram generator).
close #843
Next steps
- Cherry-pick to: none
- Update sbroad submodule in picodata/picodata.
- (if EBNF grammar has changed) create a follow-up issue in picodata/docs.