Commits · fb1aef8a00fc20e74a64c5299205451e4246af15 · core / picodata

Nov 13, 2024
- BREAKING CHANGE: get rid of tracing · fb1aef8a
  Maksim Kaitmazian authored 4 months ago
  
  fb1aef8a
Nov 05, 2024

feat: mock PARTITION BY syntax · 57cb5b4c

EmirVildanov authored 4 months ago

If such a clause is met, we ignore it and return an error saying that it is not implemented yet

57cb5b4c

Nov 02, 2024
- feat: update test_app Makefile with rules running specific tests · c884d0b4
  EmirVildanov authored 4 months ago
  
  c884d0b4
- feat: apply lints · 4f5d2025
  EmirVildanov authored 4 months ago
  
  4f5d2025
- feat: add tests for distinct asterisk, fix unit tests · ce940442
  EmirVildanov authored 4 months ago
  
  ce940442
- feat: remove duplicate functions for cloning subtrees and checking their equality · 67bf298b
  EmirVildanov authored 4 months ago
  
  67bf298b
- fix: make expressions with references under GroupBy and parent relational... · 88a56ac2
  EmirVildanov authored 4 months ago
  
  fix: make expressions with references under GroupBy and parent relational operators be compared by Reference `targets` and `position` fields instead of Reference aliases
  88a56ac2
Oct 29, 2024

fix: incorrect equivalence classes · b2833f6a

Arseniy Volynets authored 4 months ago

- `propagate_equality` transformation did not
compute equality classes correctly, its 'merge'
function was completely wrong: it tried to add
the intersection of classes to another class
instead of doing a union
- to merge classes correctly we must do it
when we add a new pair of equal expressions:
otherwise later there will be too many classes
that contain common elements, so the 'merge'
function was removed and 'insert' now merges
two classes that contain common elements
- Also this logic is now covered by tests
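
The merge-on-insert idea can be sketched as follows. This is a minimal illustration, assuming equality classes are plain sets of expression names; `insert_pair` is a hypothetical helper and not the actual `insert` from the code base.

```python
def insert_pair(classes, left, right):
    """Add the equality left = right to a list of disjoint classes.

    Hypothetical sketch: every existing class that shares an element
    with the new pair is unioned into one class right away, so the
    classes stay disjoint and no separate merge pass is needed.
    """
    merged = {left, right}
    untouched = []
    for cls in classes:
        if cls & merged:
            merged |= cls  # union of classes, never their intersection
        else:
            untouched.append(cls)
    untouched.append(merged)
    return untouched
```

Inserting `a = b`, then `c = d`, then `b = c` leaves a single class with all four expressions, because the last pair touches both existing classes.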

b2833f6a

Oct 28, 2024

fix: incorrect equality cols for Eq · 29b7bd56

Arseniy Volynets authored 4 months ago

- In case we have equality between columns
of the same table, `eq_cols_from_eq` should
return empty equality cols, but it returned
None, which led to wrong motion. Fixed that.
- It was found after a merge tuples transform
fix, in front_sql_join_single_left_5 test
where now there are two equalities instead of one.
Earlier, there always was one equality because
merge tuples produced a single (..) = (..)
term for an and-chain. And this test didn't work,
because this term contained one equality pair.
Now there are two terms: (..) = (..) and (..)=(..)
and one of them does not contain any equality
pairs.

29b7bd56

fix: merge tuple transformation didn't group cols · 7fc3ff62

Arseniy Volynets authored 4 months ago

- merge tuple transformation that merges several
and-ed equalities into equalities of rows didn't
group columns by child they refer to. This led
to rows where we couldn't find sharding keys,
because they were scattered across the different
rows:

```
sk(t1) = (a, b), sk(t2) = (e, f)
... on (t1.a, t2.f) = (t2.e, t1.b)
```

But now correct rows are generated:

```
... on (t1.a, t1.b) = (t2.e, t2.f)
```
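
The grouping step can be sketched like this. It is only an illustration, assuming each equality is a pair of `(table, column)` tuples; `group_equalities` is a hypothetical name and the ordering rule (sort by table name) is an assumption made for the example.

```python
from collections import defaultdict


def group_equalities(pairs):
    """Group and-ed column equalities by the children they refer to.

    Hypothetical sketch: each pair is ((table, column), (table, column)).
    Pairs are oriented so that one child always stays on the same side,
    then collected into one left row and one right row per child pair.
    """
    groups = defaultdict(lambda: ([], []))
    for left, right in pairs:
        # Orient the pair so the same child is always on the left.
        if left[0] > right[0]:
            left, right = right, left
        lhs, rhs = groups[(left[0], right[0])]
        lhs.append(left)
        rhs.append(right)
    return dict(groups)
```

For the equalities `t1.a = t2.e and t2.f = t1.b` this yields the rows `(t1.a, t1.b)` and `(t2.e, t2.f)`, so each row holds the columns of one child only.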

7fc3ff62

Oct 23, 2024
- feat: support IS [NOT] expression · f84401ec
  EmirVildanov authored 4 months ago
  
  f84401ec
Oct 18, 2024
- query.ebnf: remove duplicate line · 2d86c413
  Alexander Tolstoy authored 4 months ago
  
  2d86c413
Oct 17, 2024
- feat: add timeout to wait_masters_connect() call · 7c362b44
  Andrey Strochuk authored 4 months ago
  
  7c362b44
Oct 15, 2024

fix: modify WHITESPACE requirements in grammar to support failing queries · 73e6815a
EmirVildanov authored 5 months ago

73e6815a

feat: show buckets estimation in explain · 56933ca1

Arseniy Volynets authored 5 months ago and Denis Smirnov committed 5 months ago

- Add a new line to explain that reports on which buckets the query
  will be executed.
- For queries consisting of a single subtree we can say exactly on
  which buckets it will be executed, for queries with more subtrees
  (with motions), we provide an upper bound of total buckets used
  in the query. The upper bound is computed by merging buckets from
  the leaf subtrees.
- For a DML query with non-local motion we can't provide an
  upper bound, so we print 'buckets = unknown'

Examples:
```
explain select a from t
->
projection ("t"."a"::integer -> "a")
    scan "t"
execution options:
    vdbe_max_steps = 45000
    vtable_max_rows = 5000
buckets = [1-3000]

explain select t.a from t join t as t2
on t.a = t2.b
->
projection ("t"."a"::integer -> "a")
  ...
execution options:
    vdbe_max_steps = 45000
    vtable_max_rows = 5000
buckets <= [1-3000]

explain select id from _pico_table
->
projection ("_pico_table"."id"::unsigned -> "id")
    scan "_pico_table"
execution options:
    vdbe_max_steps = 45000
    vtable_max_rows = 5000
buckets = any

explain insert into t values (1, 2)
->
insert "t" on conflict: fail
motion [policy: segment([ref("COLUMN_1")])]
    values
        value row (...)
execution options:
    vdbe_max_steps = 45000
    vtable_max_rows = 5000
buckets = unknown
```
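
The estimation rule for select queries can be sketched as follows. This is a minimal illustration, assuming buckets are plain sets of integers and that "merging" means set union; `estimate_buckets` is a hypothetical name, and the `any` and `unknown` cases are left out.

```python
def estimate_buckets(leaf_buckets):
    """Return (relation, buckets) for the explain line.

    Hypothetical sketch: leaf_buckets holds one bucket set per leaf
    subtree. A single subtree gives an exact answer ("="), several
    subtrees give an upper bound ("<=") built as the union of the sets.
    """
    if len(leaf_buckets) == 1:
        return "=", set(leaf_buckets[0])
    merged = set()
    for buckets in leaf_buckets:
        merged |= set(buckets)
    return "<=", merged
```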

56933ca1

Oct 14, 2024
- feat: support mandatory WHITESPACEs in the grammar · 7166193f
  EmirVildanov authored 5 months ago
  
  7166193f
Oct 07, 2024

fix: bucket calculation for duplicated columns · ed9808c2

Denis Smirnov authored 5 months ago


The query `select * from t where sk = 1 and sk = 2` discovered
the bucket for the constant 1, rather than an empty set. The reason
was that the tuple merge transformed `sk = 1 and sk = 2` to
`(sk, sk) = (1, 2)`, while the distribution took into account only
the first position (constant 1).

To compute all keys we now take a cartesian product between all
groups of columns of a tuple, where each group consists of columns
corresponding to a single column of the sharding key.

Suppose the tuple is (a, b, a) and (a, b) are the sharding columns; then
we have two groups:
a -> {0, 2}
b -> {1}

And the distribution keys are:
{0, 2} x {1} = {(0, 1), (2, 1)}
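
The same computation as a short sketch, assuming the tuple and the sharding key are given as sequences of column names; `distribution_keys` is a hypothetical helper written for this example.

```python
from itertools import product


def distribution_keys(row, sharding_key):
    """List the position tuples that form a full sharding key.

    Hypothetical sketch: for every sharding key column collect all
    positions of that column in the row, then take the cartesian
    product of those groups.
    """
    groups = []
    for key_col in sharding_key:
        positions = [i for i, col in enumerate(row) if col == key_col]
        groups.append(positions)
    return list(product(*groups))
```

`distribution_keys(("a", "b", "a"), ("a", "b"))` returns `[(0, 1), (2, 1)]`, matching the groups above.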

Co-authored-by: Arseniy Volynets <a.volynets@picodata.io>

ed9808c2

fix: bucket discovery for local motions · 8507c789

Denis Smirnov authored 5 months ago


Previously, during bucket discovery, we used Buckets::Any for
Motion(Local) nodes. This caused DML queries to be executed on all
nodes instead of targeting specific bucket children. We now apply
Motion(Local) only in the following cases:

- update/delete. When materializing the reading subtree for DML
  operations, Buckets::Any was used, but the reason for this is
  unclear.
- union all between sharded and local tables. To prevent duplicates,
  we materialize the global subtree only on a single storage node.
  Consequently, the subtree with Motion(Local) must have the same
  buckets as its child (the child node will always have Buckets::Any).

Co-authored-by: Arseniy Volynets <a.volynets@picodata.io>

8507c789

Oct 04, 2024

fix: wrong slices calculation · d5aa241e

Arseniy Volynets authored 5 months ago

- Previously we computed slices based on their
level in bfs tree traversal. This was wrong, as
motions that were independent could be in
different slices
- Fix that: the slice of a motion is now the
max number of other motion nodes on a path
from this motion to any leaf node
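
The new rule can be sketched as follows. This is only an illustration on a toy tree; `Node`, `motions_below` and `slice_of` are hypothetical names, not types from the planner.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical plan node: a name, a motion flag and child nodes.
    name: str
    is_motion: bool = False
    children: list = field(default_factory=list)


def motions_below(node):
    """Max number of motion nodes on any path from node down to a leaf,
    not counting node itself."""
    best = 0
    for child in node.children:
        count = motions_below(child) + (1 if child.is_motion else 0)
        best = max(best, count)
    return best


def slice_of(motion):
    """Slice index of a motion: motions with no motions below them get 0."""
    return motions_below(motion)
```

With this rule two independent motions that sit at different depths of the tree but have no motions below them both land in slice 0.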

d5aa241e

Oct 03, 2024

refactor: remove ExecuteOptions · 093f271f

Arseniy Volynets authored 5 months ago

- ExecuteOptions is a hashmap that always
stores only 1 entry. Moreover this led to
a bug, when we expected that it stores ALL
execution options, while it stored only
vdbe_max_steps

093f271f

fix: vtable max rows limit not applied · b6574768

Arseniy Volynets authored 5 months ago

- We didn't apply vtable max rows value
when executing local sql. We tried to look
it up in the execute options hashmap, but it
was never stored there, so the default value
was taken instead.

b6574768

Oct 02, 2024
- feat: introduce WAIT APPLIED (GLOBALLY | LOCALLY ) options for DDL operations · 18f4db0d
  Maksim Kaitmazian authored 6 months ago and Maksim Kaitmazian committed 5 months ago
  
  18f4db0d
- fix: use Buckets::Any for Values · 0a5a4902
  Arseniy Volynets authored 5 months ago
  
  - Earlier we returned empty buckets for Values/ValuesRow nodes and then chose a random bucket for execution if they were a subtree root.
  - Buckets::Any provides the same semantics, so use it instead and simplify the `empty_query_result` function, which now always returns an empty result.
  0a5a4902
- fixup: EBNF for documentation · c4848ec4
  Denis Smirnov authored 5 months ago
  
  c4848ec4
Oct 01, 2024
- feat: introduce IF EXISTS and IF NOT EXISTS options · 79d1d796
  Maksim Kaitmazian authored 6 months ago and Maksim Kaitmazian committed 5 months ago
  
  79d1d796
- feat: indent execution options in explain · 497d6b02
  Arseniy Volynets authored 5 months ago
  
  497d6b02
Sep 30, 2024
- feat: change default auth method to md5 · 4e5d87d8
  Вартан Бабаян authored 6 months ago
  
  4e5d87d8
- fix: gitlab MR pattern · f02f1627
  Denis Smirnov authored 5 months ago
  
  f02f1627
- fix: used to panic on different values rows length · 2139dc67
  Arseniy Volynets authored 5 months ago and Denis Smirnov committed 5 months ago
  
  - When creating Values node in IR we didn't check that all values rows have the same length.
  - This led to panic on earlier pipeline stages: syntax plan build
  - Add a check that all values rows have the same length
  2139dc67
Sep 27, 2024

fix: used to error on concat with parameters · 6687b5c9

Arseniy Volynets authored 5 months ago

- When adding concat to plan, we had a check
that both concat children are expressions.
- This led to an error when the children are
parameters.
- Replace error with debug assert that checks
for expression | parameter children. Debug assert
is used because it is quite unlikely that we will
mess up expression children, no need to check it
in release builds.

6687b5c9

feat: support select without scan · 97d2b814

Arseniy Volynets authored 5 months ago

- Support simple queries that do not select
data from tables. This is similar to `values ..`,
but allows adding aliases for expressions.
- Such selects can also be used in subqueries
- Examples:
```
select 1;
select (select count(*) from t);
select 2 as foo, 3 as bar;

select a from t
where b in (select 100)
```

97d2b814

feat: backend sql wraps exprs with parentheses · c389ed4a
Andrey Strochuk authored 5 months ago

c389ed4a

fix: wrong hash calculation of plan subtree · 277d570e

Arseniy Volynets authored 5 months ago

- We didn't traverse output during subtree
traversal when calculating hash. Some nodes
(Motion, Projection) store non-trivial
information, which allows to distinguish
different plans. We came across this by
collision between two different queries:
```
SELECT w.n FROM t JOIN w ON t.n = w.n
LIMIT 3

SELECT id, count(*) FROM t
GROUP BY id
HAVING id > 2
LIMIT 3
```
These plans happened to have subtrees that match
exactly except for the output of projection, which
we didn't traverse.

- fix this by traversing subtree fully
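
A minimal sketch of hashing a subtree together with its output, assuming plan nodes are plain dictionaries with `kind`, `output` and `children` keys. These names and the choice of BLAKE2b are assumptions made for the example, not the planner's actual representation or hash function.

```python
import hashlib


def subtree_hash(node):
    """Hash a plan subtree, including the output of every node."""
    hasher = hashlib.blake2b(digest_size=16)
    _feed(hasher, node)
    return hasher.hexdigest()


def _feed(hasher, node):
    hasher.update(node["kind"].encode())
    # The output distinguishes otherwise identical subtrees, so it is
    # fed into the hash as well.
    for column in node.get("output", []):
        hasher.update(b"\x00" + column.encode())
    for child in node.get("children", []):
        hasher.update(b"(")
        _feed(hasher, child)
        hasher.update(b")")
```

Two projections over the same scan now hash differently as soon as their output columns differ.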

277d570e

Sep 26, 2024

fix: node mapped twice · 07574e00

Arseniy Volynets authored 5 months ago

- ValuesRow node has `data` and `output`
fields that refer to the same node in the query
`values (1)`. For except between global
and sharded tables we applied a special
transformation, in which we cloned the
global subtree.
- Subtree cloner didn't respect the DAG structure
of our plan: there may be multiple references
to the same node. Instead it expected a tree.
Fix that
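
A DAG-aware clone can be sketched with a map from original nodes to their copies. This is an illustration only; `Node` and `clone_subtree` are hypothetical names, not the cloner from the code base.

```python
class Node:
    # Hypothetical plan node with a name and a list of children.
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def clone_subtree(node, memo=None):
    """Clone a subtree that may reference the same node more than once.

    The memo maps id(original) to its copy, so a shared node is cloned
    once and every reference to it points at the same copy.
    """
    if memo is None:
        memo = {}
    if id(node) in memo:
        return memo[id(node)]
    copy = Node(node.name)
    memo[id(node)] = copy
    copy.children = [clone_subtree(child, memo) for child in node.children]
    return copy
```

Without the memo a node referenced twice would be copied twice, which is the "mapped twice" situation from the commit title.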

07574e00

fix: error on union under insert · 5480dfbf
Arseniy Volynets authored 5 months ago

5480dfbf

Sep 24, 2024
- test: create table with tier syntax · b385c8c3
  Вартан Бабаян authored 5 months ago
  
  b385c8c3
- fix: drop plugin parsing · 7c708e63
  Denis Smirnov authored 5 months ago
  
  7c708e63
- refactor: rename sql_vdbe_max_steps to vdbe_max_steps · 6dcdab76
  Arseniy Volynets authored 6 months ago
  
  6dcdab76
Sep 23, 2024
- feat: support ilike operator · cf127608
  Arseniy Volynets authored 5 months ago
  
  cf127608
- feat: export lower/upper string functions · 93dc653d
  Arseniy Volynets authored 6 months ago
  
  93dc653d