Vshard buckets with "receiving" status can cause data inconsistency in results

Run a two-instance picodata cluster. On the first instance:

pico.sql([[create table t (a int, b int, primary key(a)) distributed by(b)]])
pico.sql([[insert into t values (1, 1)]])
pico.sql([[select "bucket_id", * from t]])
---
- metadata:
  - {'name': 'bucket_id', 'type': 'unsigned'}
  - {'name': 'A', 'type': 'integer'}
  - {'name': 'B', 'type': 'integer'}
  rows:
  - [1933, 1, 1]

box.space.T.id
---
- 1026
...
box.space._bucket:fselect(1933)
---
- - +-----+--------+-----------+
  - | id  | status |destination|
  - +-----+--------+-----------+
  - |1933 |"active"|           |
  - +-----+--------+-----------+
...

Emulate bucket rebalancing by putting the bucket into the receiving stage on the other replicaset:

vshard.storage.bucket_recv(1933, "e0df68c5-e7f9-395f-86b3-30ad9e1b7b07", {{1026, {{1, 1933, 1}}}})
box.space._bucket:fselect(1933)
---
- - +-----+-----------+--------------------------------------+
  - | id  |  status   |             destination              |
  - +-----+-----------+--------------------------------------+
  - |1933 |"receiving"|"e0df68c5-e7f9-395f-86b3-30ad9e1b7b07"|
  - +-----+-----------+--------------------------------------+
...
box.space.T:fselect()
---
- - +-----+---------+-----+
  - |  A  |bucket_id|  B  |
  - +-----+---------+-----+
  - |  1  |  1933   |  1  |
  - +-----+---------+-----+
...

Now the tuple (1, 1) is stored on both replicasets, and pico.sql returns it twice:

pico.sql([[select * from t]])
---
- metadata:
  - {'name': 'A', 'type': 'integer'}
  - {'name': 'B', 'type': 'integer'}
  rows:
  - [1, 1]
  - [1, 1]
...
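
To make the failure mode concrete, here is a minimal standalone sketch in plain Lua (no Tarantool, vshard, or picodata required) that models the two replicasets from the reproduction as in-memory tables. The names `replicasets`, `READABLE`, `naive_select`, and `select_readable` are hypothetical and are not part of any real API. The set of statuses treated as readable is an assumption made for illustration, not a statement of how picodata or vshard decides this.

```lua
-- Hypothetical in-memory model of the two replicasets from the reproduction.
local replicasets = {
    { -- original owner: bucket 1933 is active
        buckets = { [1933] = "active" },
        rows = { { bucket_id = 1933, a = 1, b = 1 } },
    },
    { -- after the emulated bucket_recv: bucket 1933 is receiving
        buckets = { [1933] = "receiving" },
        rows = { { bucket_id = 1933, a = 1, b = 1 } },
    },
}

-- Assumption for this sketch: only these statuses should serve reads.
local READABLE = { active = true, pinned = true, sending = true }

-- Collects rows from every replicaset without looking at bucket status.
local function naive_select(sets)
    local result = {}
    for _, rs in ipairs(sets) do
        for _, row in ipairs(rs.rows) do
            result[#result + 1] = { row.a, row.b }
        end
    end
    return result
end

-- Collects only rows whose bucket is readable on the replicaset holding them.
local function select_readable(sets)
    local result = {}
    for _, rs in ipairs(sets) do
        for _, row in ipairs(rs.rows) do
            if READABLE[rs.buckets[row.bucket_id]] then
                result[#result + 1] = { row.a, row.b }
            end
        end
    end
    return result
end

print(#naive_select(replicasets))    -- 2
print(#select_readable(replicasets)) -- 1
```

In this model the unfiltered scan yields the tuple twice, matching the output above, while a scan that skips rows in a "receiving" bucket yields it once.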