- Nov 27, 2024
-
-
Georgy Moshkin authored
-
- Oct 18, 2022
-
-
George Vaintrub authored
This patch solves the problem of conflicting arguments in the error module. The arguments of the 'ROUTER_ALREADY_EXISTS' and 'ROUTER_CFG_IS_IN_PROGRESS' errors have been renamed to 'router_name', because the 'name' field in the error object is already in use. NO_DOC=undocumented behavior
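For illustration, a minimal sketch of how the renamed argument might surface on an unpacked error object; the field layout below is an assumption built from the commit message, not the module's literal output:
```Lua
-- Hypothetical shape of the unpacked error after the rename:
-- 'name' keeps carrying the error code's name, while the router's
-- own name travels in the dedicated 'router_name' field.
local err = {
    name = 'ROUTER_ALREADY_EXISTS', -- the error's own name, as before
    router_name = 'router_1',       -- the conflicting argument, renamed
}
```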
-
- Aug 24, 2022
-
-
Nikita Zheleztsov authored
This patch introduces running luacheck and checkpatch on CI. It uses the latest version of luacheck available on luarocks. We have to ignore a warning in the 'version_test' file, since this luacheck version would force us to rewrite all comparisons in positive form (without using the 'not' boolean expression). Moreover, the recent changes added mutation of the ivshard global variable, which should also be ignored. Checkpatch should ignore NO_CHANGELOG and run only on PRs in order not to make CI red when pushing directly to the master branch. Closes #369 NO_DOC=ci NO_TEST=ci
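As a rough idea of the configuration involved, here is a hedged .luacheckrc sketch; the file path, globals list and warning code are assumptions, not the repository's actual config:
```Lua
-- A minimal .luacheckrc sketch (paths and values are illustrative assumptions).
std = 'luajit'
globals = {'ivshard'}  -- the tests deliberately mutate this global
files = {
    ['test/instances/version_test.lua'] = {
        -- 581 is luacheck's "negation of a relational operator" warning,
        -- which would otherwise force rewriting comparisons in positive form.
        ignore = {'581'},
    },
}
```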
-
Nikita Zheleztsov authored
All integration tests in the main tarantool repository run on the RelWithDebInfo build. However, we use Debug in vshard's CI, which increases the time needed for building and testing and leads to inconsistency, as all other checks use release versions. Moreover, in order to disable a test in vshard's CI we have to disable it for both release and debug versions, but all currently disabled tests are disabled only for release versions. Let's test vshard's patches using a release build of tarantool's master. Part of #369 NO_DOC=ci NO_TEST=ci
-
Nikita Zheleztsov authored
Currently tests run on 2.9. However, this version of tarantool was never released: it is the name of the intermediate versions between 2.8.1 and 2.10.0. Let's drop the 2.9 version from the testing matrix. Part of #369 NO_DOC=ci NO_TEST=ci
-
- Aug 22, 2022
-
-
Nikita Zheleztsov authored
The part of the test router/router inserts a tuple into the space on the master, invokes 'vshard.storage.sync' on the same master and executes 'vshard.router.callro', which routes the request to the replica of the corresponding replicaset. The problem was that sometimes this replica didn't have the needed tuple and returned null, which wasn't the value the test expected. It was caused by the fact that sometimes replication didn't have time to happen and 'vshard.storage.sync' didn't check the vclocks of all replicas in the replicaset. The purpose of 'vshard.storage.sync()' is to wait until the dataset is successfully synchronized on all replicas. The function relies on the downstreams from 'box.info.replication'. These fields are not 'nil' only if the instance knows about the other instances which follow it. In the previous implementation of 'wait_lsn()', which is used in 'sync()', the function returned true even if there were no downstreams at all, i.e. it claimed that the data was fully synchronized without checking the vclocks of all replicas. The fix is to make 'wait_lsn()' return true only if there are downstreams to all replicas and the vclock of the current replica is less than or equal to the vclock of every replica. Closes #366
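A simplified sketch of that idea, based only on the description above (not vshard's actual wait_lsn() code): treat the data as synced only when every other replica has a known downstream whose vclock covers this instance's own LSN.
```Lua
local function all_replicas_caught_up()
    local self_id = box.info.id
    local self_lsn = box.info.vclock[self_id] or 0
    for _, replica in pairs(box.info.replication) do
        if replica.id ~= self_id then
            local down = replica.downstream
            if down == nil or down.vclock == nil then
                -- No downstream info - can't claim the replica is in sync.
                return false
            end
            if (down.vclock[self_id] or 0) < self_lsn then
                return false
            end
        end
    end
    return true
end
```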
-
Nikita Zheleztsov authored
Currently a router's configuration can be executed concurrently from several fibers, which can lead to non-obvious errors. Let's prohibit such behavior and throw an error when `vshard.router.cfg()` or `router:cfg()` is called while another configuration of the corresponding router is still in progress. Closes #140
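A hedged usage sketch of the new behaviour; the error name matches the one mentioned in the Oct 18, 2022 entry above, while the call pattern itself is an assumption:
```Lua
local fiber = require('fiber')
local vshard = require('vshard')

local cfg = ... -- placeholder: your usual vshard router configuration table

-- The first configuration starts in a background fiber and takes a while.
fiber.create(function() vshard.router.cfg(cfg) end)

-- A concurrent attempt now throws instead of interleaving with the
-- configuration that is still in progress.
local ok, err = pcall(vshard.router.cfg, cfg)
assert(not ok) -- err is expected to mention ROUTER_CFG_IS_IN_PROGRESS
```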
-
Nikita Zheleztsov authored
Currently `vshard.storage.cfg()` can be executed concurrently from several fibers, which can lead to non-obvious errors. Let's prohibit such behavior and throw an error when `vshard.storage.cfg()` is called while another one is still in progress. Part of #140
-
Nikita Zheleztsov authored
Currently `router-luatest/router_test.lua` doesn't allow creating servers named `router_1`, as the previous one wasn't deleted correctly. The server should not only be dropped but also removed from the cluster's table. The patch introduces a function for deleting an instance properly and fixes `router-luatest/router_test.lua`.
-
- Aug 19, 2022
-
- Aug 16, 2022
-
-
Nikita Zheleztsov authored
Tarantool's CI fails constantly on router/router.test.lua. Let's disable this test until the fix is committed.
-
- Aug 15, 2022
-
-
Nikita Zheleztsov authored
Currently, if we kill the worker fiber of a connection that was initialized with the 'reconnect_after' option, this connection goes into the 'error_reconnect' or 'error' state (depending on the tarantool version). Reconnecting doesn't happen in either case, and the only way for the user to return the router to working order is a reload or manual restoration of the connections. This patch introduces reconnecting in that case. It should be used wisely, though. A fiber is not killed instantly, and if the user doesn't wait until the fiber's status is 'dead' and makes a request immediately, an exception will probably be thrown, as the fiber can die in the middle of the request. Closes #341
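The caveat in the last sentences can be illustrated with a small sketch, assuming `f` refers to the connection's worker fiber the user decided to kill:
```Lua
local fiber = require('fiber')

f:cancel()
-- Cancellation is not instant: let the fiber actually die before issuing
-- new requests, otherwise it may be killed in the middle of a request
-- and the request will raise.
while f:status() ~= 'dead' do
    fiber.sleep(0.01)
end
-- From here the connection is free to reconnect and serve requests again.
```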
-
Nikita Zheleztsov authored
The problem is that the recovery fiber wakes up earlier than we want it to. This leads to test output which we don't expect. Let's block the recovery fiber before making any changes to `_bucket`. It'll start again as soon as the instance is restarted. Needed for #341
-
Nikita Zheleztsov authored
Currently `wait_timeout` is set to 120 seconds, which is also the default value for luatest. The problem is that any timeout loop, if it hangs, runs until the process is forcefully killed by luatest. In that case the result file is not created and the programmer can't even see what happened. Let's set `wait_timeout` to 50 seconds, which seems to be quite enough for any test to complete.
-
- Aug 09, 2022
-
-
Vladislav Shpilevoy authored
GC goes to replicas to check if a SENT bucket can be deleted. But it can happen that this call via netbox is faster than the replication, so some replicas can still have the bucket in the SENDING state. That triggered a 5-second GC backoff. It was not a problem for GC itself, but it was for map calls. Router's map_callrw() had to wait until GC woke up after the backoff and finally deleted the bucket. All that time the pending map_callrw() was not only waiting, but also disrupting rebalancing, because the "bucket-move vs storage-ref" scheduler tries to give time to moves and refs fairly. The patch makes bucket GC synchronize with replicas before checking their buckets. It is done per batch. Doing it just once at the beginning of _bucket space iteration wouldn't be enough, because new SENT buckets not covered by that sync would keep appearing (if the rebalancing is still ongoing). Part of #173
-
Vladislav Shpilevoy authored
Bucket GC makes a map request to all nodes in the replicaset to check which buckets are eligible for turning into GARBAGE. Going to the local instance itself via netbox of course makes no sense: the buckets are checked locally both before and after the map request anyway. The patch optimizes GC so that it won't visit itself via netbox during the map call. Part of #173
-
Vladislav Shpilevoy authored
There is a bug about the master instance's bucket GC fiber not respecting RO refs on replicas. The idea of the fix is to make the bucket GC procedure consult each replica before marking a bucket as GARBAGE. That allows not to affect the requests coming to replicas at all: they can keep looking just at the bucket status and not consult the master instance before doing an RO ref. All the hard work is done in the background and only after some buckets were actually sent. GC of a sent bucket now works by the plan:
- ACTIVE bucket becomes SENT;
- SENT status is delivered to replicas via plain replication;
- The replicas stop accepting new RO requests to the SENT bucket;
- Master periodically asks replicas if they still have RO refs on the SENT bucket;
- Eventually none of the replicas have RO refs on the bucket;
- Master can safely mark the bucket as GARBAGE and delete it, being sure that no existing nor new requests could access it now.
However this is not all. While it improves the protection, data inconsistency is still reachable, via replication and/or configuration problems. One scenario:
- Node1 is a master, node2 is a replica, bucket1 is ACTIVE on both;
- Node1 sends bucket1 to another replicaset;
- The bucket becomes SENT on both nodes;
- Node1 and node2 lose replication, but netbox still works;
- Node1 and node2 drop each other from the config and from replication;
- Each of them becomes master and deletes the SENT bucket;
- Node2 receives the bucket from another replicaset again. Now it is ACTIVE here and not present on node1 at all. On node2 it is serving an RW or an RO request right now;
- Node1 and node2 again get their old config and restore the replication;
- Node2 receives removal of the bucket from node1.
In this scenario node2 shouldn't apply the bucket removal. Firstly, it should always follow the path ACTIVE -> SENT -> GARBAGE. Secondly, the removal would make the currently running requests access inconsistent data. This should be fixed separately, most likely by adding sanity checks to the _bucket:on_replace trigger to raise an error when there is a threat of data loss or corruption. Part of #173
-
Vladislav Shpilevoy authored
It relied on the bucket being deleted by the time a router call reaches the storage. But if GC is not so fast, the bucket can still be in the SENT or GARBAGE state. The router still retries the call, but in the end returns a slightly different error. The patch makes the test ignore irrelevant error fields.
-
Vladislav Shpilevoy authored
The replica (storage_1_b) sometimes didn't have time to receive the schema upgrade from the master (storage_1_a). The fix is to wait for it explicitly. Closes #338
-
Vladislav Shpilevoy authored
In one place the test assumed that after recovery_wakeup() + yield the recovery is fully done. It is not so even now and gets worse after the following commits. The fix is to wait for the needed state of the _bucket space instead of assuming that it is immediately reachable.
-
Vladislav Shpilevoy authored
There is a bug about master instance's bucket GC fiber not respecting RO refs on replicas. The idea of a fix - make the bucket GC procedure consult each replica before marking a bucket as GARBAGE. Why this way - see motivation in the main commit. Consulting each replica means map-reduce. The present commit introduces replicaset:map_call(). It is like router.map_callrw(), but works on all replicas of a single replicaset instead of masters of all replicasets. It is going to be used in the main commit. Needed for #173
-
Vladislav Shpilevoy authored
Functions like storage_new(), storage_cfg(), storage_for_each() and others operated on all storage nodes in the cluster. The naming was fine while individual storages didn't have any methods except storage_first_bucket(). But that is going to change in the next commits, and then the naming would be confusing: if one used vtest.storage_start(), would it start the entire cluster or a single storage node? The patch renames most of the storage_ functions which always operated on the entire cluster to have the cluster_ prefix. It wasn't done in the beginning because routers might seem like a part of the cluster. But the new assumption of 'cluster' meaning only storages looks less confusing than 'storage' meaning all the data nodes. Needed for #173
-
Vladislav Shpilevoy authored
There is a bug about the master instance's bucket GC fiber not respecting RO refs on replicas. The idea of the fix is to make the bucket GC procedure consult replicas before marking a bucket as GARBAGE. Why this way - see the motivation in the main commit. The present commit introduces a helper - the service call bucket_test_gc(). The function takes bucket IDs and returns which of them are not ok to be GCed. It returns "not ok" bids instead of "ok" ones because most of the tested buckets are supposed to be collectible, so sending "not ok" bids produces less network traffic. It is also easier to process in a map-reduce way: it is easier to mark all bids which are not ok on at least one replica than to count for each bid how many replicas are ok with it being removed. This is clearly visible in the main commit later. Needed for #173
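A hedged sketch of the helper's contract as described above; the body and the names `M.bucket_refs` and `bids_not_ok` are assumptions, not the commit's literal code:
```Lua
-- Takes a list of bucket ids and returns the ones this instance still
-- holds RO refs on, i.e. the ones that are NOT ok to garbage-collect yet.
local function bucket_test_gc(bids)
    local bids_not_ok = {}
    for _, bid in pairs(bids) do
        local ref = M.bucket_refs[bid]
        if ref ~= nil and ref.ro > 0 then
            table.insert(bids_not_ok, bid)
        end
    end
    return {bids_not_ok = bids_not_ok}
end
```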
-
Vladislav Shpilevoy authored
In core Tarantool there is a bug in MVCC which sometimes makes space:count() ~= #space:select(). That works fine now, but an existing test breaks on it after one of the next commits. The bug is present even in the already released Tarantool version 2.10.0, which means the test must bypass it somehow and live with that as long as vshard supports 2.10.0 at all. Needed for #173
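The bypass implied here is simply to avoid space:count() where MVCC may make it disagree with the visible tuples; a minimal sketch (the space name 'test' is assumed):
```Lua
-- Under MVCC, count() can diverge from what select() actually returns,
-- so count the selected tuples instead of trusting count().
local visible = #box.space.test:select()
-- ...instead of: box.space.test:count()
```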
-
- Jul 27, 2022
-
-
Nikita Zheleztsov authored
A part of the router/router2 test changes the discovery mode to different states. During the reconfiguration invoked after each of these changes, the previous discovery fiber is killed and a new one is created. The problem is that we checked the statuses of the fibers without any waiting, and sometimes the fiber had no time to be killed. This happens because a fiber's cancellation is performed asynchronously: it's not instant. Let's wait until the status of the last tested fiber becomes 'dead' - by that point all other fibers will already be dead anyway. Moreover, the last fiber, which is executed only once, can be already dead by this time, so let's check whether it's done or still working. Closes #358
-
- Jul 18, 2022
-
-
Nikita Zheleztsov authored
This patch introduces CI testing on the latest git version of Tarantool, built as `Debug`. By default the master branch is used, but any other branch can be used too: just change everything after `debug-` to another name. Part of #339
-
- Jul 11, 2022
-
-
Vladislav Shpilevoy authored
Rebalancer stress tests could potentially generate an infinite number of tuples. The tests should have limits on all resources that they use. The patch sets the limit to a large value, but at least it is not infinite. Follow up #309
-
Vladislav Shpilevoy authored
The tests stress_add_remove_several_rs and stress_add_remove_rs try to change the cluster topology during rebalancing to check whether the rebalancing eventually ends without errors. Moreover, while doing that they also generate some RW load via a router. The RW load was generated in a fiber which is canceled at the end of a test case; then the test ensures that all written data is available for reading. The problem is that the RW generator fiber had no fiber cancellation checks. It only called `vshard.router.call()` in a loop until it succeeded. The function doesn't raise exceptions, so if the fiber was canceled before the `call()` went through, the call was constantly failing. What is worse, the requests were failing but could still get to the network: they were appended to netbox's internal buffer, but the fiber couldn't wait for a response. Here is how these orphan requests affected the tests. The tests did the RW load at least 2 times. If a fiber was canceled wrongly during the first load, then during the second load the retries of the canceled request messed with the new requests. That could, for example, lead to a duplicate PK error in the `test` space. Then the rebalancer wouldn't be able to finish - duplicate PKs in 2 buckets would prevent their storage on the same instance. Besides, these canceled fibers constantly spammed about that, clogging the logs. Closes #309
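A minimal sketch of a cancel-aware version of such a load loop (a hypothetical `do_write` stored function and a fixed bucket id are assumed; this is not the test's literal code):
```Lua
local fiber = require('fiber')
local vshard = require('vshard')

local load_fiber = fiber.create(function()
    local bucket_id = 1
    while true do
        -- Re-raise a pending cancel instead of retrying forever: without
        -- this check a canceled fiber kept failing calls and leaking
        -- orphan requests into the netbox buffer.
        fiber.testcancel()
        vshard.router.call(bucket_id, 'write', 'do_write', {bucket_id})
        fiber.sleep(0.05)
    end
end)
-- Later in the test, load_fiber:cancel() now actually stops the loop.
```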
-
- Jul 08, 2022
-
-
Nikita Zheleztsov authored
This patch introduces protection of the router's API while its configuration is not done, as accessing these functions at that time is not safe and can cause low-level errors like 'bad arguments' or 'no such function'. Now all non-trivial vshard.router functions are disabled until `vshard.router.cfg` (or `vshard.router.new`) is finished, and an error is raised in case of an attempt to access them. Manual enabling/disabling of the API is also introduced in this patch. Closes #194 Closes #291
@TarantoolBot document
Title: vshard.router.enable/disable()
`vshard.router.disable()` makes most of the `vshard.router` functions throw an error - as a Lua exception, not via the `nil, err` pattern. `vshard.router.enable()` reverts the disable.
`router_object:enable()/disable()`, where `router_object` is the return value of `vshard.router.new()`, can also be used for manual API access configuration of a specific non-static router.
By default the router is enabled. Additionally, the router is forcefully disabled automatically until its configuration is finished and the instance has finished recovery (its `box.info.status` is `'running'`, for example). Auto-disable protects from usage of vshard functions before the router's global state is fully created. Manual disabling of the router helps to achieve the same for the user's application. For instance, a user might want to do some preparatory work after `vshard.router.cfg` before the application is ready. Then the flow would be:
```Lua
vshard.router.disable()
vshard.router.cfg(...)
-- Do your preparatory work here ...
vshard.router.enable()
```
The behavior of the router's API enabling/disabling is similar to the storage's one.
-
- Jul 07, 2022
-
-
Vladislav Shpilevoy authored
Garbage collection is going to be significantly reworked in order to prevent deletion of SENT buckets which still can have RO refs on replicas. It means more GC tests will be needed and the old ones will need to change. The patch prepares the existing tests for that by porting them to luatest. Part of #173
-
Vladislav Shpilevoy authored
At < 2.1.0 index:min(key) worked as
index:select(key, {limit = 1, iterator = 'GE'})
At >= 2.1.0 it started working as
index:select(key, {limit = 1})
The new behaviour is good and correct, but vshard still needs to function on 1.10 too. The patch introduces vshard.util.index_min(), which behaves like index:min() at >= 2.1.0, and an .index_has() helper to check if a key exists in an index, which is usually the purpose of :min(). Follow up tarantool/tarantool#3167 Needed for #173
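A hedged sketch of what such helpers can look like; the bodies are assumptions built from the description above (the real vshard.util versions additionally branch on the Tarantool version):
```Lua
-- Behaves like index:min(key) on >= 2.1.0: the minimal tuple matching the
-- key exactly, regardless of which Tarantool version is running.
local function index_min(index, key)
    return index:select(key, {limit = 1, iterator = 'EQ'})[1]
end

-- The usual reason to call :min(key) is just to check key existence.
local function index_has(index, key)
    return index_min(index, key) ~= nil
end
```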
-
Vladislav Shpilevoy authored
Previously the SENT and GARBAGE statuses were in practice the same: both could be deleted as soon as they had no RO refs (except that it didn't work on replicas). But soon this is going to change. GARBAGE will be assigned to a bucket only if it already has no refs at all in the entire replicaset and won't be able to get new ones. SENT will mean that the bucket still can have at least RO refs and needs validation whether it has them on any replica in its replicaset. The patch makes bucket recovery turn a SENDING bucket into SENT in case it is activated in its destination replicaset. The garbage collector is then trusted to deal with SENT buckets in a special way, to be done in future patches. Part of #173
-
- Jul 06, 2022
-
-
Nikita Zheleztsov authored
Counting known buckets on the router for the first time in this test was not stable, as sometimes the discovery fiber didn't have time to start the discovery process. Let's start the router with discovery disabled, make sure the router doesn't know any buckets, enable the discovery mode and wait until it gets all buckets from the replica. Closes #304
-
- Jun 27, 2022
-
-
Vladislav Shpilevoy authored
Part of #339
-
Vladislav Shpilevoy authored
vshard.storage.call() takes care of bucket referencing to prevent the bucket's move while the call is in progress. It assumed that if a ref succeeded, then an unref would also work. But that is not always true. Now it is quite easy to break, because if a ref is taken on a replica, a master could move the bucket even while it is being accessed on another instance. The patch makes vshard.storage.call() fail if unref fails, signaling that the bucket was probably deleted together with its ref counter. In future patches there are going to be more fail-safe measures like this one, but some of them will first require fixing bucket GC so that it consults replicas about whether they still have refs. Part of #173
-
Vladislav Shpilevoy authored
Box errors can be stacked via the 'prev' member. Such errors didn't raise any exceptions in vshard, but vshard.error.box() wouldn't unpack any errors beyond the first one. The patch makes vshard.error.box() walk the whole error stack. It is done to be consistent with a future patch which will use the 'prev' field for vshard errors too, thus producing stacks of errors as Lua tables. That in turn is needed in order to return more than one error from vshard.storage.call(), which will be able to fail both the user function and the bucket unref. Needed for #173
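A hedged sketch of walking such a stack via the 'prev' member (illustrative only, not vshard.error.box()'s literal code):
```Lua
-- Convert a stacked box error into a plain Lua table, keeping the chain.
local function error_unpack_stack(err)
    local res = err:unpack()
    if err.prev ~= nil then
        res.prev = error_unpack_stack(err.prev)
    end
    return res
end
```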
-
Vladislav Shpilevoy authored
It used to fail with a WRONG_BUCKET error. That isn't a good idea, because if the error ever reaches the router, it will retry the request. It can't happen now, but in a future patch it will become possible. Then it could be that the ref succeeded, the request started execution, then the bucket was deleted, and the unref failed. The last step shouldn't allow the router to retry the request, especially if it was an RW request, which is not idempotent. Part of #173
-
Vladislav Shpilevoy authored
Previously _bucket changes only triggered wakeup of various background fibers. They didn't validate anything nor change anything. While the patch doesn't do much on the field of validation (that is a part of the future patches), it makes _bucket updates do some changes on commit, without the need to do them after an explicit transaction like
_bucket:replace(...)
bucket_ref.ro_lock = true
Now the replace itself tries to do the right things on commit. This is important because otherwise replicas don't get any updates to the bucket_ref objects. For example, if a bucket was ACTIVE and became SENT/GARBAGE, and on the replica it had ro refs > 0, then the replica didn't set `ro_lock` to reject new RO refs, so new read requests could come and would be accepted. Now also, when a bucket is deleted or becomes garbage, its ref descriptor is deleted. That fixes a similar problem when an ACTIVE bucket is just deleted completely. There are also more updates which handle cases impossible during normal functioning, but which could still be done manually or happen as a bug. For example, it shouldn't be possible that a RECEIVING bucket has a ref descriptor at all; yet if this is detected, the ref is deleted. The patch intentionally doesn't install an rw_lock on a SENDING bucket, because it is supposed to be installed *before* the bucket becomes SENDING. Checks like that are the subject of a future patch which will make the _bucket on_replace trigger do sanity checks before the commit happens. The patch is a part of big work on making _bucket updates more robust and making replicas aware of and agree with them. Part of #173
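To make the mechanism above concrete, here is a rough sketch of an on-commit reaction to _bucket changes; the table name `M.bucket_refs` and the named tuple fields are assumptions, and the real trigger does considerably more:
```Lua
box.space._bucket:on_replace(function(old, new)
    -- React only after the transaction actually commits.
    box.on_commit(function()
        local tuple = new or old
        local ref = M.bucket_refs[tuple.id]
        if ref == nil then
            return
        end
        if new == nil or new.status == 'sent' or new.status == 'garbage' then
            -- The bucket is gone or became garbage: stop accepting new RO
            -- refs on this replica without any extra manual step.
            ref.ro_lock = true
        end
    end)
end)
```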
-
Vladislav Shpilevoy authored
VShard luatests used to define wait_timeout in their test.lua files. That led to the timeout being duplicated in all the files and to somewhat clumsy code when the timeout needed to be passed to the instances. The patch makes the timeout stored in a single place - in vtest. It is the default timeout for the existing waiting functions and will be used in the future patches too. Needed for #173
-
Vladislav Shpilevoy authored
In new tests it will be necessary to pause the recovery again. But the old implementation of the injection didn't allow finding out whether the recovery had already got stuck on it. The new one makes it possible to wait until the recovery is paused by checking the injection value. The name is changed to ERRINJ_RECOVERY_PAUSE to reflect the behaviour better. A similar injection will be introduced for GC. Needed for #173
-