From fd6ee6d5b8bd5cbcb81f4a650c37ea5d5fac2a16 Mon Sep 17 00:00:00 2001
From: "Alexander V. Tikhonov" <avtikhon@tarantool.org>
Date: Tue, 23 Mar 2021 09:37:55 +0300
Subject: [PATCH] github-ci: switch off swap use

Github Actions provides hosts for Linux-based runners in the following
configuration:

  2 cores
  7 Gb memory
  4 Gb swap memory

To avoid issues with hanging or slow tests under high memory use, like
[1], the host configuration must avoid using swap memory. All of the
test workflows run inside Docker containers. This patch sets memory
limits in the Docker run configurations based on the current Github
Actions hosts: 7Gb of memory with no additional swap memory.

Checked 10 full runs (29 workflows in each run used the change) and got
a single failed test in the gevent() routine of test-run. This result
is much better than without this patch, when 3-4 workflows fail on each
full run.

The failures without this patch may be caused by swappiness being set
to its default value:

  cat /sys/fs/cgroup/memory/memory.swappiness
  60

Quoting the documentation on swappiness [2]:

  This control is used to define the rough relative IO cost of swapping
  and filesystem paging, as a value between 0 and 200. At 100, the VM
  assumes equal IO cost and will thus apply memory pressure to the page
  cache and swap-backed pages equally; lower values signify more
  expensive swap IO, higher values indicates cheaper.

  Keep in mind that filesystem IO patterns under memory pressure tend to
  be more efficient than swap's random IO. An optimal value will require
  experimentation and will also be workload-dependent.

We could tune how often anonymous pages are swapped using the
swappiness parameter. However, our goal is to stabilize timings (and
make them as predictable as possible), so the best option is to disable
swap entirely and work on decreasing the memory consumption of huge
tests. On the Github Actions hosts with 7Gb RAM, the default swappiness
means that swap may start being used once roughly 2.8Gb of RAM is in
use.
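
The swap state discussed here can be inspected from a shell. The
following is only a minimal sketch, assuming a Linux host with the
usual /proc layout. The helper names 'swap_total_kb', 'swap_disabled'
and 'print_swappiness' are hypothetical and not part of this patch.
The sketch reads 'SwapTotal' from a meminfo-format file, which is the
value the actions/environment step in this patch checks. It also prints
a swappiness value from a given file:

```shell
# Minimal sketch with hypothetical helpers (not part of this patch):
# inspect the swap state of a Linux host.

# Print the SwapTotal value (in kB) from a meminfo-format file,
# /proc/meminfo by default. Prints nothing if the field is missing.
swap_total_kb() {
    meminfo_file="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ { print $2 }' "$meminfo_file"
}

# Succeed (exit status 0) only if SwapTotal is exactly 0, i.e. no
# swap is configured. A missing SwapTotal field counts as failure.
swap_disabled() {
    [ "$(swap_total_kb "$1")" = "0" ]
}

# Print the swappiness value stored in the given file,
# /proc/sys/vm/swappiness by default.
print_swappiness() {
    swappiness_file="${1:-/proc/sys/vm/swappiness}"
    if [ -r "$swappiness_file" ]; then
        echo "swappiness: $(cat "$swappiness_file")"
    else
        echo "swappiness: unknown ($swappiness_file is not readable)"
    fi
}
```

For example, 'swap_disabled || echo "WARNING: swap is still enabled"'
repeats the check done by the CI step. 'print_swappiness
/sys/fs/cgroup/memory/memory.swappiness' prints the per-cgroup value
quoted above; that path is specific to cgroup v1.
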

Some tests, however, use about 2.5Gb of RAM, like
'box/net_msg_max.test.lua', and memory fragmentation after such a test
run can lead to swap use [3]. It was also found that the disk cache can
take a noticeable amount of RAM, which also caused fast memory
consumption and the start of swapping. The cache can be periodically
dropped from memory [4] via the 'vm.drop_caches' kernel setting, but
that won't fix the overall issue with swap use. Besides dropping cached
pages, another kernel option can be tuned [5][6]: 'vfs_cache_pressure'.
This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.
Increasing it significantly beyond the default value of 100 may have a
negative performance impact, because the reclaim code needs to take
various locks to find freeable directory and inode objects. With
'vfs_cache_pressure=1000', it will look for ten times more freeable
objects than there are. This patch doesn't make this change, but it can
be done as a follow-up.

The following changes were made to fix the issue:

- For jobs that run tests, use actions/environment and don't use the
  Github Actions 'container' tag, a 'sudo swapoff -a' command was added
  to the actions/environment action.

- For jobs that run tests and use the Github Actions 'container' tag,
  the previous solution doesn't work. It was decided to hard-code the
  memory limit based on the 7Gb memory size found on Github Actions
  hosts. It was set as additional options of the 'container' tag.
  Setting '--memory-swap' to the same value as '--memory' prevents the
  container from using swap:

    options: '--init --memory=7G --memory-swap=7G'

  These changes are temporary until these 'container' tags are removed
  as part of resolving the tarantool/tarantool-qa#101 issue for the
  workflows:

    debug_coverage
    release
    release_asan_clang11
    release_clang
    release_lto
    release_lto_clang11
    static_build
    static_build_cmake_linux

- For VMware VMs, like the FreeBSD one, a 'sudo swapoff -a' command was
  added before the build commands.

- For OSX on Github Actions hosts, swap is already disabled:

    sysctl vm.swapusage
    vm.swapusage: total = 0.00M used = 0.00M free = 0.00M (encrypted)

  Switching swap off manually is currently not possible anyway, because
  it requires System Integrity Protection (SIP) to be disabled [7], and
  we don't have such access on Github Actions hosts. On local hosts it
  must be done manually with [8]:

    sudo nvram boot-args="vm_compressor=2"

  A swap status check was added to make sure that the host is
  configured correctly:

    sysctl vm.swapusage

Closes tarantool/tarantool-qa#99

[1]: https://github.com/tarantool/tarantool-qa/issues/93
[2]: https://github.com/torvalds/linux/blob/1e43c377a79f9189fea8f2711b399d4e8b4e609b/Documentation/admin-guide/sysctl/vm.rst#swappiness
[3]: https://unix.stackexchange.com/questions/2658/why-use-swap-when-there-is-more-than-enough-free-space-in-ram
[4]: https://kubuntu.ru/node/13082
[5]: https://www.kernel.org/doc/Documentation/sysctl/vm.txt
[6]: http://devhead.ru/read/uskorenie-raboty-linux
[7]: https://osxdaily.com/2010/10/08/mac-virtual-memory-swap/
[8]: https://gist.github.com/dan-palmer/3082266#gistcomment-3667471
---
 .github/actions/environment/action.yml     | 31 +++++++++++++++++++
 .github/workflows/debug_coverage.yml       |  4 ++-
 .github/workflows/release.yml              |  4 ++-
 .github/workflows/release_asan_clang11.yml |  4 ++-
 .github/workflows/release_clang.yml        |  4 ++-
 .github/workflows/release_lto.yml          |  4 ++-
 .github/workflows/release_lto_clang11.yml  |  4 ++-
 .github/workflows/static_build.yml         |  4 ++-
 .../workflows/static_build_cmake_linux.yml |  4 ++-
 .travis.mk                                 |  7 +++++
 10 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/.github/actions/environment/action.yml b/.github/actions/environment/action.yml
index 59d9ac9b89..4c978489a2 100644
--- a/.github/actions/environment/action.yml
+++ b/.github/actions/environment/action.yml
@@ -8,4 +8,35 @@ runs:
         echo TEST_TIMEOUT=310 | tee -a $GITHUB_ENV
         echo NO_OUTPUT_TIMEOUT=320 | tee -a $GITHUB_ENV
         echo PRESERVE_ENVVARS=REPLICATION_SYNC_TIMEOUT,TEST_TIMEOUT,NO_OUTPUT_TIMEOUT | tee -a $GITHUB_ENV
+        # This switching off swap command will not work as github step
+        # run from inside github 'container' tag. Also it will fail to
+        # run from it. So running it only outside of docker container.
+        # Also on local hosts, like that we use for 'freebsd' workflow
+        # testing, 'sudo' not acceptable outside of 'freebsd' virtual
+        # machine and to avoid of hangs let's check sudo run ability
+        # with command like 'timeout 2 sudo ls /sbin/swapoff'.
+        # NOTE: To switch off swap from inside github 'container' tag
+        # additional memory flags should be added to its 'options' tag:
+        # options: '--memory=<some value, like 7G> --memory-swap=<the same value as for memory option>'
+        if which free ; then
+          echo "Check initial swap memory values:"
+          free
+        fi
+        if timeout 2 sudo ls /sbin/swapoff ; then
+          echo "Verified that 'sudo' enabled, switching off swap ..."
+          sudo /sbin/swapoff -a || echo "'swapoff' command failed, but failure is acceptable from inside the container"
+          if [ -e /proc/meminfo ] ; then
+            if [ "$(grep SwapTotal: /proc/meminfo | awk '{ print $2; }')" = "0" ] ; then
+              echo "Swap disabled"
+            else
+              echo "WARNING: swap still exists on the host, limits in container options can resolve it"
+            fi
+          else
+            echo "File '/proc/meminfo' not exists - couldn't check the swap size"
+          fi
+          if which free ; then
+            echo "Check updated swap memory values if 'swapoff' succeded:"
+            free
+          fi
+        fi
       shell: bash
diff --git a/.github/workflows/debug_coverage.yml b/.github/workflows/debug_coverage.yml
index 8584633f44..b42afe58d9 100644
--- a/.github/workflows/debug_coverage.yml
+++ b/.github/workflows/debug_coverage.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # Memory size hard coding will be removed within resolving issue
+      # http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index db168521dd..c20ccf6afe 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # Memory size hard coding will be removed within resolving issue
+      # http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.github/workflows/release_asan_clang11.yml b/.github/workflows/release_asan_clang11.yml
index 59ec3b7642..e540fcdf9f 100644
--- a/.github/workflows/release_asan_clang11.yml
+++ b/.github/workflows/release_asan_clang11.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # Memory size hard coding will be removed within resolving issue
+      # http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v2.3.4
diff --git a/.github/workflows/release_clang.yml b/.github/workflows/release_clang.yml
index 53ef9d7c16..6d948321e8 100644
--- a/.github/workflows/release_clang.yml
+++ b/.github/workflows/release_clang.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # Memory size hard coding will be removed within resolving issue
+      # http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.github/workflows/release_lto.yml b/.github/workflows/release_lto.yml
index a6841f4501..7f7c36687e 100644
--- a/.github/workflows/release_lto.yml
+++ b/.github/workflows/release_lto.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # Memory size hard coding will be removed within resolving issue
+      # http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v2.3.4
diff --git a/.github/workflows/release_lto_clang11.yml b/.github/workflows/release_lto_clang11.yml
index 8b06fe82f6..48cf19c607 100644
--- a/.github/workflows/release_lto_clang11.yml
+++ b/.github/workflows/release_lto_clang11.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # Memory size hard coding will be removed within resolving issue
+      # http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v2.3.4
diff --git a/.github/workflows/static_build.yml b/.github/workflows/static_build.yml
index e096e5e406..a7ab4fc86e 100644
--- a/.github/workflows/static_build.yml
+++ b/.github/workflows/static_build.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # Memory size hard coding will be removed within resolving issue
+      # http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.github/workflows/static_build_cmake_linux.yml b/.github/workflows/static_build_cmake_linux.yml
index 891908f24d..d7becd172b 100644
--- a/.github/workflows/static_build_cmake_linux.yml
+++ b/.github/workflows/static_build_cmake_linux.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # Memory size hard coding will be removed within resolving issue
+      # http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.travis.mk b/.travis.mk
index 3017cb09ec..d5de142073 100644
--- a/.travis.mk
+++ b/.travis.mk
@@ -325,6 +325,9 @@ deps_osx_github_actions:
 	pip3 install --force-reinstall -r test-run/requirements.txt
 
 build_osx:
+	# due swap disabling should be manualy configured need to
+	# control it's status
+	sysctl vm.swapusage
 	cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_WERROR=ON ${CMAKE_EXTRA_PARAMS}
 	make -j
 
@@ -375,6 +378,9 @@ base_deps_osx_github_actions:
 # builddir used in this target - is a default build path from cmake
 # ExternalProject_Add()
 test_static_build_cmake_osx_no_deps:
+	# due swap disabling should be manualy configured need to
+	# control it's status
+	sysctl vm.swapusage
 	cd static-build && cmake -DCMAKE_TARANTOOL_ARGS="-DCMAKE_BUILD_TYPE=RelWithDebInfo;-DENABLE_WERROR=ON" . && \
 	make -j && ctest -V
 # FIXME: Hell with SIP on OSX: Tarantool (and also LuaJIT)
@@ -406,6 +412,7 @@ deps_freebsd:
 	python27 py27-yaml py27-six py27-gevent
 
 build_freebsd:
+	if [ "$$(swapctl -l | wc -l)" != "1" ]; then sudo swapoff -a ; fi ; swapctl -l
 	cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_WERROR=ON ${CMAKE_EXTRA_PARAMS}
 	gmake -j
 
-- 
GitLab