From fd6ee6d5b8bd5cbcb81f4a650c37ea5d5fac2a16 Mon Sep 17 00:00:00 2001
From: "Alexander V. Tikhonov" <avtikhon@tarantool.org>
Date: Tue, 23 Mar 2021 09:37:55 +0300
Subject: [PATCH] github-ci: switch off swap use

GitHub Actions provides Linux-based runner hosts with the following
configuration:

  2 Cores
  7 GB memory
  4 GB swap memory

To avoid hanging or slow tests under high memory use, like in [1],
the host configuration must avoid using swap memory. All of the test
workflows run inside Docker containers. This patch sets memory limits
in the docker run configuration based on the current GitHub Actions
hosts: 7 GB of memory with no additional swap memory.

Checked 10 full runs (29 workflows in each run used the change) and
got a single failed test, in the gevent() routine in test-run. This
result is much better than without this patch, when 3-4 workflows
fail on each full run.

The failures could happen because swappiness is set to its default value:

  cat /sys/fs/cgroup/memory/memory.swappiness
  60

From documentation on swappiness [2]:

  This control is used to define the rough relative IO cost of swapping
  and filesystem paging, as a value between 0 and 200. At 100, the VM
  assumes equal IO cost and will thus apply memory pressure to the page
  cache and swap-backed pages equally; lower values signify more
  expensive swap IO, higher values indicates cheaper.
  Keep in mind that filesystem IO patterns under memory pressure tend to
  be more efficient than swap's random IO. An optimal value will require
  experimentation and will also be workload-dependent.

We may try to tune how often anonymous pages are swapped using the
swappiness parameter, but our goal is to stabilize timings (and make
them as predictable as possible), so the best option is to disable swap
altogether and work on decreasing memory consumption of huge tests.

For the GitHub Actions host configuration with 7 GB of RAM this means
that swap started to be used once 2.8 GB of RAM was in use. But we have
tests that use 2.5 GB of RAM, like 'box/net_msg_max.test.lua', and
memory fragmentation could cause swap use after such a test runs [3].
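
The 2.8 GB figure is presumably derived from the common rule of thumb
that with swappiness=60 swapping starts once 40% of RAM is in use. This
reading is an assumption, not something stated in this commit, and the
kernel documentation quoted above describes swappiness as a relative IO
cost rather than a strict threshold. A minimal sketch of the arithmetic:

```shell
# Assumption: swap starts once (100 - swappiness)% of RAM is in use.
ram_mb=7168        # 7 GB of host memory, in MB
swappiness=60      # default value shown above
threshold_mb=$(( ram_mb * (100 - swappiness) / 100 ))
echo "swap may start after ${threshold_mb} MB of RAM is used"
```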

We also found that the disk cache uses some RAM, which also made memory
fill up fast and swapping start. The cache can be periodically dropped
from memory [4] by writing to the 'drop_caches' system value, but that
won't fix the overall issue with swap use.

After cached pages are freed from RAM, another kernel option can be
tuned [5][6]: 'vfs_cache_pressure'. This percentage value controls the
tendency of the kernel to reclaim the memory which is used for caching
of directory and inode objects. Increasing it significantly beyond the
default value of 100 may have a negative performance impact: reclaim
code needs to take various locks to find freeable directory and inode
objects. With 'vfs_cache_pressure=1000', it will look for ten times more
freeable objects than there are. This patch doesn't make this change,
but it can be done as a follow-up.

To fix the issue the following changes were made:

 - For jobs that run tests, use actions/environment and don't use the
   GitHub Actions container tag, the 'sudo swapoff -a' command was
   added to the actions/environment action.

 - For jobs that run tests and use the GitHub Actions container tag
   the previous solution doesn't work. It was decided to hard-code
   the memory value based on the memory size found on GitHub Actions
   hosts, 7 GB. It was set for the container tag as additional
   options:
     options: '--init --memory=7G --memory-swap=7G'
   With '--memory-swap' equal to '--memory' the container gets no
   swap. These changes are temporary, until the container tags are
   removed while resolving the tarantool/tarantool-qa#101 issue, and
   were made for workflows:
     debug_coverage
     release
     release_asan_clang11
     release_clang
     release_lto
     release_lto_clang11
     static_build
     static_build_cmake_linux

 - For VMware VMs, like the ones with FreeBSD, the 'sudo swapoff -a'
   command was added before the build commands.

 - For OSX on GitHub Actions hosts swapping is already disabled:
     sysctl vm.swapusage
     vm.swapusage: total = 0.00M  used = 0.00M  free = 0.00M  (encrypted)
   Also, switching off swap manually is currently not possible,
   because System Integrity Protection (SIP) must be disabled for
   that [7] and we don't have such access on GitHub Actions hosts.
   On local hosts it must be done manually with [8]:
     sudo nvram boot-args="vm_compressor=2"
   Added a swap status check to be sure that the host is configured
   correctly:
     sysctl vm.swapusage
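
The check that the actions/environment action performs after 'swapoff'
can be sketched as a small shell function. This is a minimal sketch,
not the exact code of the patch: the names 'swap_total_kb' and
'report_swap' and the optional file argument are hypothetical and exist
only to make the check easy to try against a sample file.

```shell
# Print the SwapTotal value (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; the path argument is for illustration only.
swap_total_kb() {
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Report whether swap is disabled, based on the SwapTotal value.
report_swap() {
    if [ "$(swap_total_kb "$@")" = "0" ]; then
        echo "Swap disabled"
    else
        echo "WARNING: swap still exists on the host"
    fi
}
```

Calling 'report_swap' without arguments inspects the current Linux
host; on hosts without /proc/meminfo it reports the warning branch.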

Closes tarantool/tarantool-qa#99

[1]: https://github.com/tarantool/tarantool-qa/issues/93
[2]: https://github.com/torvalds/linux/blob/1e43c377a79f9189fea8f2711b399d4e8b4e609b/Documentation/admin-guide/sysctl/vm.rst#swappiness
[3]: https://unix.stackexchange.com/questions/2658/why-use-swap-when-there-is-more-than-enough-free-space-in-ram
[4]: https://kubuntu.ru/node/13082
[5]: https://www.kernel.org/doc/Documentation/sysctl/vm.txt
[6]: http://devhead.ru/read/uskorenie-raboty-linux
[7]: https://osxdaily.com/2010/10/08/mac-virtual-memory-swap/
[8]: https://gist.github.com/dan-palmer/3082266#gistcomment-3667471
---
 .github/actions/environment/action.yml        | 31 +++++++++++++++++++
 .github/workflows/debug_coverage.yml          |  4 ++-
 .github/workflows/release.yml                 |  4 ++-
 .github/workflows/release_asan_clang11.yml    |  4 ++-
 .github/workflows/release_clang.yml           |  4 ++-
 .github/workflows/release_lto.yml             |  4 ++-
 .github/workflows/release_lto_clang11.yml     |  4 ++-
 .github/workflows/static_build.yml            |  4 ++-
 .../workflows/static_build_cmake_linux.yml    |  4 ++-
 .travis.mk                                    |  7 +++++
 10 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/.github/actions/environment/action.yml b/.github/actions/environment/action.yml
index 59d9ac9b89..4c978489a2 100644
--- a/.github/actions/environment/action.yml
+++ b/.github/actions/environment/action.yml
@@ -8,4 +8,35 @@ runs:
         echo TEST_TIMEOUT=310 | tee -a $GITHUB_ENV
         echo NO_OUTPUT_TIMEOUT=320 | tee -a $GITHUB_ENV
         echo PRESERVE_ENVVARS=REPLICATION_SYNC_TIMEOUT,TEST_TIMEOUT,NO_OUTPUT_TIMEOUT | tee -a $GITHUB_ENV
+        # The command switching off swap will not work when the github
+        # step runs inside a github 'container' tag: it fails there. So
+        # it is run only outside of a docker container.
+        # Also on local hosts, like the ones we use for 'freebsd' workflow
+        # testing, 'sudo' is not available outside of the 'freebsd'
+        # virtual machine, so to avoid hangs let's check that sudo can
+        # run with a command like 'timeout 2 sudo ls /sbin/swapoff'.
+        # NOTE: To switch off swap from inside github 'container' tag
+        # additional memory flags should be added to its 'options' tag:
+        #   options: '--memory=<some value, like 7G> --memory-swap=<the same value as for memory option>'
+        if which free ; then
+            echo "Check initial swap memory values:"
+            free
+        fi
+        if timeout 2 sudo ls /sbin/swapoff ; then
+            echo "Verified that 'sudo' is enabled, switching off swap ..."
+            sudo /sbin/swapoff -a || echo "'swapoff' command failed, but failure is acceptable from inside the container"
+            if [ -e /proc/meminfo ] ; then
+                if [ "$(grep SwapTotal: /proc/meminfo | awk '{ print $2; }')" = "0" ] ; then
+                    echo "Swap disabled"
+                else
+                    echo "WARNING: swap still exists on the host, limits in container options can resolve it"
+                fi
+            else
+                echo "File '/proc/meminfo' does not exist - couldn't check the swap size"
+            fi
+            if which free ; then
+                echo "Check updated swap memory values if 'swapoff' succeeded:"
+                free
+            fi
+        fi
       shell: bash
diff --git a/.github/workflows/debug_coverage.yml b/.github/workflows/debug_coverage.yml
index 8584633f44..b42afe58d9 100644
--- a/.github/workflows/debug_coverage.yml
+++ b/.github/workflows/debug_coverage.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # The hard-coded memory size will be removed when resolving issue
+      #   http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index db168521dd..c20ccf6afe 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # The hard-coded memory size will be removed when resolving issue
+      #   http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.github/workflows/release_asan_clang11.yml b/.github/workflows/release_asan_clang11.yml
index 59ec3b7642..e540fcdf9f 100644
--- a/.github/workflows/release_asan_clang11.yml
+++ b/.github/workflows/release_asan_clang11.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # The hard-coded memory size will be removed when resolving issue
+      #   http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v2.3.4
diff --git a/.github/workflows/release_clang.yml b/.github/workflows/release_clang.yml
index 53ef9d7c16..6d948321e8 100644
--- a/.github/workflows/release_clang.yml
+++ b/.github/workflows/release_clang.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # The hard-coded memory size will be removed when resolving issue
+      #   http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.github/workflows/release_lto.yml b/.github/workflows/release_lto.yml
index a6841f4501..7f7c36687e 100644
--- a/.github/workflows/release_lto.yml
+++ b/.github/workflows/release_lto.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # The hard-coded memory size will be removed when resolving issue
+      #   http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v2.3.4
diff --git a/.github/workflows/release_lto_clang11.yml b/.github/workflows/release_lto_clang11.yml
index 8b06fe82f6..48cf19c607 100644
--- a/.github/workflows/release_lto_clang11.yml
+++ b/.github/workflows/release_lto_clang11.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # The hard-coded memory size will be removed when resolving issue
+      #   http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v2.3.4
diff --git a/.github/workflows/static_build.yml b/.github/workflows/static_build.yml
index e096e5e406..a7ab4fc86e 100644
--- a/.github/workflows/static_build.yml
+++ b/.github/workflows/static_build.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # The hard-coded memory size will be removed when resolving issue
+      #   http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.github/workflows/static_build_cmake_linux.yml b/.github/workflows/static_build_cmake_linux.yml
index 891908f24d..d7becd172b 100644
--- a/.github/workflows/static_build_cmake_linux.yml
+++ b/.github/workflows/static_build_cmake_linux.yml
@@ -29,7 +29,9 @@ jobs:
       # Our testing expects that the init process (PID 1) will
       # reap orphan processes. At least the following test leans
       # on it: app-tap/gh-4983-tnt-e-assert-false-hangs.test.lua.
-      options: '--init'
+      # The hard-coded memory size will be removed when resolving issue
+      #   http://github.com/tarantool/tarantool-qa/issues/101
+      options: '--init --memory=7G --memory-swap=7G'
 
     steps:
       - uses: actions/checkout@v1
diff --git a/.travis.mk b/.travis.mk
index 3017cb09ec..d5de142073 100644
--- a/.travis.mk
+++ b/.travis.mk
@@ -325,6 +325,9 @@ deps_osx_github_actions:
 	pip3 install --force-reinstall -r test-run/requirements.txt
 
 build_osx:
+	# swap has to be disabled manually on OSX, so print its
+	# status to be able to check it
+	sysctl vm.swapusage
 	cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_WERROR=ON ${CMAKE_EXTRA_PARAMS}
 	make -j
 
@@ -375,6 +378,9 @@ base_deps_osx_github_actions:
 # builddir used in this target - is a default build path from cmake
 # ExternalProject_Add()
 test_static_build_cmake_osx_no_deps:
+	# swap has to be disabled manually on OSX, so print its
+	# status to be able to check it
+	sysctl vm.swapusage
 	cd static-build && cmake -DCMAKE_TARANTOOL_ARGS="-DCMAKE_BUILD_TYPE=RelWithDebInfo;-DENABLE_WERROR=ON" . && \
 	make -j && ctest -V
 	# FIXME: Hell with SIP on OSX: Tarantool (and also LuaJIT)
@@ -406,6 +412,7 @@ deps_freebsd:
 		python27 py27-yaml py27-six py27-gevent
 
 build_freebsd:
+	if [ "$$(swapctl -l | wc -l)" != "1" ]; then sudo swapoff -a ; fi ; swapctl -l
 	cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_WERROR=ON ${CMAKE_EXTRA_PARAMS}
 	gmake -j
 
-- 
GitLab