src/main.cc · 667930de6cd7e6db61a0c787723cfab9e4cba4ef · core / tarantool


popen: introduce a backend engine · f58cb606

Cyrill Gorcunov authored 5 years ago


In this patch we introduce a popen backend engine which provides
a way to execute external programs and communicate with their
stdin/stdout/stderr streams.

It is possible to run a child process with its stdX descriptors:

 a) completely closed
 b) redirected to /dev/null
 c) connected to a new transport passed into the child (currently
    we use pipes for this, but may extend to tty/sockets)
 d) inherited from the parent, i.e. left untouched
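A minimal sketch of how a child could prepare one of its stdX descriptors for each of the four modes, between vfork() and exec. The enum and setup_child_fd() are hypothetical names, not the engine's code; only plain syscalls (open, dup2, close, fcntl) are used because this runs in the child before exec.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical names for the four stdX modes listed above. */
enum stdio_mode {
	STDIO_CLOSE,	/* a) completely closed */
	STDIO_DEVNULL,	/* b) redirected to /dev/null */
	STDIO_PIPE,	/* c) connected to a pipe end */
	STDIO_INHERIT,	/* d) inherited, left untouched */
};

/*
 * Prepare descriptor @fd in the child between vfork() and exec.
 * @pipe_end is used only for STDIO_PIPE. Returns 0 or -1.
 */
static int
setup_child_fd(int fd, enum stdio_mode mode, int pipe_end)
{
	int nfd;

	switch (mode) {
	case STDIO_CLOSE:
		close(fd);
		return 0;
	case STDIO_DEVNULL:
		/* The child reads stdin and writes stdout/stderr. */
		nfd = open("/dev/null", fd == STDIN_FILENO ? O_RDONLY : O_WRONLY);
		if (nfd < 0)
			return -1;
		if (nfd != fd) {
			if (dup2(nfd, fd) < 0) {
				close(nfd);
				return -1;
			}
			close(nfd);
		}
		return 0;
	case STDIO_PIPE:
		if (pipe_end == fd)
			/* Already in place: just let it survive exec. */
			return fcntl(fd, F_SETFD, 0) < 0 ? -1 : 0;
		/* dup2() clears FD_CLOEXEC on the new descriptor. */
		if (dup2(pipe_end, fd) < 0)
			return -1;
		close(pipe_end);
		return 0;
	case STDIO_INHERIT:
		return 0;
	}
	return -1;
}
```

Since dup2() leaves the duplicate without FD_CLOEXEC, pipe ends can be created close-on-exec in the parent and still end up on stdin/stdout/stderr of the exec'ed program.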

On tarantool start we create the @popen_pids_map hash which maps
the PIDs of created processes to popen_handle structures; each
structure keeps everything needed to control and communicate with
its child. The hash allows us to find a child process quickly from
inside a signal handler.
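A pid-keyed lookup of this kind can be sketched as a fixed-size open-addressing table with linear probing. This is a hypothetical stand-in, not the engine's hash; the property it illustrates is that lookup needs no memory allocation, which matters on a signal-handling path. struct popen_handle is trimmed down to a pid for illustration.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down handle: the real one keeps much more. */
struct popen_handle {
	pid_t pid;
};

#define PIDS_MAP_SIZE 64	/* power of two; fixed size for brevity */

static struct popen_handle *pids_map[PIDS_MAP_SIZE];
/* Marks a freed slot so that probe chains stay intact. */
static struct popen_handle pids_map_tombstone;

static size_t
pids_map_next(size_t i)
{
	return (i + 1) & (PIDS_MAP_SIZE - 1);
}

/* Insert @h; returns 0 on success, -1 if the table is full. */
static int
pids_map_insert(struct popen_handle *h)
{
	size_t i = (size_t)h->pid & (PIDS_MAP_SIZE - 1);
	for (size_t n = 0; n < PIDS_MAP_SIZE; n++, i = pids_map_next(i)) {
		if (pids_map[i] == NULL || pids_map[i] == &pids_map_tombstone) {
			pids_map[i] = h;
			return 0;
		}
	}
	return -1;
}

/* Return the slot index holding @pid, or -1. No allocation involved. */
static long
pids_map_slot(pid_t pid)
{
	size_t i = (size_t)pid & (PIDS_MAP_SIZE - 1);
	for (size_t n = 0; n < PIDS_MAP_SIZE; n++, i = pids_map_next(i)) {
		struct popen_handle *h = pids_map[i];
		if (h == NULL)
			return -1;	/* end of the probe chain */
		if (h != &pids_map_tombstone && h->pid == pid)
			return (long)i;
	}
	return -1;
}

static struct popen_handle *
pids_map_find(pid_t pid)
{
	long slot = pids_map_slot(pid);
	return slot < 0 ? NULL : pids_map[slot];
}

static void
pids_map_remove(pid_t pid)
{
	long slot = pids_map_slot(pid);
	if (slot >= 0)
		pids_map[slot] = &pids_map_tombstone;
}
```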

Each handle is linked into the @popen_head list, which we need in
order to destroy child processes on exit (i.e. when tarantool exits
and must clean up the resources used).
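The exit-time walk over @popen_head can be sketched with a small intrusive doubly linked list, similar in spirit to tarantool's rlist but written out here so the example is self-contained (all names are hypothetical). The detail worth noting is that the next link is saved before a handle is destroyed, since destroying it may free the memory the link lives in.

```c
#include <stddef.h>
#include <sys/types.h>

/* Minimal intrusive list link (hypothetical, rlist-like). */
struct list_link {
	struct list_link *prev, *next;
};

struct popen_handle {
	pid_t pid;
	struct list_link in_list;	/* link into popen_head */
};

static struct list_link popen_head = { &popen_head, &popen_head };

static void
popen_list_add(struct popen_handle *h)
{
	h->in_list.next = popen_head.next;
	h->in_list.prev = &popen_head;
	popen_head.next->prev = &h->in_list;
	popen_head.next = &h->in_list;
}

static void
popen_list_del(struct popen_handle *h)
{
	h->in_list.prev->next = h->in_list.next;
	h->in_list.next->prev = h->in_list.prev;
	h->in_list.prev = h->in_list.next = &h->in_list;
}

/* Recover the handle from a pointer to its embedded link. */
#define handle_of(link) \
	((struct popen_handle *)((char *)(link) - \
				 offsetof(struct popen_handle, in_list)))

/*
 * Exit-time cleanup: unlink and destroy every handle. @destroy may
 * free the handle, so the next link is fetched before calling it.
 * Returns the number of handles destroyed.
 */
static size_t
popen_destroy_all(void (*destroy)(struct popen_handle *))
{
	size_t n = 0;
	struct list_link *it = popen_head.next;
	while (it != &popen_head) {
		struct list_link *next = it->next;
		struct popen_handle *h = handle_of(it);
		popen_list_del(h);
		destroy(h);
		n++;
		it = next;
	}
	return n;
}
```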

Every new process is born via a vfork() call: we can't use fork()
because of the at_fork() handlers in libeio, which cause deadlocks
on internal mutexes. Thus the caller is suspended until the vforked
child runs exec (or exits with an error).
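A minimal sketch of the vfork()-then-exec pattern described above; spawn_vfork() is a hypothetical helper. The child does nothing but execv() or _exit(): after vfork() it shares the parent's memory while the parent stays suspended, so anything more (in particular returning from the function) would be unsafe. Using exit code 127 for a failed exec follows the shell convention and is an assumption here.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start @argv[0] with vfork() + execv(). The parent is suspended
 * until the child calls execv() or _exit(). Returns the child pid,
 * or -1 if vfork() itself failed.
 */
static pid_t
spawn_vfork(char *const argv[])
{
	pid_t pid = vfork();
	if (pid == 0) {
		execv(argv[0], argv);
		/* Reached only if exec failed; never return from here. */
		_exit(127);
	}
	return pid;
}
```

The parent learns about an exec failure only through the child's exit status, which is one reason the engine has to watch child state afterwards.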

Because child processes run without any limitations, they can exit
by themselves or be killed by a third party (say, a user of the
hardware node), so we need to watch their state. This is done by
setting a hook with the ev_child_start() helper, which allows us to
catch SIGCHLD when a child exits or is killed by a signal and to
unregister it from the pool of currently running children.
Note that libev reaps child zombies by itself via wait(). Another
interesting detail is that libev catches the signal asynchronously,
but our SIGCHLD hook is called synchronously, before the child is
reaped.
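Conceptually, reaping on SIGCHLD comes down to a non-blocking waitpid() loop plus decoding of the wait status into a state. The sketch below is plain POSIX, not libev's API (libev runs its own waitpid() loop internally); reap_children(), decode_status() and the state names are hypothetical, chosen to mirror the alive/exited/killed states reported by popen_state.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical states, mirroring "alive, exited or killed". */
enum child_state { CHILD_ALIVE, CHILD_EXITED, CHILD_KILLED };

/* Decode a waitpid() status; *code gets the exit code or signal number. */
static enum child_state
decode_status(int status, int *code)
{
	if (WIFEXITED(status)) {
		*code = WEXITSTATUS(status);
		return CHILD_EXITED;
	}
	if (WIFSIGNALED(status)) {
		*code = WTERMSIG(status);
		return CHILD_KILLED;
	}
	*code = 0;
	return CHILD_ALIVE;	/* stopped or continued: still alive */
}

static const char *
child_state_str(enum child_state s)
{
	switch (s) {
	case CHILD_ALIVE: return "alive";
	case CHILD_EXITED: return "exited";
	case CHILD_KILLED: return "killed";
	}
	return "unknown";
}

/*
 * Reap, without blocking, every child that has already terminated
 * and pass it to @report. Returns the number of children reaped.
 */
static int
reap_children(void (*report)(pid_t pid, enum child_state s, int code))
{
	int n = 0, status, code;
	pid_t pid;

	while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
		enum child_state s = decode_status(status, &code);
		report(pid, s, code);
		n++;
	}
	return n;
}
```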

This engine provides the following API:
 - popen_init
	to initialize the engine
 - popen_free
	to finalize the engine and free all resources
	allocated so far
 - popen_new
	to create a new child process and start it
 - popen_delete
	to release occupied resources and
	terminate a child process
 - popen_stat
	to fetch statistics about a child process
 - popen_command
	to fetch the command line string used
	when the popen object was created
 - popen_write_timeout
	to write data into the child's stdin with
	a timeout
 - popen_read_timeout
	to read data from the child's stdout/stderr
	with a timeout
 - popen_state
	to fetch the state (alive, exited or killed) and
	exit code of a child process
 - popen_state_str
	to get the state of a child process in string
	form, mostly for Lua usage
 - popen_send_signal
	to send a signal to a child process (for
	example, to kill it)
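The timeout-based I/O calls can be pictured as "wait until the descriptor is ready, then transfer". A minimal sketch of the read side, assuming a plain blocking poll(); read_timeout() is a hypothetical name and its ETIMEDOUT convention is an assumption. Inside tarantool the wait would have to cooperate with the event loop rather than block the thread, which this sketch does not attempt.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Read up to @size bytes from @fd, waiting at most @timeout_ms for
 * data. Returns the number of bytes read (0 on EOF), or -1 with errno
 * set (ETIMEDOUT if nothing arrived in time). A retry after EINTR
 * restarts the full timeout, which is a simplification.
 */
static ssize_t
read_timeout(int fd, void *buf, size_t size, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc;

	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);
	if (rc < 0)
		return -1;
	if (rc == 0) {
		errno = ETIMEDOUT;
		return -1;
	}
	/* Readable or hung up: read() returns data or 0 on EOF. */
	return read(fd, buf, size);
}
```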

Known issues to fix in the next series:

 - environment variables are not inherited on non-Linux systems
   for now, due to lack of testing on my side;

 - on Linux-based systems we use the pipe2() system call with the
   O_CLOEXEC flag so that two concurrent popen_new calls would not
   affect each other through pipe inheritance (although, as far as
   I know, we currently have no case where concurrent calls could
   happen, it is still better to be on the safe side from the
   beginning);
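Why an atomic O_CLOEXEC matters: with plain pipe() followed by fcntl(FD_CLOEXEC) there is a window in which a concurrent fork()/exec() can inherit the fresh descriptors. A sketch of a helper that uses pipe2() and falls back to the racy two-step form elsewhere; make_cloexec_pipe() is a hypothetical name, and gating pipe2() on `__linux__` is a conservative assumption.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Create a pipe with FD_CLOEXEC set on both ends. Returns 0 or -1. */
static int
make_cloexec_pipe(int fds[2])
{
#ifdef __linux__
	/* Atomic: the descriptors never exist without the flag. */
	return pipe2(fds, O_CLOEXEC);
#else
	/* Fallback: a concurrent fork() may slip in before fcntl(). */
	if (pipe(fds) < 0)
		return -1;
	for (int i = 0; i < 2; i++) {
		if (fcntl(fds[i], F_SETFD, FD_CLOEXEC) < 0) {
			close(fds[0]);
			close(fds[1]);
			return -1;
		}
	}
	return 0;
#endif
}
```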

 - there are some files (such as xlog) which tarantool opens for
   its own needs without setting the O_CLOEXEC flag, so they get
   propagated to child processes; on Linux-based systems we use
   the close_inherited_fds helper, which walks over the files
   opened by the process and closes them. For other targets like
   MachO or FreeBSD this helper is simply stubbed out because I
   don't have such machines to experiment with; we should
   investigate this in more detail later, once the base code is
   merged;
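A Linux-only sketch of the "walk over opened files and close them" idea, reading /proc/self/fd. This is not the engine's close_inherited_fds; close_fds_from() is a hypothetical name, and keeping descriptors below @min_fd (typically 3, so stdin/stdout/stderr survive) is an assumption. Note that opendir() allocates memory, so where in the child it may safely be called is a separate question that this sketch does not settle.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Close every open descriptor >= @min_fd except the one used to read
 * /proc/self/fd itself. Returns 0, or -1 if /proc is unavailable.
 */
static int
close_fds_from(int min_fd)
{
	DIR *dir = opendir("/proc/self/fd");
	if (dir == NULL)
		return -1;
	int self_fd = dirfd(dir);
	struct dirent *de;

	while ((de = readdir(dir)) != NULL) {
		char *end;
		long fd = strtol(de->d_name, &end, 10);
		/* Skip "." and "..", i.e. anything that is not a number. */
		if (end == de->d_name || *end != '\0')
			continue;
		if (fd < min_fd || fd == self_fd)
			continue;
		close((int)fd);
	}
	closedir(dir);
	return 0;
}
```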

 - need to consider a case where we use piping between
   descriptors (for example, we might be writing into the stdin
   of a child from another pipe; for this we could use the
   splice() syscall, which would be much faster than copying the
   data through user space). Still, the question is: do we really
   need it? Since we use internal flags in the popen handle, it
   should not be a big problem to extend these interfaces;

   this particular feature is considered to have a very low
   priority, but I left it here just so as not to forget.

Part-of #4031

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
