I’m stuck on Stage #AT7 and I think there’s a bug in the testing code.
The process is running in the background; I can look up its PID with ps.
My shell:
$ sleep 15000 &
[1] 90531
Bash:
$ ps aux | grep 90531
dom 90531 0.0 0.0 5572 2140 pts/3 S+ 21:57 0:00 sleep 15000
dom 90536 0.0 0.0 6520 2388 pts/0 S+ 21:57 0:00 grep --color=always 90531
It is correctly running in background. Yet, the test fails here:
[tester::#AT7] Running tests for Stage #AT7 (Background Jobs - Starting background jobs)
[tester::#AT7] Running ./your_program.sh
[your-program] $ sleep 500 &
[your-program] [1] 44
[tester::#AT7] Could not find process with PID 44
[tester::#AT7] Test failed
I think the 500 ms sleep may be too short and my machine too slow, so the process terminates before the test has had a chance to verify it’s running. Just a guess.
Oh that’s 500 seconds, not ms. Still can’t work around the issue though.
Anyone ever hit something similar?
You are probably going to have to show us your process-launching code so we can see what happens.
Does the test actually take 500s? Or does it fail sooner than that?
Also, is the spawned process a direct child of your Shell, or are there any intermediary processes spawned too?
It fails sooner. It’s a direct child. Let me paste the code:
def run_executable(command_path: str, user_input: UserInput) -> None:
    pid = os.fork()
    if pid == 0:
        if user_input.is_background:
            # Create new session for the child process, detaching it from the
            # parent's terminal and process group: signals sent to the shell's
            # process group won't reach the background child.
            os.setsid()
            # Redirect stdin to /dev/null to prevent the child from blocking or
            # dying on a closed pipe.
            devnull = os.open("/dev/null", os.O_RDONLY)
            os.dup2(devnull, 0)
            os.close(devnull)
        if user_input.stdout_file:
            mode = (
                os.O_WRONLY
                | os.O_CREAT
                | (os.O_APPEND if user_input.stdout_append else os.O_TRUNC)
            )
            stdout_fd = os.open(user_input.stdout_file, mode, 0o644)
            os.dup2(stdout_fd, sys.stdout.fileno())
            os.close(stdout_fd)
        if user_input.stderr_file:
            mode = (
                os.O_WRONLY
                | os.O_CREAT
                | (os.O_APPEND if user_input.stderr_append else os.O_TRUNC)
            )
            stderr_fd = os.open(user_input.stderr_file, mode, 0o644)
            os.dup2(stderr_fd, sys.stderr.fileno())
            os.close(stderr_fd)
        try:
            os.execv(command_path, [user_input.command] + user_input.parameters)
        except OSError as e:
            os._exit(127)
    else:
        if user_input.is_background:
            job_number = 1
            if jobs:
                last_key = max(jobs.keys())
                job_number = last_key + 1
            jobs[job_number] = pid
            outln(f"[{job_number}] {pid}")
        else:
            os.waitpid(pid, 0)
Even simplifying the code to:
def run_executable(command_path: str, user_input: UserInput) -> None:
    pid = os.fork()
    if pid == 0:
        os.execv(command_path, [user_input.command] + user_input.parameters)
    else:
        if user_input.is_background:
            job_number = 1
            if jobs:
                last_key = max(jobs.keys())
                job_number = last_key + 1
            jobs[job_number] = pid
            outln(f"[{job_number}] {pid}")
        else:
            os.waitpid(pid, 0)
causes the same error.
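One thing to watch in the simplified version: if os.execv raises, the child does not exit and carries on running the shell's own code as a second copy. A minimal sketch of the fork/exec step with that path guarded, assuming a POSIX system; spawn is a hypothetical name for illustration, not part of the code above:

```python
import os


def spawn(command_path: str, argv: list[str]) -> int:
    """Fork, exec argv in the child, and return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            # On success execv never returns: the child becomes the command.
            os.execv(command_path, argv)
        except OSError as e:
            # exec failed: report on stderr via the raw file descriptor.
            os.write(2, f"{argv[0]}: {e.strerror}\n".encode())
        finally:
            # Only reached when exec failed. Leave immediately so the child
            # never falls back into the shell's read loop.
            os._exit(127)
    return pid
```

The parent can then either record the returned PID as a background job or call os.waitpid(pid, 0) for a foreground command, as in the original function.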
I assume command_path is the fully resolved path to the command, e.g. /usr/bin/sleep and not just sleep? I guess it would have to be, otherwise your earlier example would fail.
Yeah, unless the child crashes really quickly (somehow finding a different sleep or something), I can’t see the issue. Try adding a 1 second sleep in Python before you output the job number and PID in the parent. I can’t see a race there and doubt there is one, but it’s a quick experiment to try.
Can you verify, at least locally, that ppid and pid are as expected for both shell and child?
ps -o pid,ppid,cmd
Yes: command_path='/bin/sleep'.
Adding a 1 second sleep before outputting the job number/pid didn’t help.
From my shell:
$ sleep 15 &
[1] 50410
From bash:
$ ps -o pid,ppid,cmd -p 50410
PID PPID CMD
50410 50348 sleep 15
$ ps -o pid,ppid,cmd -p 50348
PID PPID CMD
50348 1484 python3 app/main.py
Even using pstree, I don’t see anything anomalous (apart from some zombie processes I need to reap):
|-tmux: server(1483)-+-bash(1484)---python3(50348)-+-sleep(50350)
|                    |                             |-sleep(50410)
|                    |                             `-sleep(50631)
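On the zombie processes: a background child that has exited stays in the process table until the parent waits on it. A minimal sketch of non-blocking reaping, assuming jobs is a dict mapping job number to PID as in the code above; reap_finished is a hypothetical helper name:

```python
import os


def reap_finished(jobs: dict[int, int]) -> list[tuple[int, int]]:
    """Reap exited background children without blocking.

    Returns the (job_number, pid) pairs that were reaped and removes
    them from jobs. Children that are still running are left alone.
    """
    reaped = []
    for job_number, pid in list(jobs.items()):
        try:
            # WNOHANG makes waitpid return (0, 0) if the child is still running.
            done_pid, _status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # Not our child any more (already reaped): drop the stale entry.
            done_pid = pid
        if done_pid == pid:
            reaped.append((job_number, pid))
            del jobs[job_number]
    return reaped
```

Calling something like this once per prompt, before reading the next command, would be one place to do it.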
I don’t know what else to try. Can I find somewhere the code that checks for the presence of the child process or is it closed source?
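The tester's actual check isn't shown in this thread, so the following is only an assumption about how such a check could be done locally, not its real code. On a POSIX system, sending signal 0 probes whether a PID exists without delivering anything; pid_exists is a hypothetical helper name:

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        # Signal 0 performs the existence and permission checks only;
        # no signal is actually delivered to the target.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Note that an unreaped zombie still counts as existing for this probe, and that a PID is only meaningful inside the PID namespace where it was issued.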
Hey @fly-434, this is an issue on our end.
We’re investigating it now and will share an update as soon as we have a fix.
Thanks for looking into this. Let me know if I can help in any way.
Quick update: we’ve pushed a fix in this PR.
Could you give it another try and let me know how it goes? @fly-434