Long time to mark stage as complete

Every time I finish a stage and click on “mark stage as complete”, I have to wait about 15 minutes before I can start the next one.

For instance, I just finished “Respond with body #CN2” in HTTP Server, and I’ve been waiting for more than 10 minutes to be able to start the next one, “Read header #FS3”.

And if I try to run codecrafters test from the CLI, I get a 503.

Edit: it took about 30 minutes for the stage to be marked as completed.

@i27ae15 hmm, it definitely shouldn’t take that long. I assume you’ve tried this already, but if you reload the page and click the button again, does it still get stuck in the loading state?

I just checked and it doesn’t look like we have an error handler here, so if there’s an intermittent network failure it’ll “look” like the thing is loading, but in reality it’s just a failed request. We’ll add this. In the meantime, it would be good to know whether reloading the page and clicking the button again works or not.
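As a rough illustration of what that missing error handler changes, here is a minimal Python sketch. The real frontend code isn’t shown in this thread, so `ButtonState`, `click_mark_complete`, `send_request`, and `render` are all hypothetical names. The idea is that every outcome, including a network error or a 503, moves the button out of the loading state instead of leaving it spinning.

```python
from enum import Enum
from typing import Callable


class ButtonState(Enum):
    LOADING = "loading"
    DONE = "done"
    ERROR = "error"


def click_mark_complete(
    send_request: Callable[[], int],
    render: Callable[[ButtonState], None],
) -> None:
    """Hypothetical click handler for the "mark stage as complete" button.

    `send_request` stands in for the real HTTP call: it returns a status
    code, or raises ConnectionError/TimeoutError on a network failure.
    `render` stands in for whatever updates the UI.
    """
    render(ButtonState.LOADING)
    try:
        status = send_request()
    except (ConnectionError, TimeoutError):
        # Without this branch, a failed request leaves the UI in LOADING forever.
        render(ButtonState.ERROR)
        return
    # Non-2xx responses (e.g. a 503) are surfaced as errors too.
    render(ButtonState.DONE if 200 <= status < 300 else ButtonState.ERROR)
```

With this shape, the user sees an error they can retry on, rather than a spinner that never resolves.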

@rohitpaulk yep, I tried multiple times and still got the same issue; it just hangs there, and so does the CLI. I also closed and reopened CodeCrafters, but it does the same thing.

@rohitpaulk if this helps: I closed CodeCrafters and tried to run tests, but I was still getting this issue. After about 30 minutes I tried again and the runner started, so I went back to CodeCrafters, clicked on “complete stage”, and it was marked instantly.

@rohitpaulk

Following up on this, I’m still having the same issue.

I just completed another stage and it has been hanging on “Mark stage as complete” for about 10 minutes.

This returned a 503 after 6 minutes:

The stage was marked as completed after almost 15 minutes.

@i27ae15 Okay can confirm that something’s up here! Just filtered for outlier transactions and I see that the endpoint takes minutes for your username specifically:

The p95 for this endpoint is 500ms, so this is definitely not expected. We’re going to dig in and figure out what the root cause is – will keep you posted!

@i27ae15 we’ve pushed a fix for this. Could you try it out the next time you complete a stage and lmk if the issue still persists?

Context on what was wrong:

We’ve got some logic to compute badge awards, and it usually only triggers in early stages. It has a query that fetches all submissions up to that point. This is usually fast, since there are only tens of “submissions” to deal with.

In your case, we ended up in a situation where the logic was being triggered even in later stages (when there are thousands of submissions). The innocent query was now fetching thousands of records, plus related records, and running computations on them – all within that “mark stage as complete” request.

We fixed this by (a) optimizing the logic to only pull the last 100 submissions, and (b) changing the endpoint to commit the “completion” first and compute awards afterwards. So now, even if the award computation does take longer due to a similar bug in the future, refreshing the page should show the stage as completed.


Thanks so much for taking the time to report this!


@rohitpaulk Thank you! Will check this on Monday and let you know!

@rohitpaulk sorry for the delay, I was waiting for my employer to pay my subscription.

Just completed another stage and it was marked as complete instantly, thank you!


Awesome, glad to hear it worked!

@rohitpaulk Found another issue!

I just completed another stage, and yes, the stage is marked as completed almost instantly, but I have to wait about 10-15 minutes before I can start running tests for the next stage. The same happens when I try to enable/disable an extension: it just hangs.

This is for disabling an extension:


And when running tests, I just get a 503.

Yikes, thanks for sharing - looking into this!

@i27ae15 we’re still not completely sure what’s up here, but the symptoms you’re mentioning point towards our locking strategy: we use locks on a “repository” for a bunch of actions, one of them being activating/deactivating extensions. The action itself is cheap & quick, but if something else was holding the lock for a while, it could result in huge delays.

We’ve changed the locks for activating/deactivating extensions to be more granular, so that shouldn’t be slow now. We’re still looking into the root cause, though: I don’t see any Sentry traces yet (could be sampling), so I can’t quite tell what’s up. We’ll work on similarly patching the test runner locks so that they aren’t slow either.
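To illustrate why a coarse per-repository lock causes this kind of delay and how granular locks help, here’s a hypothetical in-process sketch using Python’s `threading.Lock`. The real system probably uses a database or distributed lock, which this thread doesn’t specify, and `RepositoryLocks`, `lock_for`, and the action names are invented for illustration. With one lock per repository, any long-running holder (say, a stuck test run) blocks extension toggling; with one lock per (repository, action) pair, it doesn’t.

```python
import threading
from collections import defaultdict


class RepositoryLocks:
    """Hypothetical lock registry.

    granular=False: every action on a repository shares one lock.
    granular=True: each (repository, action) pair gets its own lock.
    """

    def __init__(self, granular: bool) -> None:
        self.granular = granular
        # Maps (repo_id, action key) -> threading.Lock, created on first use.
        self._locks = defaultdict(threading.Lock)
        # Guards lock creation so two threads never get different locks for one key.
        self._registry_guard = threading.Lock()

    def lock_for(self, repo_id: str, action: str):
        key = (repo_id, action if self.granular else "*")
        with self._registry_guard:
            return self._locks[key]
```

Granularity is a trade-off: actions that genuinely conflict (for example, two writes to the same repository state) still need to share a lock, so only independent actions like toggling an extension should be split out.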

@rohitpaulk Thank you! I will keep an eye on the frontend to see if I can catch any errors there.

@rohitpaulk Updating this:

This bug seems to be affecting many things. I just completed the HTTP Server challenge, and this is what I see in my profile:

And this is what I see in my catalog of challenges:

This one is very funny; do you get this data from different sources?
Edit: It took about 10-15 minutes for both my profile and the challenges catalog section to sync up. So, same issue as you mentioned, I guess.


I still have to wait 10-15 minutes between challenges before I can start testing the next one. This is what I get in the meantime:

Also, I’m wondering: you mentioned that the hanging issue was due to the logic you use to award badges. I haven’t been granted any badge in a while, and in that section I see this:

It says “blocked”, but I can see its text, while I can’t see the text of the other blocked ones (which might be expected, of course).

I don’t know if that’s because I haven’t achieved any new ones, or if it’s related to this issue.