dryginstudios asked

Random CloudScript HTTP errors for the past few days

Hello,

We are a PRO user and tried to use the devrel email, but it doesn't work anymore. The Support link that is supposed to be in the admin console cannot be found, so I am posting here.

A few days ago we started to notice a lot of random HTTP errors in the Automation/CloudScript dashboard.

We use CloudScript HTTP calls to our Azure App to handle game logic that cannot be managed inside PlayFab.

The problem is that PlayFab reports these calls as HTTP errors, but Azure indicates that all systems are OK, with no 500/400 errors reported that would account for the failed CloudScript HTTP calls. We have also double-checked our app recycling settings and uptime, and the app has not recycled in the last 24 hours on Azure. We need to identify what these HTTP call errors are, but we cannot debug them because no logs are available for the CloudScript errors. This setup worked fine for a few months, but it has been consistently reporting errors for a few days now.

Please see the graphs below, which cover the last 24 hours for both PlayFab and Azure.

Thanks,

J-F

errorazure.png (66.3 KiB)

1 Answer

brendan answered

The issue is that the calls to the third-party endpoints aren't returning in time, so while the functions are succeeding, they're taking too long and a timeout error is being thrown in the script. While there aren't logs available in the Game Manager for this, one thing to note in the graphs is that there's a jump in average execution times each time you have a cluster of those errors. For debugging, I would suggest putting a try/catch around the call, so that you can log the results before exiting. Meanwhile, I'd also suggest moving to our Azure Functions Cloud Script integration, since you can have much longer-running scripts with longer HTTP call timeouts in that model.
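To illustrate the try/catch suggestion, here is a minimal sketch rather than a definitive implementation. It assumes classic CloudScript, where `http` and `log` are globals and `http.request` takes a URL, method, content, content type and headers. The helper name `callWithLogging`, the handler name in the trailing comment and the placeholder URL are all hypothetical. The client and logger are passed in as parameters only so the helper can also be exercised outside PlayFab.

```javascript
// Sketch: wrap an outbound HTTP call so that a failure (such as a timeout)
// is logged with its elapsed time before the handler returns.
// `httpClient` and `logger` stand in for the CloudScript globals `http` and `log`.
function callWithLogging(httpClient, logger, url, payload) {
    var started = Date.now();
    try {
        var body = httpClient.request(
            url,
            "post",
            JSON.stringify(payload),
            "application/json",
            {}
        );
        logger.info("HTTP call succeeded", {
            url: url,
            elapsedMs: Date.now() - started
        });
        return { ok: true, body: body };
    } catch (err) {
        // Record what failed and how long it took, then return a result
        // instead of letting the exception end the script without a trace.
        logger.error("HTTP call failed", {
            url: url,
            elapsedMs: Date.now() - started,
            error: String(err)
        });
        return { ok: false, error: String(err) };
    }
}

// Hypothetical use inside a CloudScript handler (names are assumptions):
// handlers.callGameLogic = function (args) {
//     return callWithLogging(http, log, "https://<your-azure-app>/api/...", args);
// };
```

Logging the elapsed time on both paths makes it possible to compare the failing calls against the execution-time jumps visible in the dashboard graphs.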

5 comments

brendan commented

Also, to submit a ticket in the Game Manager, just click on the ? in the upper right corner and use the Contact option. That will be there for anyone with access to that permission for the title (via the Roles defined for the studio).

dryginstudios commented

Hello @Brendan,

I was under the assumption that the timeout for HTTP calls in a CloudScript function is 2000 ms. Below is a 24-hour report of the maximum response time from the Azure Web App, showing a max response time of around 400 in the worst case.

One reason for this worst case is that we wrap failing calls to PlayFab, wait, then retry up to 3 times with a 100 ms delay. These are the errors we sometimes get from the Server API when calling PlayFab endpoints such as GetPlayerData, UpdatePlayerData, etc.:

The remote server returned an error: (504) Gateway Timeout.

The remote server returned an error: (503) Server Unavailable.

The remote server returned an error: (502) Bad Gateway.

I would be happy to fix any slow-running code on our side, but there is no indication that this is the issue. Right now, it seems like there are momentary failures in both directions:

PlayFab => Azure and Azure => PlayFab.

azuremax.png (41.1 KiB)
brendan commented (replying to dryginstudios)

Ah, thanks for that - you'd originally said there were no 500/400 errors, which I took to mean that all calls from the scripts were succeeding. That would only leave the timeouts, and I do see a fair number of those occurring in the service in general around those times.

In general, those specific errors can occur for a range of reasons, from routing issues to temporary rejections due to sudden changes in call rates for the title. In this case, the retries are part of the problem: retries need to be managed with an exponential backoff, so that you don't wind up with a "logjam" situation where multiple clients are all retrying over and over at the same time. Basically, these types of errors tend to be rare, except when calls spike due to fast, frequent retries. Then they tend to pile up until the service can compensate. Also, I'd avoid having the first retry happen within milliseconds: if the cause is a scaling event and a bunch of clients all retry at once, that will just make the issue worse. And generally speaking, I'd return to the client on one of those errors, so that you don't have the client waiting too long.
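As a rough illustration of the backoff advice, the sketch below computes retry delays that double on each attempt, are capped, and include random jitter so that many callers don't retry in lockstep. It is written in JavaScript for consistency with CloudScript, although the retrying code in this thread runs in the Azure app. The function names, the 1-second base and the 8-second cap are assumptions chosen for the example, not PlayFab recommendations.

```javascript
// Status codes quoted in the thread that are usually transient.
var RETRYABLE_STATUS = [502, 503, 504];

function isRetryable(statusCode) {
    return RETRYABLE_STATUS.indexOf(statusCode) !== -1;
}

// Delay before retry number `attempt` (0-based).
// The ceiling doubles each attempt up to `capMs`; half of it is fixed and
// half is random ("equal jitter"), so the first retry is never near zero.
// `rand` is a function returning a number in [0, 1), e.g. Math.random.
function backoffDelayMs(attempt, baseMs, capMs, rand) {
    var ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
    return Math.floor(ceiling / 2 + rand() * (ceiling / 2));
}

// Full list of delays for a bounded number of retries.
function backoffSchedule(maxRetries, baseMs, capMs, rand) {
    var delays = [];
    for (var i = 0; i < maxRetries; i++) {
        delays.push(backoffDelayMs(i, baseMs, capMs, rand));
    }
    return delays;
}

// Example: three retries, 1 s base, 8 s cap, Math.random as the jitter source.
var exampleDelays = backoffSchedule(3, 1000, 8000, Math.random);
```

With these example values the first retry waits between 0.5 and 1 second, compared with the fixed 100 ms delay described earlier in the thread. Once the retries are exhausted, the caller would return the error to the client rather than keep it waiting.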

dryginstudios commented

Also, regarding the Support button: we only have one Admin user (myself), and I still can't locate it. Please see the screenshot below.

capture-playafb.png (101.0 KiB)
brendan commented (replying to dryginstudios)

Sorry, to be more specific, you can get to it by clicking the ? from any page when you're looking at a title. The title is part of the context for the ticket, so that our support team has the Title ID.
