amazdeh asked

StartServer failing with capacity-related message

Hi,
We have a custom matchmaker that calls https://B7F9.playfabapi.com/Matchmaker/StartGame.
Usually we have no issues, but yesterday our custom matchmaker, which reports to Application Insights, informed us of many failures.
We send the same game mode, version, and USEast region as in all other requests before and after (we don't have the exact JSON requests).
The error the .NET SDK gives us is "region at capacity" (invalid parameters).

This went away on its own. We had a failure rate of nearly 40% on the 8th.

1 comment

amazdeh commented

Also, in our matchmaker we send the number of currently running servers to Application Insights. The maximum number of instances we had was a bit higher than 20; we run 20 instances per machine and could easily support more than 150 games. Our min free slots was 3, and we start servers before players request them, so we always have 3 servers running and waiting to be assigned. Given all that, we are confident the issue was not really due to capacity, with servers unavailable for 40% of the day's requests. We don't do any crazy retries either. We can share more by email or some other mechanism: our matchmaker code, Application Insights data, and so on.

1 Answer

brendan answered

The "Region at Capacity" error means that it's not currently possible for us to start a new instance of the game server. Specifically, the service attempts to start an instance a couple of times before returning this error. What that means is that there are no free instance slots in any running game server hosts at the moment. That does not necessarily mean you've hit the limit on the number of game server hosts you can run, however. The purpose of the min free slots setting is for you to control how many free instance slots need to be available, so that you have enough to provide for the maximum number of instances that might need to spin up over several minutes (10, to be safe). When you spin up new instances, such that you're below the min free slots setting, we request a new EC2 server for your title. Getting one from EC2 takes non-zero time. Once we have it, we have to image it with the AMI - again, this takes time. Finally, we have to copy your game server zip to it from S3 and decompress it to the right folder.

Right now, you have your min free slots set to 8. If you use up those 8 very quickly and still need more, you're going to hit the "Region at Capacity" error until a new server host is ready. If you're running into that issue, it sounds like you need to set your min free slots higher.
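
A rough way to sanity-check a min free slots value against the reasoning above is to multiply the peak rate of new sessions by the time a new host takes to become ready. This is only a sketch: the function names, the `safety_factor` parameter, and the example rates are hypothetical, and the 10-minute default is taken from the "10, to be safe" figure above rather than from any measured provisioning time.

```python
import math


def required_min_free_slots(peak_starts_per_minute: float,
                            provisioning_minutes: float = 10.0,
                            safety_factor: float = 1.0) -> int:
    """Estimate the free slots needed to absorb new sessions while a host spins up.

    All inputs are assumptions you measure yourself (e.g. from your own
    matchmaker telemetry); nothing here is read from PlayFab.
    """
    if peak_starts_per_minute < 0 or provisioning_minutes < 0:
        raise ValueError("rates and durations must be non-negative")
    if safety_factor <= 0:
        raise ValueError("safety_factor must be positive")
    return math.ceil(peak_starts_per_minute * provisioning_minutes * safety_factor)


def slot_shortfall(configured_min_free_slots: int,
                   peak_starts_per_minute: float,
                   provisioning_minutes: float = 10.0) -> int:
    """Return how many slots short the configured value is (0 if it is enough)."""
    needed = required_min_free_slots(peak_starts_per_minute, provisioning_minutes)
    return max(0, needed - configured_min_free_slots)


if __name__ == "__main__":
    # Hypothetical burst: one new session every two minutes at peak.
    print(required_min_free_slots(0.5))
    print(slot_shortfall(3, 0.5))
```

If the shortfall is above zero, a burst of StartGame calls can exhaust the free slots before the next host is ready, which is the window in which "Region at Capacity" would be returned.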

7 comments

pfnathan ♦ commented

Thanks for the information. We will have our engineering team investigate and post an update.

amazdeh commented

Thanks for the explanation, but I'm aware of all those details. Does it explain a 40% failure rate in our game on a single day? I don't think so. We don't have particularly heavy traffic, as you can see from our DAU. And as I described, because the first instance on a machine is slow to start, we keep 3 servers cached for new sessions before they are requested.

Is it possible that Amazon could not start the server for you, or that your startup scripts failed for one reason or another? We had a similar issue 2 weeks before this incident, but back then we were not sending the number of instances to Application Insights. This time we were, and we know we never had more than 20-25 instances that day.
We will sign the contract, and I guess a closer look with the support team will help. Maybe it was just min free slots being too low (it was in fact 3 at the time of the incident), but I am not sure about that and doubt it a bit. Another engineer had set min free slots to 3 instead of 8 by mistake; 8 surely covers us well, at least for now.

brendan replied to amazdeh

Actually, yes. If your min free slots was not set to account for the maximum number of sessions that might be needed, I could easily see it accounting for a large number of failures.

My question, though, is: what was the error you were getting? Looking at your dashboard, I'm not seeing errors showing up for your StartGame calls. I do see two spikes in calls to that endpoint on the 7th, though, which could be related to what you saw.

amazdeh commented

The error was 400 InvalidParameters, and the error report said "region at capacity". Note that the parameters sent to StartServer are constant values, so they could not be the issue. Depending on the time zone it could be the 7th or the 8th: if your times are in UTC, it should be the 8th; if PST, then yes, some of it falls on the 7th, I guess. One weird thing is that the response code and error (400 and InvalidParameters) don't match the generated error report of "region at capacity". I've seen this behavior in StartServer before, when testing with the .NET SDK before building the matchmaker: sometimes the response code and error did not match the error report/description. I even remember StartServer returning a failure but actually starting the instance, a few months ago.
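
One way to capture evidence of this kind of mismatch is to record the HTTP status, the error name, and the error description together for every failed call, and flag the cases where they disagree. The sketch below is only an illustration: the `error` and `errorMessage` field names are assumptions about the JSON error body, and the class and function names are hypothetical, so adjust them to whatever your SDK actually exposes.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class StartGameFailure:
    """One failed start request, as seen by the matchmaker."""
    http_status: int
    error: str    # short error name, e.g. "InvalidParameters"
    message: str  # free-text description, e.g. "Region at capacity"

    @property
    def looks_like_capacity(self) -> bool:
        # The description mentions capacity, whatever the error name says.
        return "capacity" in self.message.lower()

    @property
    def is_mismatched(self) -> bool:
        # Capacity wording paired with a parameter-validation error name.
        return self.looks_like_capacity and "param" in self.error.lower()


def parse_failure(http_status: int, body: Mapping[str, Any]) -> StartGameFailure:
    """Build a failure record from a parsed error body (field names assumed)."""
    return StartGameFailure(
        http_status=http_status,
        error=str(body.get("error", "")),
        message=str(body.get("errorMessage", "")),
    )


if __name__ == "__main__":
    failure = parse_failure(
        400, {"error": "InvalidParameters", "errorMessage": "Region at capacity"}
    )
    if failure.is_mismatched:
        # Replace print with your telemetry call of choice.
        print(f"mismatch: {failure.http_status} {failure.error} / {failure.message}")
```

Logging every flagged record with a timestamp and the request parameters would give the support team something concrete to compare against their dashboard.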

brendan replied to amazdeh

That's really odd - "region at capacity" and "invalid parameters" are two very different errors, and are caused by very different things. If you've got a case that reproduces this with any kind of consistency, can you share that with us? It's sounding like you've found a corner case, given that we've never heard of this before, so I'd really like to get as much info as we can on how to reproduce this.

amazdeh commented

I cannot reproduce it; it only happened during that time. The first time I saw it and could reproduce it, I believe I sent you an email, but that was months ago and I forgot about it. I think that if one of your developers creates a simple console app that tries to start game servers each time it runs, you'll reproduce it after a few test runs. Unfortunately, I have no time to do this myself. The only thing that might matter is that our game versions were usually of the form x.yyy and our game mode usually contains an underscore (_). I know this is unlikely to be related, but it is the only information I have.

brendan replied to amazdeh

Understood. We'll try a few tests here, but we don't have any other reports of this behavior at the moment. If you do find a reproducible test case, please feel free to send that our way.
