This timeout has happened in our room simulators a few times this week. Our code is detecting that the bridge disconnected (with onDisconnected event) with a reason of “Timeout”.
Since we’re using the AutoSimulatorConnection script, it reconnects successfully, but I guess the objects that the bridge was maintaining are disabled/destroyed, so bad things happen from there.
What would cause a bridge timeout - and any advice on handling it when it happens? Is it possible to easily recover? Our empty room timeout is set to 30 minutes, so I don’t believe that should be triggering here…?
Thanks!
Coherence v1.3.1
Unity 6000.0.24f1 on MacOS
Project: cs3cgmv088db6off6rug
A timeout happens fairly quickly: if no client sends any packets to the room within 5 seconds, or if the simulator itself is not sending any packets. When you experience this, do you perhaps have clients paused in a debugger or otherwise not sending network traffic?
I see: ActivePlayers-- now 0 - does this mean all players have disconnected?
I see - ok, so that is pretty normal then. I’m just testing by myself during development, so the player count often goes from 1 back to 0. And yes, if I enter the Unity debugger that would also starve the connection. The simulator is quiet with no clients connected too.
Just for my own understanding:
Q1: When the bridge times out, but the simulator is still running (for another 30 minutes in our case), does it make sense to be auto-reconnecting with the AutoSimulatorConnection script? Seems like these two things will just fight each other with constant reconnects and timeouts?
Q2: When the bridge reconnects does it recreate all the objects in the RS? Or are those ok to still be in memory in the simulator?
Let me ask the SDK and engine teams if there’s a best practice for a simulator keeping a room RS alive in the situation where there are no clients sending packets.
If your entities are owned by the simulator and/or they are “persistent” and adopted by the simulator when a client disconnects then they will remain on the RS until it fully shuts down. see:
I assume you’re using the cloud rooms exclusively here? If so, unfortunately we don’t expose any sort of way to configure this 5 second timeout if there are no packets.
If you’re using a local RS at all, there is the parameter: --cleanup-interval-sec that you can set to a large number to prevent this cleanup from happening.
Is it possible in your setup to connect 2 clients, one that’s simply sending updates while the other is being debugged?
I use a mix of local RS and the cloud rooms. Thanks for the parameter.
I’m really not bothered by this bridge timeout in day to day development. Yes, when I exit the debugger I’m usually kicked out of the session, but I usually have whatever answer I was looking for too. Normally even with no clients connected the simulator will run for our current 30 minute window.
The reason I originally posted this is we started seeing an exception when saving the game to the database from the simulator (bottom of the log). It was preceded by these bridge timeouts that I could not explain or understand - along with our global sync objects being disabled (Flagship::OnDisable) and then enabled again as if the simulator was reinitialized…
I added some more debugging to the bridge handling and the database save function. I’m not sure what is triggering this case yet.
Thanks for the help - I’ll update if I learn anything interesting over the next couple of days.
When the bridge hits a timeout it disconnects the Simulator, which destroys any server side sync objects that are Session Based. I had mistakenly left some important server objects as Session Based instead of Persistent. This has not been a problem until the bridge started hitting these timeouts more often.
It seems that the session based objects are recreated on reconnect. I see Awake methods being called on them, then the old ones are destroyed. This was pretty confusing since I had the objects in memory but not the “real” original ones. I think they should just be changed to Persistent lifetime and that will make the reconnects more recoverable.
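For anyone hitting the same confusion about holding references to the "old" objects: below is a minimal, language-neutral sketch in Python (not Unity/coherence code) of one way to always resolve the current instance. `LiveObjectRegistry`, `on_awake` and `on_destroy` are hypothetical names standing in for whatever hooks your own objects use. It assumes the ordering described above, where the recreated object wakes up before the old one is destroyed.

```python
class LiveObjectRegistry:
    """Tracks the current instance for each logical key (hypothetical helper)."""

    def __init__(self):
        self._live = {}

    def on_awake(self, key, instance):
        # A recreated object replaces whatever was registered under the same key.
        self._live[key] = instance

    def on_destroy(self, key, instance):
        # Only clear the slot if the instance being destroyed is still the current one.
        # The replacement may already have registered itself by this point.
        if self._live.get(key) is instance:
            del self._live[key]

    def get(self, key):
        return self._live.get(key)


registry = LiveObjectRegistry()
old, new = object(), object()
registry.on_awake("Flagship", old)
registry.on_awake("Flagship", new)      # reconnect: the new instance wakes first
registry.on_destroy("Flagship", old)    # then the old instance is destroyed
print(registry.get("Flagship") is new)  # True
```

The identity check in `on_destroy` is the important part: without it, destroying the stale object would also drop the reference to its replacement.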
(Though it might still be desirable to prevent the bridge timeouts)
Thanks for the detailed explanation. You make a fair point, and we do want to allow simulators time to “clean up” after all players have disconnected. I’ll open an issue with the team to figure out how to more gracefully handle this.
In order to keep the simulator / bridge connection open and prevent the timeout, I was thinking I could just send an empty “keep alive” Command every few seconds. Thoughts?
I believe we purposefully don’t heed these keep-alives at the moment, to prevent a room from simply never closing if only a simulator is connected. This is a cost-saving measure for both us and customers. I do agree this should be configurable by you so that you have time to take any sort of finalization action you need.
Please give it a shot, though. We have 2 timeouts: the RS connection timeout (5 seconds) and the room shutdown timeout (1 minute). The protection I describe above may only apply to the 1 minute timeout.
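To make the hypothesis concrete, here is a toy model in Python of the two timers as described in this post. It is not the Replication Server's implementation: `RoomTimers` and its methods are made-up names, and the rule that simulator traffic resets only the 5 second timer is exactly the guess being tested, not confirmed behaviour.

```python
from dataclasses import dataclass


@dataclass
class RoomTimers:
    """Toy model of the two timeouts; all names here are hypothetical."""
    connection_timeout: float = 5.0   # seconds without any message
    shutdown_timeout: float = 60.0    # seconds without any player traffic
    last_message_at: float = 0.0
    last_player_seen_at: float = 0.0

    def on_message(self, now, from_player):
        # Assumption: any message resets the connection timer,
        # but only player traffic resets the room shutdown timer.
        self.last_message_at = now
        if from_player:
            self.last_player_seen_at = now

    def connection_timed_out(self, now):
        return now - self.last_message_at > self.connection_timeout

    def room_should_close(self, now):
        return now - self.last_player_seen_at > self.shutdown_timeout


timers = RoomTimers()
# Simulator-only keep-alive every 2 seconds for 70 seconds, no players connected.
for t in range(0, 71, 2):
    timers.on_message(now=float(t), from_player=False)

print(timers.connection_timed_out(now=71.0))  # False
print(timers.room_should_close(now=71.0))     # True
```

Under this model a simulator heartbeat would avoid the 5 second timeout while the empty room still closes on schedule. Whether the real server behaves this way is what the experiment would show.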
Ok cool - it sure looks like I’m hitting the 5 second timeout variety:
2024-11-13T 09:09:06-08:00Z SimulatorPlayStartup::OnBridgeConnectionError Coherence.Connection.ConnectionTimeoutException: Connection timeout: no valid message received in time, Timeout: 0:00:05
2024-11-13T 09:09:07-08:00Z OnClientConnectionDestroyed 25633 Simulator
I assume that even with my heartbeat added, the room would still be subject to the 1 minute “Close Empty Room Timeout” when no player clients are connected? We have it configured to 30 minutes currently - and yes, we also want that for cost savings.
Odd - my keep alive messages do not seem to prevent the bridge connection timeout. I’m sending every 2 seconds, and sending to MessageTarget.All from an object created and owned by the simulator.
Once the simulator hits this timeout it does try to reconnect, and seems to succeed, but the state of the room is not right. I’m even seeing two “simulator connection created” log lines for some reason.
I’m out of ideas again… hmm.
2024-11-13T 11:23:02-08:00Z ServerKeepAliveRpc received
2024-11-13T 11:23:04-08:00Z ServerKeepAliveRpc received
2024-11-13T 11:23:06-08:00Z ServerKeepAliveRpc received
2024-11-13T 11:23:08-08:00Z ServerKeepAliveRpc received
2024-11-13T 11:23:10-08:00Z ServerKeepAliveRpc received
2024-11-13T 11:23:12-08:00Z SimulatorPlayStartup::OnBridgeConnectionError Coherence.Connection.ConnectionTimeoutException: Connection timeout: no valid message received in time, Timeout: 0:00:05
2024-11-13T 11:23:12-08:00Z OnClientConnectionDestroyed 63840 Simulator
2024-11-13T 11:23:12-08:00Z SimulatorPlayStartup::OnBridgeDisconnected Timeout
2024-11-13T 11:23:12-08:00Z 19:23:11.803 (coherence) AutoSimulatorConnection: Disconnected. Timeout
2024-11-13T 11:23:12-08:00Z 19:23:11.804 (coherence) AutoSimulatorConnection: Starting reconnect...
2024-11-13T 11:23:12-08:00Z 19:23:11.804 (coherence) AutoSimulatorConnection: Connecting as simulator to endpoint host: 10.13.128.37, port: 20249, region: usw, roomId: 117, worldId: 0, uniqueRoomId: 9151757, schemaId: 9f8ede61cab72dff23160c81ab5ef9e1bce54c9d slug=shadow_v61 sdkVersion=1.3.1 rsVersion=v6.5.2
2024-11-13T 11:23:12-08:00Z ServerKeepAliveRpc received
2024-11-13T 11:23:12-08:00Z 19:23:11.828 (coherence) AutoSimulatorConnection: Connection successful.
2024-11-13T 11:23:12-08:00Z OnClientConnectionCreated 9184 Simulator
2024-11-13T 11:23:12-08:00Z OnClientConnectionCreated 63840 Simulator
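As a debugging aid, a small Python sketch like the one below can measure how long before each timeout the last keep-alive was logged. It assumes only the timestamp prefix and the two marker strings visible in the excerpt above; the function names are my own, and the UTC offset is ignored since every line uses the same one.

```python
import re
from datetime import datetime

# Matches the timestamp prefix used in the excerpt, e.g. "2024-11-13T 11:23:10-08:00Z".
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2})T (\d{2}:\d{2}:\d{2})")


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1) + " " + m.group(2), "%Y-%m-%d %H:%M:%S")


def gaps_before_timeouts(lines,
                         keepalive_marker="ServerKeepAliveRpc received",
                         timeout_marker="ConnectionTimeoutException"):
    """For each timeout line, return seconds since the most recent keep-alive line."""
    last_keepalive = None
    gaps = []
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if keepalive_marker in line:
            last_keepalive = stamp
        elif timeout_marker in line and last_keepalive is not None:
            gaps.append((stamp - last_keepalive).total_seconds())
    return gaps


sample = [
    "2024-11-13T 11:23:08-08:00Z ServerKeepAliveRpc received",
    "2024-11-13T 11:23:10-08:00Z ServerKeepAliveRpc received",
    "2024-11-13T 11:23:12-08:00Z SimulatorPlayStartup::OnBridgeConnectionError "
    "Coherence.Connection.ConnectionTimeoutException: Connection timeout: "
    "no valid message received in time, Timeout: 0:00:05",
]
print(gaps_before_timeouts(sample))  # [2.0]
```

On these lines the gap is 2 seconds, well inside the 5 second window, so the keep-alive being logged and the timeout firing do not appear to be tied to the same timer. The exception text refers to messages received, which may be worth checking separately from what the simulator sends.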
I’m even seeing these timeouts with 2 clients actively playing, running, and sending telemetry. This is with the simulator sending a heartbeat every second also.
Any other ideas why this might be happening - or how to debug?