Netgames 2003
- network traces of a popular CS (Counter-Strike) server for a week in April 2002
- 16k user sessions recorded
- 99% of players play less than 2 hours
- play session lengths follow a Weibull distribution with k = 0.5 and λ = 20 (shape similar to 1/x exp(-x); see the sketch after this group of notes)
- For play sessions from 10 to 100 minutes, the chance of disconnecting (i.e. the failure rate) remains constant at 2.5%.
- For play sessions shorter than 10 minutes, the chance of disconnecting is 10%. Possible reasons: connection problems, getting kicked, or leaving because of server rules (e.g. friendly fire is allowed, but players who kill their team-mates too often get kicked out).
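A minimal sketch for poking at the fitted session-length distribution (density, CDF, and hazard, i.e. the instantaneous disconnect rate). It assumes λ is expressed in minutes, which the notes above don't state explicitly:

```python
from math import exp

# Weibull fit quoted above: shape k = 0.5, scale lam = 20
# (assumed to be in minutes -- the unit is my assumption).
K, LAM = 0.5, 20.0

def session_pdf(t):
    """Density of a play session lasting t minutes."""
    return (K / LAM) * (t / LAM) ** (K - 1) * exp(-((t / LAM) ** K))

def session_cdf(t):
    """Probability that a session has ended by t minutes."""
    return 1.0 - exp(-((t / LAM) ** K))

def hazard(t):
    """Instantaneous disconnect (failure) rate at session age t minutes."""
    return session_pdf(t) / (1.0 - session_cdf(t))  # simplifies to (K/LAM) * (t/LAM)**(K-1)

for t in (5, 10, 30, 60, 100, 120):
    print(f"t = {t:3d} min  P(session <= t) = {session_cdf(t):6.1%}  hazard = {hazard(t):.2%}/min")
```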
- Assumptions: independent clocks with no synchronization mechanism, players react to server updates, updates only consist of creation and/or removal of object(s) (and NOT object position updates)
- Users take some reaction time to act in response to server update messages. The idea is to ignore network-induced latency and compare only the users' reaction times to determine which action actually gets applied to the world state.
the Fair-Ordering Service [...] dynamically enforces a sufficient waiting period on each action message to guarantee the fair processing of all action messages.
But practically, the waiting period is bounded to ensure a relative level of interactivity.
- Proxies are game-agnostic and located near players (i.e. low latency between a player and her proxy). A proxy receives an action message from its user, then forwards it to the game server together with a message identification number (so messages can be delivered in order) and the user's reaction time.
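The paper's exact algorithm isn't reproduced above, but the idea lends itself to a rough sketch: the proxy stamps each action with a sequence number and the measured reaction time, and the server buffers actions for a bounded waiting period, then processes them ordered by reaction time rather than by arrival time. All names and the 100 ms bound below are illustrative, not taken from the paper.

```python
import heapq
import time
from dataclasses import dataclass, field

MAX_WAIT = 0.100   # bounded waiting period in seconds (illustrative value, not from the paper)

@dataclass(order=True)
class StampedAction:
    reaction_time: float                    # time the player took to react to the server update
    seq: int = field(compare=False)         # per-proxy message id, for in-order delivery
    payload: object = field(compare=False)  # the action itself

class FairOrderingBuffer:
    """Holds incoming actions briefly and releases them ordered by reaction time,
    never waiting longer than MAX_WAIT so the game stays interactive."""

    def __init__(self):
        self._heap = []
        self._deadline = None

    def add(self, action: StampedAction):
        if not self._heap:
            self._deadline = time.monotonic() + MAX_WAIT
        heapq.heappush(self._heap, action)

    def pop_ready(self):
        """Once the waiting period expires, return the buffered actions,
        fastest human reaction first."""
        if self._heap and time.monotonic() >= self._deadline:
            ready, self._heap, self._deadline = self._heap, [], None
            return [heapq.heappop(ready) for _ in range(len(ready))]
        return []
```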
- Causality control preserves the order of events of game data (keyboard inputs). No need for causality in voice or video
- Media synchronization control = intra-stream (temporal relation between media units (MUs), such as voice or video packets) + inter-stream (timing among multiple streams) + group (timing among multiple end-points, to ensure fairness) synchronization controls
- Compare C-S to P2P architectures in terms of success of the 4 previously mentioned control schemes. Voice and video don't need to go through the server (they're sent in P2P mode in both scenarios).
- Adaptive Δ-causality control is used on game data in both scenarios: the recipient considers a packet still valid until Δ = 50 ms after its generation timestamp. [That means the latency automatically increases by Δ ms for all packets.] Adaptive means that the value of Δ changes based on the network load: a smaller Δ makes the game more interactive, a larger Δ means fewer packets are discarded for being late or misordered. Unfairness appears when terminals have different Δ, hence the need for group sync control. (A rough sketch of the Δ check and the piggy-backing appears after these notes.)
- Piggy-back an MU on the succeeding k=4 MUs to recover from lost UDP packets
- Experiment: two terminals in both C-S and P2P scenarios [only two?!]. Terminal 1 is connected to an overloaded hub with delay jitter, Terminal 2 is connected to its own hub. Connections are 10 Mbps Ethernet. The server is connected to T2's hub. An additional delay of 100 ms is introduced between the two terminals by a data link simulator placed between T1's hub and T2's hub. Game MUs = 20 Bytes, sent 10 times per second, voice MUs = 400 Bytes, sent 20 times per sec, and video MUs = 5 kB, sent 20 times per sec [roughly 1.6 kbit/s, 64 kbit/s and 800 kbit/s respectively, hence most of the load on the network comes from voice and video, not game data]. The experiment ran for 2 minutes.
- For heavy loads (8Mbps), C-S is better for causality, but worse for consistency, fairness, and interactivity.
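A rough sketch of the two mechanisms mentioned above, the Δ-causality validity check and the piggy-backing of the last k MUs; the function names and packet layout are illustrative only:

```python
import time

DELTA = 0.050      # Δ-causality window in seconds (50 ms); the adaptive control tunes this value
K_REDUNDANCY = 4   # each datagram piggy-backs the 4 preceding MUs

def is_still_valid(generation_ts, now=None):
    """Δ-causality check: process an MU only if no more than Δ has elapsed since its
    generation timestamp (terminal clocks assumed comparable for this sketch)."""
    now = time.time() if now is None else now
    return now - generation_ts <= DELTA

def build_datagram(mu_history):
    """Piggy-back the preceding K_REDUNDANCY MUs on the newest one, so that a lost
    UDP datagram can be recovered from any of the next K_REDUNDANCY datagrams."""
    return mu_history[-(K_REDUNDANCY + 1):]   # newest MU plus up to 4 older ones
```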
- Compare C-S, P2P and PP-CA (= P2P with central authority/arbiter receiving moves from all players and notifying them when it detects inconsistencies)
- Tu = Duration of client loop, Lu = size of update messages
- CS: client upstream = Lu/Tu, client downstream = N.Lu/Tu, server downstream = N.Lu/Tu, server upstream = N(N.Lu)/Tu
- P2P: client upstream = client downstream = (N-1)Lu/Tu
- PP-CA: client upstream = N.Lu/Tu, client downstream = (N-1).Lu/Tu + f.N.Lu/Tu with f = ratio of inconsistencies to be corrected, arbiter downstream = N.Lu/Tu, arbiter upstream = f.N(N.Lu)/Tu
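Those per-node rates are easy to tabulate; a small sketch using the notation above (N players, update size Lu, client loop Tu, inconsistency ratio f). The example values at the bottom are arbitrary:

```python
def per_node_rates(n, lu, tu, f):
    """Send/receive rates (bytes per second) for the three architectures, using the
    formulas above: n players, update size lu (bytes), client loop tu (seconds),
    f = ratio of moves the central arbiter has to correct (PP-CA only)."""
    r = lu / tu
    return {
        "C-S":   {"client up": r,             "client down": n * r,
                  "server down": n * r,       "server up": n * n * r},
        "P2P":   {"client up": (n - 1) * r,   "client down": (n - 1) * r},
        "PP-CA": {"client up": n * r,
                  "client down": (n - 1) * r + f * n * r,
                  "arbiter down": n * r,      "arbiter up": f * n * n * r},
    }

# Arbitrary example: 8 players, 40-byte updates, 50 ms client loop, 10% of moves corrected.
for arch, rates in per_node_rates(n=8, lu=40, tu=0.05, f=0.1).items():
    print(arch, {role: f"{v:.0f} B/s" for role, v in rates.items()})
```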
- Look at delays introduced by access networks (aka last-mile links), not by the back-bone. Goal: how to dimension the network to reach the minimum possible delay.
- Network delay can be caused by propagation (mostly relevant for back-bones: ~5 µs/km), serialization (putting all the bits of a packet onto the link), packet processing (route and DNS lookups, error correction), and queuing (other packets have to be processed first; this varies from packet to packet, hence jitter, defined as the 95th percentile RTT minus the 5th percentile RTT). Access Network Delay (AND) = minimal RTT (packet processing delay) + S (packet size) / Reff (effective link rate) + Tque (total queuing delay up- and downstream, which results in jitter).
- Experiment: for 5 different values of S, send 100 pings. Get the RTT and the jitter (= Tque) from those 100 pings. Obtain Reff from the inverse of the slope of the best-fitting trend-line through the 5 points (S, average RTT). Obtain the minimum RTT from the intercept of the trend-line through the 5 points (S, top-1% RTT). (A small estimation sketch follows these notes.)
- QoS improves RTT by separating game traffic from other traffic
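A minimal sketch of that estimation procedure: fit a line to RTT versus packet size, take 1/slope as the effective link rate and the intercept of the fastest pings as the minimum RTT. The measurement numbers below are made up for illustration:

```python
import numpy as np

# (packet size S in bytes, average RTT in ms, fastest "top-1%" RTT in ms)
# -- made-up numbers standing in for the 100 pings per size.
measurements = [
    (64,   12.1, 10.2),
    (256,  13.5, 11.4),
    (512,  15.4, 13.1),
    (1024, 19.0, 16.5),
    (1472, 22.3, 19.4),
]

sizes   = np.array([m[0] for m in measurements], dtype=float)
avg_rtt = np.array([m[1] for m in measurements])
fastest = np.array([m[2] for m in measurements])

# RTT ~= min_RTT + S / Reff, so the slope of the trend-line is 1/Reff (ms per byte).
slope, _ = np.polyfit(sizes, avg_rtt, 1)
reff_bits_per_s = 8 * 1000 / slope          # bytes/ms -> bits/s

# The intercept of the trend-line through the fastest pings approximates min RTT.
_, min_rtt_ms = np.polyfit(sizes, fastest, 1)

print(f"effective link rate ~ {reff_bits_per_s / 1e3:.0f} kbit/s, min RTT ~ {min_rtt_ms:.1f} ms")
```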
- Assumption: ad-hoc networks are going to multiply, but C-S and P2P architectures are not well suited for them. The most interesting part of the work is determining which device can/should be the zone server.
Even the nodes that do not play the game assist the other nodes in delivering data
- Nodes are mobile: they do not always stay within reach of the same other nodes. Discovery of zone servers is done through SLP v2 (Service Location Protocol). When latency to a zone server gets too high, a client can pick another zone server, which in turn notifies all the other zone servers of its new connection. When a client does not reply for a while, a zone server can drop it.
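Roughly, the client-side handover and the server-side client time-out could look like the sketch below; the thresholds and names are illustrative, not from the paper:

```python
import time

LATENCY_LIMIT = 0.200    # switch zone servers above this RTT (illustrative threshold)
CLIENT_TIMEOUT = 5.0     # drop clients that have stayed silent this long (illustrative)

class GameClient:
    def __init__(self, discovered_servers):
        # Zone servers are found via SLP v2 discovery; here the result is simply
        # taken as a list of (address, measured_rtt) pairs.
        self.servers = discovered_servers
        self.current = min(self.servers, key=lambda s: s[1])

    def maybe_switch(self):
        """If the current zone server's latency gets too high, attach to the closest
        other one; the newly chosen server would then notify its peer zone servers."""
        addr, rtt = self.current
        if rtt > LATENCY_LIMIT:
            others = [s for s in self.servers if s[0] != addr]
            if others:
                self.current = min(others, key=lambda s: s[1])

def drop_stale_clients(last_seen, now=None):
    """Zone-server side: forget clients that have not replied for a while."""
    now = time.time() if now is None else now
    return {client: t for client, t in last_seen.items() if now - t <= CLIENT_TIMEOUT}
```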
- The middleware should be as transparent as possible to the game developer. It sits on top of an existing grid infrastructure, the Globus toolkit, which decides when to spawn a new game server instance based on current resources and demand.
- Player services are in charge of authentication, account handling, chat rooms, locating games/selecting a server taking into account player preferences (e.g. team or region), and actually playing the game.
- Publisher services deal with software deployment and updates, billing, monitoring server performance, and service-level agreements (e.g. no more than 5% of players suffer from more than 100 ms delay)
- System services include resource management and directory services. These services are accessed by the grid provider.
- Clients submit jobs using a format containing the executable, its arguments, and resource requirements. Jobs can require spawning instances at different grid locations (e.g. different regions).
- Various services such as resource information and information providers (CPU, OS, RAM, connectivity, ...) are indexed in an LDAP directory. Game-specific services (tracking player stats, server load, ...) could also be added on top of the existing ones.
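What such a job submission might look like, sketched as a plain Python dict; this is not the actual Globus job description language, and every field name and value here is hypothetical:

```python
# Hypothetical job description for spawning game server instances on the grid.
# Field names and values are illustrative only; the actual Globus job description
# language is not reproduced here.
game_server_job = {
    "executable": "/opt/mygame/server",                       # hypothetical path
    "arguments": ["--map", "de_dust", "--max-players", "32"],
    "requirements": {"cpu_cores": 2, "ram_mb": 1024, "bandwidth_mbps": 10},
    # One instance per region, so that the SLA (e.g. <= 5% of players above
    # 100 ms delay) can be met for each player population.
    "instances": [
        {"location": "eu-west", "count": 1},
        {"location": "us-east", "count": 1},
    ],
}
```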