I run the network and/or gameservers for many LAN parties. One of the team members once told me that some CPUs are too slow to run a CSGO server at 128 ticks per second. Verifying this was not easy, because back then he only had a couple of commands he could execute on the command line of the client. This means that he could only verify that the server was running fast enough while actually being on the server, which is not possible during matches.
At some point I stumbled across srcds_perfmon. I never actually used the software, but it showed me the stats command for SRCDS. This command prints several statistics, e.g. the tickrate of the server, network traffic, the number of maps played and several others. Sadly the output is not consistent between different kinds of servers. CSGO for example outputs
    CPU   NetIn  NetOut  Uptime  Maps  FPS    Players  Svms  +-ms  ~tick
    10.0  0.0    0.0     8967    0     63.80  0        5.22  0.25  0.05
while L4D2 does not have all of those fields. Another issue is that these fields are not documented. Here is an annotated list which I made from observation, tests and guesses:
- CPU - unknown
- NetIn - inbound network traffic in kbit/s
- NetOut - outbound network traffic in kbit/s
- Uptime - uptime of the server in minutes
- Maps - number of maps played
- FPS - the tickrate of the server
- Players - the number of real players, bots are not counted
- Svms - unknown
- +-ms - unknown
- ~tick - unknown
If you find mistakes in this list feel free to contact me.
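Because the set of columns differs between games, it seems safer to read the header row instead of relying on fixed positions. The following is a minimal sketch of that idea, not the code from the exporter; the function name parse_stats is made up for illustration, and it assumes the response contains a header row starting with CPU directly followed by a row of numbers.

```python
def parse_stats(output: str) -> dict:
    """Map each column header of a `stats` response to its numeric value."""
    # Split every non-empty line into whitespace-separated fields.
    rows = [line.split() for line in output.splitlines() if line.strip()]
    for header, values in zip(rows, rows[1:]):
        # Assumption: the header row starts with "CPU" and the value row
        # follows immediately with the same number of fields.
        if header[0] == "CPU" and len(header) == len(values):
            return {name: float(value) for name, value in zip(header, values)}
    raise ValueError("no stats table found in output")


sample = (
    "CPU   NetIn   NetOut    Uptime  Maps   FPS   Players  Svms    +-ms   ~tick\n"
    "10.0      0.0      0.0    8967     0   63.80       0    5.22    0.25    0.05\n"
)
stats = parse_stats(sample)
print(stats["FPS"], stats["Players"])
```

A server that omits some of the columns simply produces a smaller dictionary, so the caller can check which keys are present before exporting them.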
Now to the monitoring part. Since Prometheus is used at many of the events I help with, I wrote an exporter for Prometheus. Connecting to a gameserver is usually done via RCON, which allows you to execute commands on the gameserver and retrieve the output. I am using the Python library aiorcon for this. I am not only executing the stats command but also the status command. status returns some further information not included in stats, e.g. the server's name, the number of bots and the maximum number of players allowed on the server. The name of the server is used as a label and can be used in dashboards.
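To illustrate how the status response could be turned into those values, here is a rough sketch. It is not taken from the exporter; parse_status and the sample text are hypothetical, and the line formats (a hostname line and a players line with humans, bots and a maximum) are an assumption based on what a CSGO server typically prints. Other games may format these lines differently.

```python
import re

# Assumed line formats, e.g.:
#   hostname: My LAN Server
#   players : 3 humans, 2 bots (20/0 max) (not hibernating)
HOSTNAME_RE = re.compile(r"^hostname\s*:\s*(.+)$", re.MULTILINE)
PLAYERS_RE = re.compile(
    r"^players\s*:\s*(\d+) humans, (\d+) bots \((\d+)", re.MULTILINE
)


def parse_status(output: str) -> dict:
    """Extract server name, player counts and player limit from `status`."""
    hostname = HOSTNAME_RE.search(output)
    players = PLAYERS_RE.search(output)
    if hostname is None or players is None:
        raise ValueError("unexpected status output")
    humans, bots, max_players = (int(group) for group in players.groups())
    return {
        "hostname": hostname.group(1).strip(),
        "humans": humans,
        "bots": bots,
        "max_players": max_players,
    }


sample_status = (
    "hostname: My LAN Server\n"
    "version : (elided)\n"
    "players : 3 humans, 2 bots (20/0 max) (not hibernating)\n"
)
info = parse_status(sample_status)
print(info)
```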
To handle the webserver part which faces Prometheus I used aiohttp, an asynchronous webserver for Python. The whole exporter follows the recommendations for Prometheus exporters. It does not have to run once per gameserver on the same machine. It can run on any server, because from the query perspective it makes no difference where it is running. Also, at some events the gameserver people don't know how to handle such an exporter, so just giving the monitoring guy the RCON password and address is sufficient. You can also easily monitor servers to which you have no direct access, e.g. when the server is run by a hoster that does not allow you to execute arbitrary code on their servers. One exporter can therefore query many servers.
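What the webserver finally hands to Prometheus is plain text in the exposition format. As a sketch of that last step only (the aiohttp handler itself is left out), the following renders a dictionary of values as gauges with the server name as a label. The function names and the metric names srcds_fps and srcds_players are invented for this example and are not necessarily the ones the exporter uses.

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(server_name: str, metrics: dict) -> str:
    """Render metrics as gauges in the Prometheus text exposition format."""
    label = escape_label(server_name)
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f'{name}{{server="{label}"}} {value}')
    # The exposition format expects a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical metric names, for illustration only.
body = render_metrics("My LAN Server", {"srcds_fps": 63.8, "srcds_players": 0})
print(body, end="")
```

Escaping the label value matters here because the server name is chosen freely by whoever configured the gameserver and may contain quotes.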
Putting the password into the exporter requires a little trick with the relabeling configs of Prometheus. Instead of defining targets as <addr>:<port>, they are defined as <addr>:<port>:<rcon_pw>. The password is extracted via a regex from the given target specification. Then the target specification is rewritten to remove the password.
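The effect of that regex can be shown outside of Prometheus. The snippet below is only an illustration in Python of the split that the relabeling performs; the pattern and the name split_target are my assumption, and the actual relabeling rules live in the repository. Prometheus anchors relabeling regexes at both ends, which fullmatch mimics here.

```python
import re

# Assumed pattern: address without colons, numeric port, then the password.
# The password group is allowed to contain further colons.
TARGET_RE = re.compile(r"([^:]+):(\d+):(.+)")


def split_target(target: str) -> tuple:
    """Split '<addr>:<port>:<rcon_pw>' into ('<addr>:<port>', '<rcon_pw>')."""
    match = TARGET_RE.fullmatch(target)
    if match is None:
        raise ValueError(f"target does not look like addr:port:password: {target!r}")
    addr, port, password = match.groups()
    return f"{addr}:{port}", password


address, password = split_target("10.0.0.5:27015:secret")
print(address, password)
```

One consequence of this scheme is that the password is part of the target definition, so it should be treated as sensitive wherever the Prometheus configuration is stored.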
The full relabeling config and the source code can be found in the GitHub repository. I later realized that someone had built a similar exporter before, which sadly has the same name. It only uses the status command and therefore does not support the performance metrics.