Tuning MongoDB & Linux to allow for tens of thousands of connections

Henrik Ingo

My manager at a previous employer once described Red Hat (RHEL) as "optimized for consulting". (By implication this includes Centos and Amazon Linux too.) His point was that for whatever reason, RHEL ships with default ulimit and other configurations that are appropriate for your laptop, and to really get the full performance of a large production server you need to do a lot of tuning to increase various limits and buffers. This creates a lucrative market for consultants that know all the knobs that need turning.

Recently we wanted to benchmark how MongoDB behaves with a large number of connections. This forced me to revisit the topic and refresh my memory on how to create a large number of connections and threads on a Linux server. In the process I found some new tunables I had not used the last time I did this.

MongoDB configuration

MongoDB itself has an option to limit the maximum number of incoming connections; it defaults to 64k.

# mongod.conf
net:
  maxIncomingConnections: 999999

Note that by default MongoDB creates a dedicated worker thread for each incoming connection. I wanted to test this default, but I should point out that there is a related setting that switches to a worker pool model. Presumably this allows a larger number of incoming connections while using fewer threads. This option is still labeled experimental, even though it is officially documented:

net:
  serviceExecutor: adaptive

But for my tests I create a thread per connection, so all of the configuration below was also needed...

Linux configuration

To set the ulimits correctly, I needed to go back and remember all the basic Unix principles I learned in college:

  • Everything is a file. In particular, TCP/IP connections are open files as far as ulimit is concerned.
  • For historical reasons, nproc is really the number of threads: historically a Linux process was single-threaded, and concurrent workloads were multi-process.
  • Each thread allocates memory for its stack, and the stack size also has a maximum.
# Connections are files because in Unix everything is a file.
echo "ec2-user           soft    nofile          9999999" | sudo tee -a /etc/security/limits.conf
echo "ec2-user           hard    nofile          9999999" | sudo tee -a /etc/security/limits.conf
# nproc is really number of threads.
echo "ec2-user           soft    nproc           9999999" | sudo tee -a /etc/security/limits.conf
echo "ec2-user           hard    nproc           9999999" | sudo tee -a /etc/security/limits.conf
# Threads need memory from the stack.
echo "ec2-user           soft    stack           9999999" | sudo tee -a /etc/security/limits.conf
echo "ec2-user           hard    stack           9999999" | sudo tee -a /etc/security/limits.conf

For more info, see the MongoDB manual page on ulimit settings.
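
Since limits.conf only applies to new login sessions, it's worth double checking what the shell and the running mongod process actually got. A quick sanity check, assuming the process is simply named mongod:

# Verify the limits in a fresh login shell
ulimit -n   # max open files
ulimit -u   # max user processes, i.e. threads
ulimit -s   # max stack size (in KB)
# Verify the limits the running mongod process actually got
cat /proc/$(pidof mongod)/limits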

But wait, there's more! Creating threads uses mmap to allocate memory for the thread stacks. And at the kernel level there's a setting for the maximum number of memory-mapped regions per process, which must be increased too:

echo 9999999 | sudo tee /proc/sys/vm/max_map_count
# If you want to persist across reboots
echo "vm.max_map_count=9999999" | sudo tee -a /etc/sysctl.conf

Finally, on the benchmark client, I started to hit limitations with TCP/IP. In the TCP protocol, one socket is identified with the tuple (local address, local port, remote address, remote port) and this tuple must be unique per socket. The port numbers range from 1 to 65535. So from a single benchmark client, I can only create 65535 outgoing connections. To benchmark more connections than this, the only alternative is to have more than one client host, or at least more than one IP address for the client. But I didn't go that far.

(On the server side the port is of course the well known mongod port 27017.)
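
To watch the connection count climb during a run, counting sockets in the ESTABLISHED state is one simple way; this sketch uses the ss filter syntax and assumes the default port 27017:

# On the benchmark client: outgoing connections to mongod
ss -tn state established '( dport = :27017 )' | tail -n +2 | wc -l
# On the server: incoming connections from clients
ss -tn state established '( sport = :27017 )' | tail -n +2 | wc -l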

I was surprised to learn that by default Linux doesn't even use the full range of roughly 65k ports that TCP makes possible. Even this had to be configured:

echo "1024 65530" | sudo tee /proc/sys/net/ipv4/ip_local_port_range
# If you want to persist across reboots
echo "net.ipv4.ip_local_port_range = 1024 65530" | sudo tee -a /etc/sysctl.conf

The two numbers are the minimum and maximum local ports used for outgoing connections. Note that this configuration is NOT necessary on the server, just on the benchmark client.
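
With that range, the arithmetic behind the "near-65k limit" mentioned below is simply:

echo $(( 65530 - 1024 + 1 ))   # 64507 usable source ports, i.e. outgoing connections per client IP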

EC2 configuration

On AWS I found that on the M5 family of EC2 instances that I tried - up to m5.2xlarge - I was only ever able to create 32k connections and threads. With the exact same configuration, but switching to c3.8xlarge instance type, I was able to create more than that, reaching the near-65k limit dictated by ip_local_port_range above.

I haven't found any AWS documentation that would confirm my observation about M5 instances. Nor did AWS support confirm it. So it could still be an error on my side.
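
To see where a given instance type tops out, I find it easiest to ask mongod itself. A rough sketch, assuming the mongo shell is installed and mongod listens on the default localhost port:

# Connection counters as seen by mongod (current, available, totalCreated)
mongo --quiet --eval 'printjson(db.serverStatus().connections)'
# Number of threads in the mongod process
ps -o nlwp= -p $(pidof mongod)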

Summary

So here are all the steps in one copy-and-pasteable script. It is purpose-built for AWS instances running Amazon Linux 2; you may need to adjust it for other Linux distributions. In particular, on Centos and RHEL change the username from ec2-user to root.

# This assumes a fresh Linux host from standard Amazon Linux 2 images.
# Adaptable to Centos/RHEL too.

sudo su

# Raise the connection limit by inserting maxIncomingConnections under the net: section of mongod.conf
sed -i.orig 's/net:/net:\n  maxIncomingConnections: 999999/' /etc/mongod.conf

# Connections are files because in Unix everything is a file.
echo "ec2-user           soft    nofile          9999999" | sudo tee -a /etc/security/limits.conf
echo "ec2-user           hard    nofile          9999999" | sudo tee -a /etc/security/limits.conf
# nproc is really number of threads.
echo "ec2-user           soft    nproc           9999999" | sudo tee -a /etc/security/limits.conf
echo "ec2-user           hard    nproc           9999999" | sudo tee -a /etc/security/limits.conf
# Threads need memory from the stack.
echo "ec2-user           soft    stack           9999999" | sudo tee -a /etc/security/limits.conf
echo "ec2-user           hard    stack           9999999" | sudo tee -a /etc/security/limits.conf

# Threads allocate memory with mmap
echo 9999999 > /proc/sys/vm/max_map_count
# If you want to persist across reboots
echo "vm.max_map_count=9999999" | sudo tee -a /etc/sysctl.conf

# Needed for outgoing connections (on client)
echo 1024 65530 > /proc/sys/net/ipv4/ip_local_port_range
echo "net.ipv4.ip_local_port_range = 1024 65530" | sudo tee -a /etc/sysctl.conf

# Checks EC2 instance type but doesn't do anything about it
curl http://169.254.169.254/latest/meta-data/instance-type