WORKAROUND Client with non-standard MTU cannot connect to Highland servers

Posted: Fri Nov 29, 2024 3:39 pm
by superguppy
The client on one of my FAH hosts could not connect to the Highland servers. The client had no issues connecting to the other FAH servers.

The MTU for the client host was set to 9,000. An MTU of 1,500 is more common. After the MTU on the host was changed to 1,500 the connections succeeded.

What is puzzling about this is that it only appeared to affect connections to the Highland servers.

The issue was reproducible at the OS level. Tools like 'openssl s_client -connect' and ssldump showed that the SSL connection got stuck after the SSL client sent the 'client hello' message: a 'server hello' message is expected in response but was never received. An online search on this behavior suggested that the MTU size can affect the SSL handshake, which prompted a review of the network interface configuration on the host.
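
A rough sketch of why the MTU could matter here, offered as a hypothesis and not a confirmed diagnosis: over IPv4, a host normally advertises a TCP maximum segment size (MSS) equal to its MTU minus 40 bytes of IP and TCP headers. A 'client hello' is small and fits in one packet either way, but the server's reply typically carries a certificate chain of a few kilobytes. If the server also uses jumbo frames, it may send that reply in segments larger than some link on the path can carry, and if the resulting "fragmentation needed" ICMP messages get lost, the reply never arrives. The helper name `mss_for_mtu` is made up for illustration.

```shell
# TCP MSS a host typically advertises for a given IPv4 MTU:
# MTU minus the 20-byte IPv4 header and the 20-byte TCP header
# (assumes no IP options).
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

mss_for_mtu 9000   # prints 8960
mss_for_mtu 1500   # prints 1460
```

With an MTU of 9,000 the client invites segments of up to 8,960 bytes, while a 1,500-byte link can only carry 1,460 bytes of TCP payload per packet.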

SSL connections to other servers from the same host work fine with the MTU at 9,000. The FAH client can connect to other FAH servers, and generic SSL connections from the same host, such as those made by browsers, work as well.

All's well that ends well of course, but it makes me wonder what is different about the Highland servers to cause this.

Below is the start of the client log, in case people are interested in the client host configuration.

Code:

00:30:29:I1: Version: 8.3.18
00:30:29:I1: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:30:29:I1: Org: foldingathome.org
00:30:29:I1: Copyright: 2023-2024, foldingathome.org
00:30:29:I1: Homepage: https://foldingathome.org/
00:30:29:I1: License: GPL-3.0-or-later
00:30:29:I1: URL: https://v8-3.foldingathome.org/
00:30:29:I1: Date: Jul 12 2024
00:30:29:I1: Time: 13:26:31
00:30:29:I1: Revision: 99ae953ee7b1c0b3070161cfcf9150184f76bd96
00:30:29:I1: Branch: master
00:30:29:I1: Compiler: GNU 8.3.0
00:30:29:I1: Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
00:30:29:I1: -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
00:30:29:I1: Platform: linux 4.19.0-26-cloud-amd64
00:30:29:I1: Bits: 64
00:30:29:I1: Mode: Release
00:30:29:I1: Args: --config=/etc/fah-client/config.xml
00:30:29:I1: --log=/var/log/fah-client/log.txt
00:30:29:I1: --log-rotate-dir=/var/log/fah-client/
00:30:29:I1: Config: /etc/fah-client/config.xml
00:30:29:I1:****************************** CBang ******************************
00:30:29:I1: Version: 1.7.2
00:30:29:I1: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
00:30:29:I1: Org: Cauldron Development
00:30:29:I1: Copyright: Cauldron Development, 2003-2024
00:30:29:I1: Homepage: https://cauldrondevelopment.com/
00:30:29:I1: License: LGPL-2.1-or-later
00:30:29:I1: Date: Jun 24 2024
00:30:29:I1: Time: 13:29:44
00:30:29:I1: Revision: 1b05ea96f0ed3043c32b78a66dbf50a9b2002289
00:30:29:I1: Branch: master
00:30:29:I1: Compiler: GNU 8.3.0
00:30:29:I1: Options: -Wsuggest-override -faligned-new -std=c++17 -fsigned-char
00:30:29:I1: -ffunction-sections -fdata-sections -O3 -funroll-loops -fno-pie
00:30:29:I1: -fPIC
00:30:29:I1: Platform: linux 4.19.0-26-cloud-amd64
00:30:29:I1: Bits: 64
00:30:29:I1: Mode: Release
00:30:29:I1:***************************** System ******************************
00:30:29:I1: CPU: AMD Ryzen Threadripper 1920X 12-Core Processor
00:30:29:I1: CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
00:30:29:I1: CPUs: 24
00:30:29:I1: Memory: 15.45GiB
00:30:29:I1:Free Memory: 11.87GiB
00:30:29:I1: OS Version: 6.8
00:30:29:I1:Has Battery: false
00:30:29:I1: On Battery: false
00:30:29:I1: Hostname: threadripper
00:30:29:I1: UTC Offset: -5
00:30:29:I1: PID: 9902
00:30:29:I1: CWD: /var/lib/fah-client
00:30:29:I1: Exec: /usr/bin/fah-client
00:30:29:I1:*******************************************************************
00:30:29:I2:<config>
00:30:29:I2: <!-- Server -->
00:30:29:I2: <connection-timeout v='120'/>
00:30:29:I2:
00:30:29:I2: <!-- User Information -->
00:30:29:I2: <passkey v='*****'/>
00:30:29:I2: <team v='***'/>
00:30:29:I2: <user v='***'/>
00:30:29:I2:</config>
00:30:29:I1:Opening Database
00:30:29:I1:F@H ID = *****
00:30:29:I3:Loading default group
00:30:29:I3:Loading default resource group
00:30:29:I1:Listening for HTTP on 127.0.0.1:7396
00:30:29:I3:WU2:Loading work unit 2 with ID C2s9gYppePqLsnhUuryaEb5aMf2dy4HINTlPJqv-Ko0
00:30:29:I3:Loaded 1 wus.
00:30:29:I3:gpus = {
00:30:29:I3: "gpu:07:00:00": {
00:30:29:I3: "vendor": 4318,
00:30:29:I3: "type": "nvidia",
00:30:29:I3: "description": "NVIDIA GeForce RTX 3050",
00:30:29:I3: "uuid": "b22faafa-7aaf-b078-ce58-865d0fdd1036",
00:30:29:I3: "opencl": {"platform": 0, "device": 1, "compute": "3.0", "driver": "550.120"},
00:30:29:I3: "cuda": {"platform": 0, "device": 1, "compute": "8.6", "driver": "12.4"},
00:30:29:I3: "device": 9479,
00:30:29:I3: "supported": true
00:30:29:I3: },
00:30:29:I3: "gpu:08:00:00": {
00:30:29:I3: "vendor": 4318,
00:30:29:I3: "type": "nvidia",
00:30:29:I3: "description": "NVIDIA GeForce GTX 1660 SUPER",
00:30:29:I3: "uuid": "c4bc720c-30f9-496d-8edb-fb2b4a6a5847",
00:30:29:I3: "opencl": {"platform": 0, "device": 2, "compute": "3.0", "driver": "550.120"},
00:30:29:I3: "cuda": {"platform": 0, "device": 2, "compute": "7.5", "driver": "12.4"},
00:30:29:I3: "device": 8644,
00:30:29:I3: "supported": true
00:30:29:I3: },
00:30:29:I3: "gpu:65:00:00": {
00:30:29:I3: "vendor": 4318,
00:30:29:I3: "type": "nvidia",
00:30:29:I3: "description": "NVIDIA GeForce RTX 3060 Ti",
00:30:29:I3: "uuid": "f936634c-b097-a12b-16a4-5055f81704fa",
00:30:29:I3: "opencl": {"platform": 0, "device": 0, "compute": "3.0", "driver": "550.120"},
00:30:29:I3: "cuda": {"platform": 0, "device": 0, "compute": "8.6", "driver": "12.4"},
00:30:29:I3: "device": 9236,
00:30:29:I3: "supported": true
00:30:29:I3: }
00:30:29:I3:}

Re: RESOLVED Client with non-standard MTU cannot connect to Highland servers

Posted: Fri Nov 29, 2024 6:55 pm
by Joe_H
Note, an MTU of 9000 is actually quite standard for high-speed interconnects that support jumbo frames. Routers should handle re-packaging the packets into smaller ones and back. But if a router somewhere in the transmission path is misbehaving, that can cause problems.

I do know they have specifically had reports of overseas connections to these servers sometimes failing or being slow, and they have been looking into that. A number of topics and posts have been made here. However, judging from your IP, you are located in the US.

Re: RESOLVED Client with non-standard MTU cannot connect to Highland servers

Posted: Sat Nov 30, 2024 5:38 pm
by superguppy
You are correct, I reside in the US. I have updated my forum profile to reflect this.

I also wanted to share a more surgical fix for this issue. I learned it is possible to specify a custom MTU as part of IP route entries, so I tried it.

I returned the MTU of the network interface to 9,000. SSL connections in the Highland test case (see below) started to fail again. The other test cases continued to work.

Then I added a custom route for a CIDR block that covers the Highland hosts, with the lower MTU:

Code:

ip route add 158.130.118.16/28 via 192.168.1.1 mtu 1500
With this route in place all test cases work! :D
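
A minimal sketch for checking that the route is in effect and for probing the path by hand, assuming a Linux host with iproute2 and iputils `ping`. The helper names `probe_payload` and `probe_mtu` are made up, and 158.130.118.17 is only a placeholder address inside the /28 above, not a confirmed Highland address. The servers may also not answer ICMP echo at all, in which case the ping results say nothing.

```shell
# ICMP echo payload that exactly fills an IPv4 packet of the given size:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
probe_payload() {
    echo $(( $1 - 28 ))
}

# Send three pings with "don't fragment" set, sized to fill the given MTU.
# Arguments: host, MTU to test.
probe_mtu() {
    ping -M do -c 3 -s "$(probe_payload "$2")" "$1"
}

# Usage (not executed here):
#   ip route get 158.130.118.17                # should report mtu 1500 once the route is active
#   probe_mtu highland1.seas.upenn.edu 1500    # 1472-byte payload
#   probe_mtu highland1.seas.upenn.edu 9000    # 8972-byte payload
```

With the custom route in place, the 9,000-byte probe should be rejected locally as too long; without it, a probe that silently disappears would point at a link on the path that cannot carry jumbo frames.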

The host in question uses NetworkManager, so to make the change permanent I had to run this:

Code:

nmcli connection modify 'Wired connection 1' +ipv4.routes '158.130.118.16/28 192.168.1.1 mtu=1500'
The test cases I was referring to are below. I would expect all of them to work, regardless of the MTU. The Highland case can be made to fail at will by removing the custom route.

Code:

openssl s_client -connect highland1.seas.upenn.edu:443 -state -nbio
openssl s_client -connect assign1.foldingathome.org:443 -state -nbio
openssl s_client -connect openssl.org:443 -state -nbio
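
Because the failing case hangs instead of returning an error, it can help to wrap the test cases in a timeout. The following is a sketch under assumptions: `tls_probe` and `run_test_cases` are hypothetical helper names, `timeout` is the coreutils command, and the 10-second default is arbitrary. It cannot tell a hang apart from other failures such as a refused connection.

```shell
# Hypothetical helper: attempt a TLS handshake and give up after a timeout.
# Prints "ok" if the handshake completes, "hang-or-fail" otherwise.
# Arguments: host, port (default 443), timeout in seconds (default 10).
tls_probe() {
    host=$1
    port=${2:-443}
    secs=${3:-10}
    # </dev/null makes s_client exit once the handshake is done;
    # timeout kills it if the handshake never finishes.
    if timeout "$secs" openssl s_client -connect "$host:$port" \
        </dev/null >/dev/null 2>&1; then
        echo ok
    else
        echo hang-or-fail
    fi
}

# Run the three test cases from this post, one result line per host.
run_test_cases() {
    for h in highland1.seas.upenn.edu assign1.foldingathome.org openssl.org; do
        printf '%s: %s\n' "$h" "$(tls_probe "$h")"
    done
}

# Usage (not executed here):
#   run_test_cases
```

Running `run_test_cases` once with the custom route and once without it should show only the Highland line changing, if the behavior described above holds.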

Re: RESOLVED Client with non-standard MTU cannot connect to Highland servers

Posted: Sun Dec 01, 2024 3:20 pm
by muziqaz
If I remember correctly, there was some talk regarding MTU size from the server owners/researchers. This is not an accident; it is set on purpose. Why was the MTU set to 9000 on the host? Was it left on automatic, and did it then settle on the same value as the server?

Re: RESOLVED Client with non-standard MTU cannot connect to Highland servers

Posted: Sun Dec 01, 2024 5:53 pm
by superguppy
It was mentioned previously on the thread:
Joe_H wrote: Fri Nov 29, 2024 6:55 pm Note, an MTU of 9000 is actually quite standard for high-speed interconnects that support jumbo frames. Routers should handle re-packaging the packets into smaller ones and back. But if a router somewhere in the transmission path is misbehaving, that can cause problems.
In this case the affected host has the MTU set to 9,000 to maximize performance of its access to network storage.

The affected host has been running the FAH client with the MTU set to 9,000 for two years at least. The FAH client started experiencing connectivity issues only recently (two weeks or so?) and only with the Highland servers. It turns out this was because the SSL negotiation would hang. The host has no issues making SSL connections to any other hosts, including other FAH servers.

As mentioned in the quote, "the network" should transparently convert packets between the different MTU sizes. That is, this host with its MTU set to 9,000 should be able to communicate with any other host on the Internet, no matter what that host's MTU is.

IOW, the Highland servers can have their MTU size set to whatever they think is best. The network should handle this transparently as it does for any other host this host connects to, but for some reason it does not for the Highland servers.

Re: RESOLVED Client with non-standard MTU cannot connect to Highland servers

Posted: Sun Dec 01, 2024 6:30 pm
by toTOW
I know that network settings were tuned a few weeks ago on the Highland servers to try to solve some low upload speed and packet drop issues, so the new settings might not be compatible with such a high MTU ... I asked the researcher in charge of these servers to have a look at this thread.

See the thread just below yours : viewtopic.php?t=42167

Re: RESOLVED Client with non-standard MTU cannot connect to Highland servers

Posted: Sun Dec 01, 2024 9:18 pm
by superguppy
toTOW wrote: Sun Dec 01, 2024 6:30 pm I know that network settings were tuned a few weeks ago on the Highland servers to try to solve some low upload speed and packet drop issues, so the new settings might not be compatible with such a high MTU ... I asked the researcher in charge of these servers to have a look at this thread.

See the thread just below yours : viewtopic.php?t=42167
Thanks. I did see that thread but I decided not to add my issue to it because it presents differently. Specifically, in that thread the connection times out after one minute and the FAH client continues. In my case the connection hangs, causing the FAH client to get stuck.

Also, let me emphasize the MTU size on the client or server host should not matter. The network should convert between different MTU sizes on the fly, and it does for every other connection from this host.

This is not about the specific MTU sizes on each side of the connection; the connection should work no matter what they are. It is about why the larger MTU size on the client causes the SSL handshake to hang.

Re: RESOLVED Client with non-standard MTU cannot connect to Highland servers

Posted: Mon Dec 02, 2024 4:47 pm
by jjmiller
Thank you! I've passed this along to our network guru :).