FAHViewer - odd connection issue to client

Posted: Mon Jun 17, 2019 2:26 pm
by NuovaApe
I can't get FAHViewer to work, so I took a network trace on 127.0.0.1. It shows FAHViewer (FAHV) connecting to FAHClient (FAHC) fine, but then disconnecting immediately without waiting for any response from the client:

In English:

1. Viewer connects to FAHClient ok
2. Viewer disconnects from Client about 1/6000th of a second later (roughly 170 microseconds in the trace below)
3. An in-flight message "Welcome to the Folding@home Client command server" is received from FAHClient
4. That's All Folks.

In technobabble network trace speak:

This is FAHV connecting...
13:41:18.085393 49780 → 36330 [SYN] Seq=0 Win=64240 Len=0 MSS=65495 WS=256 SACK_PERM=1
This is FAHC acknowledging the connection...
13:41:18.085424 36330 → 49780 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=65495 WS=256 SACK_PERM=1
13:41:18.085551 49780 → 36330 [ACK] Seq=1 Ack=1 Win=525568 Len=0

This is FAHV disconnecting...
13:41:18.085723 49780 → 36330 [FIN, ACK] Seq=1 Ack=1 Win=525568 Len=0
13:41:18.085737 36330 → 49780 [ACK] Seq=1 Ack=2 Win=525568 Len=0

FAHV didn't even wait for the welcome response...
13:41:18.188657 36330 → 49780 [PSH, ACK] Seq=1 Ack=2 Win=525568 Len=60 "Welcome to the Folding@home Client command server"
13:41:18.188869 49780 → 36330 [RST, ACK] Seq=2 Ack=61 Win=0 Len=0 RST=Connection Reset aka winsock error 10053

Two guesses as to what is wrong with the Viewer:

1. The receive timeout is wrong.
2. It's a non-blocking socket being read when no data is available yet. On Windows, recv() returns SOCKET_ERROR (-1) in that case and you need to call WSAGetLastError() to get the actual status code. If that returns WSAEWOULDBLOCK (10035), it just means there is no data to read yet and you should try again later. So -1 from recv() is not always a fatal error. On Linux, check errno for EWOULDBLOCK or EAGAIN (I just googled the Linux part - I'm not a know-it-all).
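
To make guess 2 concrete, here is a minimal sketch of how a recv() result on a non-blocking socket could be classified. The names RecvStatus, lastSocketError, isWouldBlock and classifyRecv are made up for illustration and are not taken from the FAHViewer source; the only real APIs used are WSAGetLastError() / WSAEWOULDBLOCK on Windows and errno / EWOULDBLOCK / EAGAIN elsewhere.

```cpp
#include <cerrno>

#ifdef _WIN32
#include <winsock2.h>
#endif

// Hypothetical outcome type for one recv() call on a non-blocking socket.
enum class RecvStatus { Data, PeerClosed, WouldBlock, Error };

// Fetch the platform's last socket error; call it right after recv() fails.
inline int lastSocketError() {
#ifdef _WIN32
  return WSAGetLastError();
#else
  return errno;
#endif
}

// True if the error code only means "nothing to read yet, try again later".
inline bool isWouldBlock(int err) {
#ifdef _WIN32
  return err == WSAEWOULDBLOCK;  // 10035
#else
  return err == EWOULDBLOCK || err == EAGAIN;
#endif
}

// Classify the return value of recv() plus the error code captured after it.
inline RecvStatus classifyRecv(long count, int err) {
  if (count > 0) return RecvStatus::Data;         // got bytes
  if (count == 0) return RecvStatus::PeerClosed;  // orderly shutdown by peer
  return isWouldBlock(err) ? RecvStatus::WouldBlock : RecvStatus::Error;
}
```

A caller would do something like `classifyRecv(recv(fd, buf, len, 0), lastSocketError())` and only tear the connection down on PeerClosed or Error.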

I just found the source code (Client.cpp):

setBlocking(false);

So it is a non-blocking socket.

Code:

int count = read();

if (count <= 0) {
  if (count < 0 || lastData + 20 < Time::now()) reconnect();
  return false;
}

The above could potentially fail forever if the client has a tiny delay before sending its first message after the connection is made and the viewer is too impatient to wait for its first read() to succeed.
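
A more patient version of that check might look like the sketch below. It is only an illustration of the hypothesis: it assumes that read() reports "no data yet" as a negative count, which the snippet alone does not confirm, and the names ReadAction, decideAfterRead and the wouldBlock flag are hypothetical. The 20 second timeout is taken from the snippet above.

```cpp
#include <cstdint>

// Hypothetical result of inspecting one read() attempt.
enum class ReadAction { Process, Wait, Reconnect };

// count      : value returned by read() (>0 bytes read, 0 nothing, <0 failure)
// wouldBlock : true if the failure was only "no data available yet"
// lastData   : time (seconds) at which data was last seen
// now        : current time (seconds)
inline ReadAction decideAfterRead(int count, bool wouldBlock,
                                  std::uint64_t lastData, std::uint64_t now,
                                  std::uint64_t timeoutSecs = 20) {
  if (count > 0) return ReadAction::Process;

  // A real socket error: give up on this connection right away.
  if (count < 0 && !wouldBlock) return ReadAction::Reconnect;

  // Nothing to read yet: only reconnect once the peer has been silent too long.
  if (lastData + timeoutSecs < now) return ReadAction::Reconnect;

  return ReadAction::Wait;
}
```

The point of the design is that an empty first read right after connecting leads to Wait rather than Reconnect, so a slow welcome message from the client is tolerated.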

Windows 10 64-bit, Intel i7-6700K.

Re: FAHViewer - odd connection issue to client

Posted: Mon Jun 17, 2019 10:47 pm
by MeeLee
The thread below explains that the viewer is buggy and that there are no resources to fix it.
Priorities are set to make FAHClient and Control work well.
The viewer is lower priority.

viewtopic.php?f=16&t=31531&hilit=viewer&start=15#p307210

Re: FAHViewer - odd connection issue to client

Posted: Mon Jun 17, 2019 11:19 pm
by bruce
@NuovaApe
That's useful information. I've passed it on to Development.

Until today, the only information that Development had was reports more or less like "The viewer doesn't work", which take a bit of debugging to figure out what part was broken. Your traces made it very clear which aspect of the code to look at first. Thanks for the report.

Re: FAHViewer - odd connection issue to client

Posted: Tue Jun 18, 2019 6:51 am
by NuovaApe
MeeLee wrote:
The thread below explains that the viewer is buggy and that there are no resources to fix it.
Priorities are set to make FAHClient and Control work well.
The viewer is lower priority.
viewtopic.php?f=16&t=31531&hilit=viewer&start=15#p307210

Yes, I've seen plenty along those lines, and totally agree the science is a priority.

Visual feedback in the digital age keeps us engaged, whether it's SETI signal spikes or a 3D protein (which is rather cool - a picture is worth 1k words).

A slowly creeping progress bar doesn't really do it for me. I just want to take a peek now and then for the wow/cool factor - I don't want to affect my team ranking too adversely.

Bet you'd rather have 1m hardcore users + 500k viewer users than just the 1m.

NASA is very good with public relations via impressive pictures. They are funded by the people so it's good to keep them engaged and on-board.

In the same light I think if FAH had that wow factor more would be brought into the fold...

Re: FAHViewer - odd connection issue to client

Posted: Tue Jun 18, 2019 7:01 am
by NuovaApe
bruce wrote:
@NuovaApe
That's useful information. I've passed it on to Development.
Until today, the only information that Development had was reports more or less like "The viewer doesn't work", which take a bit of debugging to figure out what part was broken. Your traces made it very clear which aspect of the code to look at first. Thanks for the report.

I don't have a build environment to test my hypothesis, but I've done enough socket development to have been-there-seen-it-done-it many years ago. Live and learn.

As a dev I know how frustrating useless error reports are.

Users can also be the bug, to which we assign the error code "ID ten T", or, in short "ID10T".

Re: FAHViewer - odd connection issue to client

Posted: Thu Jun 20, 2019 8:31 am
by NuovaApe
I saw a mention on GitHub related to this, and they stated that the code already caters for the EWOULDBLOCK scenario.

So something else then.

Maybe this fails if the socket is checked for writability before the connection handshake is complete?

void Client::checkConnect() {
  try {
    SocketSet socketSet;
    socketSet.add(*this, SocketSet::WRITE | SocketSet::EXCEPT);
    socketSet.select(0);

    if (!socketSet.isSet(*this, SocketSet::EXCEPT)) {
      if (socketSet.isSet(*this, SocketSet::WRITE)) {
        lastData = Time::now();
        sendCommands(command);
        state = STATE_HEADER;
      }

      return;
    }
  } CLIENT_CATCH_ERROR;

  // Some error occurred
  reconnect();
}
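
For comparison, a common way to check whether a non-blocking connect() has finished is to wait for writability with select() and then ask the socket for its pending error via SO_ERROR. The sketch below uses plain POSIX calls; ConnectState and pollConnect are hypothetical names, and it says nothing about what SocketSet does internally. On Windows the same calls exist in Winsock with minor type differences (getsockopt takes a char* buffer), and a failed connect is reported through the except set.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical state of an in-progress non-blocking connect().
enum class ConnectState { Pending, Connected, Failed };

// Poll a socket on which a non-blocking connect() has been started.
// timeoutMs == 0 means "just check, do not wait".
inline ConnectState pollConnect(int fd, int timeoutMs) {
  fd_set writeSet, exceptSet;
  FD_ZERO(&writeSet);
  FD_ZERO(&exceptSet);
  FD_SET(fd, &writeSet);
  FD_SET(fd, &exceptSet);

  timeval tv;
  tv.tv_sec = timeoutMs / 1000;
  tv.tv_usec = (timeoutMs % 1000) * 1000;

  int ready = select(fd + 1, nullptr, &writeSet, &exceptSet, &tv);
  if (ready < 0) return ConnectState::Failed;    // select() itself failed
  if (ready == 0) return ConnectState::Pending;  // handshake not finished yet

  // Being flagged by select() is not proof of success: read the pending error.
  int err = 0;
  socklen_t len = sizeof(err);
  if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
    return ConnectState::Failed;

  return FD_ISSET(fd, &writeSet) ? ConnectState::Connected
                                 : ConnectState::Pending;
}
```

With this shape, Pending simply means "ask again on the next pass", so a check made before the handshake completes does not by itself trigger a reconnect.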

Re: FAHViewer - odd connection issue to client

Posted: Thu Jun 20, 2019 4:49 pm
by bruce
Please redirect this technical discussion to the on-line ticket for this issue.

Re: FAHViewer - odd connection issue to client

Posted: Wed Aug 28, 2019 7:17 am
by Electricz0
This issue seems to be very old and common. I've experienced the error myself. Unfortunately what often attracts people to this project in the first place is the protein viewer. Hopefully, it does get fixed.