AgrFan wrote: The big questions are a) why is this server low on work, b) why does work not get uploaded to a collection server when this server goes down, and c) why can't a temporary fix be implemented in the low-level HTTP code/library to stop the binaries from hanging when this server gets overloaded?
All servers make new WUs from the results that are returned. If a server is overloaded, it may still be able to assign WUs (small data transfers) while being unable to accept the large data transfers associated with uploads. Also, projects do end. When one does, it takes human thought for the researchers to learn what the old project told them and to devise a project that answers the new questions it raised. Those new projects must then be prepared and tested before they can be distributed. When an individual server is overloaded or down, redundancy is usually provided by assigning work from other servers.
The collection servers are running at maximum capacity. I'm not sure when new server capacity will come online or how it will be allocated.
Vijay said "likely in the low-level HTTP code/library", which means they have not been able to fully identify why the binary hangs, so they also don't know exactly what to fix. Older versions of Linux contain bugs that are fixed in newer versions, so the best approach is to upgrade to a newer version. (If anyone knows how to do the suggested temporary fix, let us know.)
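For what it's worth, the usual generic workaround for a network call that can hang is to put a timeout on the socket and move on to another server when it expires. The sketch below only illustrates that idea in Python. It is not the client's actual code, and since the real cause of the hang has not been identified, it may not address it at all. The function names, the 30-second default, and the (host, port) server list are all assumptions made up for the example.

```python
import socket


def send_with_timeout(host, port, payload, timeout=30.0):
    """Send payload over TCP, raising OSError instead of blocking forever."""
    # The timeout passed here applies to the connect and to later socket
    # operations such as sendall(), so a stalled server raises socket.timeout.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


def upload_with_fallback(servers, payload, send=send_with_timeout, timeout=30.0):
    """Try each (host, port) in order and return the first one that accepts.

    `servers` is a hypothetical list such as [work_server, collection_server].
    """
    errors = []
    for host, port in servers:
        try:
            send(host, port, payload, timeout)
            return (host, port)
        except OSError as exc:  # socket.timeout is a subclass of OSError
            errors.append((host, port, exc))
    raise ConnectionError(f"all servers failed: {errors}")
```

The sender is passed in as a parameter so the fallback logic can be exercised without touching the network. A real client would also need retry limits and some way to confirm that the server actually stored the upload.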
BTW, the server appears to be running fine right now with a reasonably light load.