
Thanks in advance.
MtM wrote: Still can't find the work server (128.59.74.4).
viewtopic.php?p=86874#p86874
mrshirts wrote: Update on 128.59.74.4:
The good news: all the data (2 TB) is safe. I was able to rebuild and mount the RAID. The bad news is that the server won't boot normally. Since it's actually at Columbia (where I don't work anymore), I'm a bit at the mercy of the IT support staff there in terms of getting it up and running again. The current plan is therefore to copy off the data needed to continue the projects, and to try to relay the IP to a different machine, putting it into accept-only mode. The timeline is probably going to be about a week, unfortunately.
Code:
[02:18:08] Project: 3855 (Run 628, Clone 8, Gen 27)
[02:18:08] + Attempting to send results [March 13 02:18:08 UTC]
[02:18:08] Working on queue slot 09 [March 13 02:18:08 UTC]
[02:18:08] + Working ...
[02:18:09]
[02:18:09] *------------------------------*
[02:18:09] Folding@Home Gromacs 3.3 Core
[02:18:09] Version 1.93 (July 23, 2008)
[02:18:09]
[02:18:09] Preparing to commence simulation
[02:18:09] - Looking at optimizations...
[02:18:09] - Files status OK
[02:18:10] - Expanded 1162827 -> 6173133 (decompressed 530.8 percent)
[02:18:10]
[02:18:10] Project: 5113 (Run 98, Clone 18, Gen 24)
[02:18:10]
[02:18:11] Assembly optimizations on if available.
[02:18:11] Entering M.D.
[02:18:17] FAH Init
[02:18:17] Checkpoint file:
[02:18:22] (Starting from checkpoint)
[02:18:22] Read checkpoint
[02:18:22] Protein: Calmodulin in water
[02:18:22] Writing local files
[02:18:24] Completed 31867 out of 500000 steps (6 percent)
[02:18:25] Extra SSE boost OK.
[02:18:29] - Couldn't send HTTP request to server
[02:18:29] + Could not connect to Work Server (results)
[02:18:29] (128.59.74.4:8080)
[02:18:29] + Retrying using alternative port
[02:18:50] - Couldn't send HTTP request to server
[02:18:50] + Could not connect to Work Server (results)
[02:18:50] (128.59.74.4:80)
[02:18:50] - Error: Could not transmit unit 02 (completed February 24) to work server.
[02:18:50] + Attempting to send results [March 13 02:18:50 UTC]
[02:21:59] - Couldn't send HTTP request to server
[02:21:59] + Could not connect to Work Server (results)
[02:21:59] (171.65.103.100:8080)
[02:21:59] + Retrying using alternative port
[02:25:09] - Couldn't send HTTP request to server
[02:25:09] + Could not connect to Work Server (results)
[02:25:09] (171.65.103.100:80)
[02:25:09] Could not transmit unit 02 to Collection server; keeping in queue.
[02:42:12] Writing local files
[02:42:12] Completed 35000 out of 500000 steps (7 percent)
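For anyone who wants to pull the failed upload attempts out of a log like the one above, here is a minimal Python sketch. It assumes the log layout shown in this thread (a "Could not connect to Work Server" line followed by a line holding the address and port in parentheses); the function name `failed_endpoints` and the `SAMPLE` text are illustrative only and not part of the client.

```python
import re

# Matches a client log line such as "[02:18:29] (128.59.74.4:8080)".
# The layout is assumed from the log posted in this thread.
ENDPOINT = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\]\s+\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)\s*$"
)


def failed_endpoints(log_text):
    """Return (ip, port) pairs that directly follow a 'Could not connect' line."""
    failures = []
    expect_endpoint = False
    for line in log_text.splitlines():
        if "Could not connect to Work Server" in line:
            # The next line should carry the address that was tried.
            expect_endpoint = True
            continue
        match = ENDPOINT.match(line.strip())
        if expect_endpoint and match:
            failures.append((match.group(1), int(match.group(2))))
        expect_endpoint = False
    return failures


# A short excerpt copied from the log above.
SAMPLE = """\
[02:18:29] - Couldn't send HTTP request to server
[02:18:29] + Could not connect to Work Server (results)
[02:18:29] (128.59.74.4:8080)
[02:18:29] + Retrying using alternative port
[02:18:50] - Couldn't send HTTP request to server
[02:18:50] + Could not connect to Work Server (results)
[02:18:50] (128.59.74.4:80)
"""

print(failed_endpoints(SAMPLE))
```

Run against the full log, this would list each address and port the client tried and failed to reach, in the order it tried them.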
MtM wrote: He should still be able to upload, since the collection server was online and accepting at the time of my last post.

The CS will only accept WUs it knows about. If the WS went down before sending the WU info to the CS, the CS won't accept the WU. If this is the case, nothing can be done until the WS is back online and accepting.
ChelseaOilman wrote: The CS will only accept WUs it knows about. If the WS went down before sending the WU info to the CS, the CS won't accept the WU. If this is the case, nothing can be done until the WS is back online and accepting.

But then the server would return 'unknown work unit' or 'the server has no record of this unit', and I didn't see that in his log, just 'unable to connect to ..'
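To make the two failure cases in this exchange concrete, here is a toy Python model. It is only a sketch of the behaviour as described in the posts, not the actual Folding@home server code: the class `CollectionServer`, its `register` and `upload` methods, and the `UploadResult` values are all hypothetical names, and the rejection wording is taken loosely from the post above.

```python
from enum import Enum


class UploadResult(Enum):
    ACCEPTED = "accepted"
    UNKNOWN_UNIT = "server has no record of this unit"
    UNREACHABLE = "could not connect"


class CollectionServer:
    """Toy model: accepts only units that the work server has told it about."""

    def __init__(self, online=True):
        self.online = online
        self.known_units = set()

    def register(self, unit_id):
        # Hypothetical step: the work server passes the WU info to the CS.
        self.known_units.add(unit_id)

    def upload(self, unit_id):
        if not self.online:
            # The client never reaches the server, so it cannot be rejected.
            return UploadResult.UNREACHABLE
        if unit_id not in self.known_units:
            # The server is reachable but was never told about this unit.
            return UploadResult.UNKNOWN_UNIT
        return UploadResult.ACCEPTED


# Unit identified as (project, run, clone, gen), taken from the log above.
unit = (3855, 628, 8, 27)

reachable_cs = CollectionServer(online=True)
print(reachable_cs.upload(unit))      # reachable, but the unit was never registered

reachable_cs.register(unit)
print(reachable_cs.upload(unit))      # reachable and registered

unreachable_cs = CollectionServer(online=False)
print(unreachable_cs.upload(unit))    # no connection at all
```

In this model a "could not connect" result and an "unknown unit" result come from different branches, which is the distinction being argued: the posted log only shows the first kind of failure.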