10031 (Run 14, Clone 0, Gen 39) Protomol
Posted: Sat Sep 04, 2010 9:34 pm
I am quite new to Folding@Home, so perhaps my question has been raised before.
Today I had a power failure and needed to restart my system, which runs 24/7 just to contribute to science. However, I noticed that my work did not continue because of an error that I believe is already known. So my question is: how can I avoid this happening again, besides avoiding power failures of course? Is the work I had done (68%) completely lost? Is there a way to restart from a previous checkpoint? It's a pity that even the partial results could not be uploaded.
Thanks in advance for your answer.
Below is my log:
[12:49:44] Loaded queue successfully.
[12:49:44]
[12:49:44] + Processing work unit
[12:49:44] Core required: FahCore_b4.exe
[12:49:44] Core found.
[12:49:44] Working on queue slot 03 [September 5 12:49:44 UTC]
[12:49:44] + Working ...
[12:49:56] *********************** Log Started 05/Sep/2010 12:49:55 ***********************
[12:49:56] ************************** ProtoMol Folding@Home Core **************************
[12:49:56] Version: 25
[12:49:56] Type: 180
[12:49:56] Core: ProtoMol
[12:49:56] Website: http://folding.stanford.edu/
[12:49:56] Copyright: (c) 2009 Stanford University
[12:49:56] Author: Joseph Coffland <joseph@cauldrondevelopment.com>
[12:49:56] Args: -dir work/ -suffix 03 -cpu 90 -checkpoint 15 -service -lifeline 972
[12:49:56] -version 623
[12:49:56] ************************************ Build *************************************
[12:49:56] Date: May 18 2010
[12:49:56] Time: 23:43:52
[12:49:56] Revision: 1819
[12:49:56] Compiler: Intel(R) C++ MSVC 1500 mode 1110
[12:49:56] Options: /TP /nologo /EHsc /wd4297 /wd4103 /wd1786 /arch:IA32 /Ox
[12:49:56] /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qrestrict /MT
[12:49:56] Defines: _CRT_SECURE_NO_WARNINGS NDEBUG HAVE_GEEKINFO BOOST_ALL_NO_LIB
[12:49:56] XML_STATIC HAVE_EXPAT HAVE_OPENSSL HAVE_LIBFAH HAVE_SIMTK_LAPACK
[12:49:56] Platform: Windows XP
[12:49:56] Bits: 32
[12:49:56] Mode: Release
[12:49:56] ************************************ System ************************************
[12:49:56] OS: Microsoft Windows XP Professional
[12:49:56] CPU: AMD Sempron(tm) Processor 2800+
[12:49:56] CPU ID: AuthenticAMD Family 15 Model 44 Stepping 2
[12:49:56] CPUs: 1 Logical, 1 Physical
[12:49:56] Memory: 2.00 GB
[12:49:56] Threads: Windows
[12:49:56] ********************************************************************************
[12:49:56] Project: 10031 (Run 14, Clone 0, Gen 39)
[12:49:56] Unit: 0x000000420001329c4bd49ced0000ea7d
[12:49:56] User: 0x00000000000000000000000000000000
[12:49:56] Machine: 1
[12:49:56] Digital signatures verified
[12:50:05] Completed 341900 out of 499375 steps (68%)
[12:52:56] ERROR: ProtoMol ERROR: Corrupt DCD file. Size is 3275268, should be >= 3281652.
[12:52:56] Saving result file logfile_03.txt
[12:52:56] Saving result file checkpt
[12:52:56] Saving result file checkpt.crc
[12:52:56] Saving result file log.txt
[12:53:05] Saving result file protomol.conf
[12:53:05] Saving result file ww.3839.pos
[12:53:05] Saving result file ww.3839.vel
[12:53:05] Saving result file ww.dcd
[12:53:07] WARNING: While cleaning up: 0: Failed to remove directory '03': boost::filesystem::remove: The process cannot access the file because it is being used by another process: "03\ww.dcd"
[12:53:07] Folding@home Core Shutdown: BAD_WORK_UNIT
[12:53:09] CoreStatus = 72 (114)
[12:53:09] Sending work to server
[12:53:09] Project: 10031 (Run 14, Clone 0, Gen 39)
[12:53:09] + Attempting to send results [September 5 12:53:09 UTC]
[12:56:42] - Couldn't send HTTP request to server
[12:56:42] + Could not connect to Work Server (results)
[12:56:42] (129.74.85.15:8080)
[12:56:42] + Retrying using alternative port
[13:00:15] - Couldn't send HTTP request to server
[13:00:15] + Could not connect to Work Server (results)
[13:00:15] (129.74.85.15:80)
[13:00:15] - Error: Could not transmit unit 03 (completed September 5) to work server.
[13:00:15] Keeping unit 03 in queue.
[13:00:15] Project: 10031 (Run 14, Clone 0, Gen 39)
[13:00:15] + Attempting to send results [September 5 13:00:15 UTC]
[13:03:48] - Couldn't send HTTP request to server
[13:03:48] + Could not connect to Work Server (results)
[13:03:48] (129.74.85.15:8080)
[13:03:48] + Retrying using alternative port
[13:04:09] - Couldn't send HTTP request to server
[13:04:09] + Could not connect to Work Server (results)
[13:04:09] (129.74.85.15:80)
[13:04:09] - Error: Could not transmit unit 03 (completed September 5) to work server.
[13:04:09] + Attempting to send results [September 5 13:04:09 UTC]
[13:04:40] - Couldn't send HTTP request to server
[13:04:40] + Could not connect to Work Server (results)
[13:04:40] (129.74.85.16:8080)
[13:04:40] + Retrying using alternative port
[13:04:50] - Couldn't send HTTP request to server
[13:04:50] + Could not connect to Work Server (results)
[13:04:50] (129.74.85.16:80)
[13:04:50] Could not transmit unit 03 to Collection server; keeping in queue.
[13:04:50] - Preparing to get new work unit...