HELP! Did I just lose 5 days of folding?

Posted: Mon Jul 09, 2012 8:10 am
by RMouse
Help! I had a project up to 92% when it disappeared on me after I unpaused my client. Then I found a new project at 0% in its place. I am not sure what is going on here. Can someone please advise what happened and how I can prevent it from happening again? Thank you. Here is the log file:

08:05:24:FS00:Unpaused
08:05:24:FS01:Unpaused
08:05:25:WU02:FS00:Starting
08:05:25:WU02:FS00:Running FahCore: E:\FAHClient/FAHCoreWrapper.exe "C:/Documents and Settings/name/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah/FahCore_11.exe" -dir 02 -suffix 01 -version 701 -lifeline 248 -checkpoint 15 -gpu 0
08:05:25:WU02:FS00:Started FahCore on PID 4056
08:05:25:WU02:FS00:Core PID:2272
08:05:25:WU02:FS00:FahCore 0x11 started
08:05:25:WU00:FS01:Starting
08:05:25:WU00:FS01:Running FahCore: E:\FAHClient/FAHCoreWrapper.exe "C:/Documents and Settings/name/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe" -dir 00 -suffix 01 -version 701 -lifeline 248 -checkpoint 15 -np 2
08:05:25:WU00:FS01:Started FahCore on PID 3740
08:05:25:WU02:FS00:0x11:
08:05:25:WU02:FS00:0x11:*------------------------------*
08:05:25:WU02:FS00:0x11:Folding@Home GPU Core
08:05:25:WU02:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
08:05:25:WU02:FS00:0x11:
08:05:25:WU02:FS00:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
08:05:25:WU02:FS00:0x11:Build host: amoeba
08:05:25:WU02:FS00:0x11:Board Type: Nvidia
08:05:25:WU02:FS00:0x11:Core      : 
08:05:25:WU02:FS00:0x11:Preparing to commence simulation
08:05:25:WU02:FS00:0x11:- Looking at optimizations...
08:05:25:WU02:FS00:0x11:- Files status OK
08:05:25:WU02:FS00:0x11:- Expanded 62974 -> 336763 (decompressed 534.7 percent)
08:05:25:WU02:FS00:0x11:Called DecompressByteArray: compressed_data_size=62974 data_size=336763, decompressed_data_size=336763 diff=0
08:05:25:WU02:FS00:0x11:- Digital signature verified
08:05:25:WU02:FS00:0x11:
08:05:25:WU02:FS00:0x11:Project: 10502 (Run 489, Clone 0, Gen 427)
08:05:25:WU02:FS00:0x11:
08:05:25:WU02:FS00:0x11:Assembly optimizations on if available.
08:05:25:WU02:FS00:0x11:Entering M.D.
08:05:26:WU00:FS01:Core PID:3648
08:05:26:WU00:FS01:FahCore 0xa4 started
08:05:26:WU00:FS01:0xa4:
08:05:26:WU00:FS01:0xa4:*------------------------------*
08:05:26:WU00:FS01:0xa4:Folding@Home Gromacs GB Core
08:05:26:WU00:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
08:05:26:WU00:FS01:0xa4:
08:05:26:WU00:FS01:0xa4:Preparing to commence simulation
08:05:26:WU00:FS01:0xa4:- Looking at optimizations...
08:05:26:WU00:FS01:0xa4:- Files status OK
08:05:26:WU00:FS01:0xa4:- Expanded 2079246 -> 5386224 (decompressed 259.0 percent)
08:05:26:WU00:FS01:0xa4:Called DecompressByteArray: compressed_data_size=2079246 data_size=5386224, decompressed_data_size=5386224 diff=0
08:05:26:WU00:FS01:0xa4:- Digital signature verified
08:05:26:WU00:FS01:0xa4:
08:05:26:WU00:FS01:0xa4:Project: 7809 (Run 5, Clone 106, Gen 113)
08:05:26:WU00:FS01:0xa4:
08:05:27:WU00:FS01:0xa4:Assembly optimizations on if available.
08:05:27:WU00:FS01:0xa4:Entering M.D.
08:05:31:WU02:FS00:0x11:Will resume from checkpoint file
08:05:31:WU02:FS00:0x11:Tpr hash 02/wudata_01.tpr:  2124241214 1914994775 1056910139 1102866594 2690281205
08:05:31:WU02:FS00:0x11:
08:05:31:WU02:FS00:0x11:Calling fah_main args: 14 usage=100
08:05:31:WU02:FS00:0x11:
08:05:32:WU02:FS00:0x11:Working on Protein
08:05:32:WU00:FS01:0xa4:Using Gromacs checkpoints
08:05:32:WU00:FS01:0xa4:Mapping NT from 2 to 2 
08:05:33:WU00:FS01:0xa4:Resuming from checkpoint
08:05:33:WU00:FS01:0xa4:Verified 00/wudata_01.log
08:05:33:WU00:FS01:0xa4:Verified 00/wudata_01.trr
08:05:33:WU00:FS01:0xa4:Verified 00/wudata_01.xtc
08:05:33:WU00:FS01:0xa4:Verified 00/wudata_01.edr
08:05:34:WU00:FS01:0xa4:Completed 710290 out of 1500000 steps  (47%)
08:05:34:WU02:FS00:0x11:Client config unavailable.
08:05:35:WU02:FS00:0x11:Starting GUI Server
08:05:35:WU02:FS00:0x11:Resuming from checkpoint
08:05:35:WU02:FS00:0x11:fcCheckPointResume: retreived and current tpr file hash:
08:05:35:WU02:FS00:0x11:   0   2124241214   2124241214
08:05:35:WU02:FS00:0x11:   1   1914994775   1914994775
08:05:35:WU02:FS00:0x11:   2   1056910139   1056910139
08:05:35:WU02:FS00:0x11:   3   1102866594   1102866594
08:05:35:WU02:FS00:0x11:   4   2690281205   2690281205
08:05:35:WU02:FS00:0x11:fcCheckPointResume: file hashes same.
08:05:35:WU02:FS00:0x11:fcCheckPointResume: state restored.
08:05:35:WU02:FS00:0x11:Verified 02/wudata_01.log
08:05:35:WU02:FS00:0x11:Verified 02/wudata_01.edr
08:05:35:WU02:FS00:0x11:Verified 02/wudata_01.xtc
08:05:35:WU02:FS00:0x11:Completed 92%
08:05:35:WU02:FS00:0x11:mdrun_gpu returned 
08:05:35:WU02:FS00:0x11:Calculated & specified T inconsisitent
08:05:35:WU02:FS00:0x11:
08:05:35:WU02:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
08:05:35:WU02:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
08:07:02:WU02:FS00:Starting
08:07:02:WU02:FS00:Running FahCore: E:\FAHClient/FAHCoreWrapper.exe "C:/Documents and Settings/name/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah/FahCore_11.exe" -dir 02 -suffix 01 -version 701 -lifeline 248 -checkpoint 15 -gpu 0
08:07:02:WU02:FS00:Started FahCore on PID 5996
08:07:02:WU02:FS00:Core PID:6064
08:07:02:WU02:FS00:FahCore 0x11 started
08:07:02:WU02:FS00:0x11:
08:07:02:WU02:FS00:0x11:*------------------------------*
08:07:02:WU02:FS00:0x11:Folding@Home GPU Core
08:07:02:WU02:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
08:07:02:WU02:FS00:0x11:
08:07:02:WU02:FS00:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
08:07:02:WU02:FS00:0x11:Build host: amoeba
08:07:02:WU02:FS00:0x11:Board Type: Nvidia
08:07:02:WU02:FS00:0x11:Core      : 
08:07:02:WU02:FS00:0x11:Preparing to commence simulation
08:07:02:WU02:FS00:0x11:- Looking at optimizations...
08:07:02:WU02:FS00:0x11:DeleteFrameFiles: successfully deleted file=02/wudata_01.ckp
08:07:02:WU02:FS00:0x11:- Created dyn
08:07:02:WU02:FS00:0x11:- Files status OK
08:07:02:WU02:FS00:0x11:- Expanded 62974 -> 336763 (decompressed 534.7 percent)
08:07:02:WU02:FS00:0x11:Called DecompressByteArray: compressed_data_size=62974 data_size=336763, decompressed_data_size=336763 diff=0
08:07:03:WU02:FS00:0x11:- Digital signature verified
08:07:03:WU02:FS00:0x11:
08:07:03:WU02:FS00:0x11:Project: 10502 (Run 489, Clone 0, Gen 427)
08:07:03:WU02:FS00:0x11:
08:07:03:WU02:FS00:0x11:Assembly optimizations on if available.
08:07:03:WU02:FS00:0x11:Entering M.D.
08:07:08:WU02:FS00:0x11:Tpr hash 02/wudata_01.tpr:  2124241214 1914994775 1056910139 1102866594 2690281205
08:07:08:WU02:FS00:0x11:
08:07:08:WU02:FS00:0x11:Calling fah_main args: 14 usage=100
08:07:08:WU02:FS00:0x11:
08:07:09:WU02:FS00:0x11:Working on Protein
08:07:12:WU02:FS00:0x11:Client config unavailable.
08:07:12:WU02:FS00:0x11:Starting GUI Server

Re: HELP! Did I just lose 5 days of folding?

Posted: Mon Jul 09, 2012 10:19 am
by compdewd
It looks like the same work unit you had completed 92% on. The log does not show it starting over yet though, so it isn't clear if you lost the last 5 days or not. Though if FAHControl shows a 0% progress bar, then you most likely lost it :/
The unstable machine error is something you should look into. Your GPU may be getting too hot.

Re: HELP! Did I just lose 5 days of folding?

Posted: Mon Jul 09, 2012 7:14 pm
by bruce
I see the error

08:05:35:WU02:FS00:0x11:Calculated & specified T inconsisitent
08:05:35:WU02:FS00:0x11:
08:05:35:WU02:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
08:05:35:WU02:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
08:07:02:WU02:FS00:Starting
FahCore_11 is trying to restart, but it has detected that the simulated time T in the checkpoint doesn't match what it should be. You can't resume work from a checkpoint that contains inconsistencies, so the WU was restarted. If it took you 5 days to get to this point, then the answer is yes, you lost 5 days.

There might be a clue to what happened at the end of the previous log (whenever the WU was paused) but there may not be. In the past I had trouble pausing/resuming work with FahCore_11 and if it's a bug, it's not likely to be updated. Current development work seems to be going into FahCores which are used by newer GPUs.
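
For anyone who wants to check their own log for this kind of failure, here is a minimal sketch in Python. It only assumes the line layout visible in the log posted above (`time:WUnn:FSnn:FahCore returned: NAME (code = 0xhex)`). The function name and the regular expression are illustrative and not part of FAH itself.

```python
import re

# Pattern for the client-level line that reports how a FahCore exited.
# Assumption: the layout matches the log excerpt quoted in this thread.
CORE_RETURN = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<wu>WU\d+):(?P<slot>FS\d+):"
    r"FahCore returned: (?P<status>\w+) \((?P<code>\d+) = 0x[0-9a-fA-F]+\)"
)


def find_core_returns(log_text):
    """Return (time, work unit, slot, status, code) for each core exit line."""
    results = []
    for line in log_text.splitlines():
        match = CORE_RETURN.match(line.strip())
        if match:
            results.append(
                (
                    match["time"],
                    match["wu"],
                    match["slot"],
                    match["status"],
                    int(match["code"]),
                )
            )
    return results


# Three lines copied from the log earlier in this thread.
SAMPLE = """08:05:35:WU02:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
08:05:35:WU02:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
08:07:02:WU02:FS00:Starting"""

for entry in find_core_returns(SAMPLE):
    print(entry)
```

Running this over a whole log file (read it with `open(path).read()`) lists every time a core exited and with which status, which makes it easier to spot an UNSTABLE_MACHINE return right after an unpause.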

Re: HELP! Did I just lose 5 days of folding?

Posted: Tue Jul 10, 2012 12:27 am
by RMouse
Thank you everyone. Yes, it did start over after this error, so I did lose the 5 days of folding. Oh well, back to the drawing board.

I never had this happen before. Is pausing a bad idea? I do that a lot since sometimes I need to use my computer and folding consumes a lot of performance.

Thanks for the help.

Re: HELP! Did I just lose 5 days of folding?

Posted: Tue Jul 10, 2012 3:22 am
by bruce
I don't see your hardware profile so I'd have to guess what can be done about "...folding consumes a lot of performance." but maybe there's something that can be done about that if we knew more. In many cases, fah can be convinced to yield performance when you want to run something else without completely pausing the WUs. Do you leave the CPU/SMP slot processing and just pause the GPU or the other way around?

Re: HELP! Did I just lose 5 days of folding?

Posted: Wed Jul 11, 2012 1:11 am
by RMouse
bruce wrote:I don't see your hardware profile so I'd have to guess what can be done about "...folding consumes a lot of performance." but maybe there's something that can be done about that if we knew more. In many cases, fah can be convinced to yield performance when you want to run something else without completely pausing the WUs. Do you leave the CPU/SMP slot processing and just pause the GPU or the other way around?
I just hit the pause button at the top of the client and that kills everything. Both cores shut down and the GPU.

Is that the best way to stop folding?

Re: HELP! Did I just lose 5 days of folding?

Posted: Wed Jul 11, 2012 1:54 am
by bruce
Yes, but you can also Pause individual slots. Right click on them. I do suggest you experiment with pausing only the GPU or only the SMP slot (assuming you have both running) and then we can work on how to get your foreground performance high enough to make you happy and still allow FAH to continue to process at a slower rate when you're using the computer -- or just keep using pause if you don't want to experiment with FAH on your system.

Re: HELP! Did I just lose 5 days of folding?

Posted: Wed Jul 11, 2012 2:45 am
by Jesse_V
Btw, the right-click-on-the-slot thing only works in Advanced and Expert modes. Use the drop-down menu in the upper right-hand corner to switch modes.
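
If you would rather script this than right-click, the sketch below pauses or unpauses one slot from Python. It rests on assumptions you should verify for your own install: that the v7 client's local command interface is enabled, that it listens on 127.0.0.1 port 36330, and that it accepts `pause <slot>` and `unpause <slot>` as text commands. The helper names are made up for this example.

```python
import socket


def slot_command(action, slot):
    """Build the text command for one slot, e.g. b"pause 0\\n"."""
    if action not in ("pause", "unpause"):
        raise ValueError("action must be 'pause' or 'unpause'")
    if slot < 0:
        raise ValueError("slot must be zero or greater")
    return f"{action} {slot}\n".encode("ascii")


def send_slot_command(action, slot, host="127.0.0.1", port=36330, timeout=5.0):
    """Send the command to the client.

    Assumption: the client accepts unauthenticated local connections on this
    port. If a password is configured, this sketch will not work as written.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(slot_command(action, slot))


# Example (not run here): pause only the GPU slot, which is FS00 in the log above.
# send_slot_command("pause", 0)
# ...and later resume it:
# send_slot_command("unpause", 0)
```

Pausing only the GPU slot this way leaves the SMP slot folding, which is the same experiment suggested above.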

Re: HELP! Did I just lose 5 days of folding?

Posted: Wed Jul 11, 2012 7:20 am
by RMouse
bruce wrote:Yes, but you can also Pause individual slots. Right click on them. I do suggest you experiment with pausing only the GPU or only the SMP slot (assuming you have both running) and then we can work on how to get your foreground performance high enough to make you happy and still allow FAH to continue to process at a slower rate when you're using the computer -- or just keep using pause if you don't want to experiment with FAH on your system.
Interesting. When I shut down the SMP core, my CPU usage does not drop to 0%. It stays at 50%. FAH is still reported as using 50% of my CPU power with only the GPU running. It looks like only one core of my Core 2 Duo is shut down that way. Does that seem right?

Also, I get an error box when I shut down the GPU by right clicking. It says click OK to terminate the program. FAH does not close when I click OK and I lose no folding when I restart the GPU. But it does look a bit scary to see an error message.

Re: HELP! Did I just lose 5 days of folding?

Posted: Wed Jul 11, 2012 1:55 pm
by P5-133XL
No HW profile means I'm just guessing. I'm assuming that you have an AMD GPU and a dual-core processor. AMD GPUs typically need a full CPU core in addition to the GPU. At that point you will only have a single core for the SMP slot, so I suggest that you go uniprocessor + GPU or SMP without the GPU, whichever gives you the highest PPD (points per day). If my read of your hardware is incorrect then feel free to disregard, because my suggestion only really applies for that set of HW assumptions.

The SMP or Uniprocessor slots should not affect performance of your machine when it is on. The Windows priority system will execute the application in preference to folding because the application will invariably have a higher priority since folding is very low.

GPU folding however has no method of getting out of the way, so it will cause problems if you want to run something else. So if you are having performance issues, suspend the GPU slot as your first choice and see if that helps.

Re: HELP! Did I just lose 5 days of folding?

Posted: Wed Jul 11, 2012 10:15 pm
by RMouse
P5-133XL wrote:No HW profile means I'm just guessing. I'm assuming that you have an AMD GPU and a dual-core processor. AMD GPUs typically need a full CPU core in addition to the GPU. At that point you will only have a single core for the SMP slot, so I suggest that you go uniprocessor + GPU or SMP without the GPU, whichever gives you the highest PPD (points per day). If my read of your hardware is incorrect then feel free to disregard, because my suggestion only really applies for that set of HW assumptions.

The SMP or Uniprocessor slots should not affect performance of your machine when it is on. The Windows priority system will execute the application in preference to folding because the application will invariably have a higher priority since folding is very low.

GPU folding however has no method of getting out of the way, so it will cause problems if you want to run something else. So if you are having performance issues, suspend the GPU slot as your first choice and see if that helps.
Thank you. I understand the issues with the GPU now.