HELP! Did I just lose 5 days of folding?

RMouse
Posts: 146
Joined: Wed Jun 13, 2012 6:15 am

HELP! Did I just lose 5 days of folding?

Post by RMouse »

Help! I had a project up to 92% when it disappeared on me after I unpaused my client. Then I found a new project at 0% in its place. I am not sure what is going on here. Can someone please advise what happened and how I can prevent it from happening again? Thank you. Here is the log file:

Code: Select all

08:05:24:FS00:Unpaused
08:05:24:FS01:Unpaused
08:05:25:WU02:FS00:Starting
08:05:25:WU02:FS00:Running FahCore: E:\FAHClient/FAHCoreWrapper.exe "C:/Documents and Settings/name/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah/FahCore_11.exe" -dir 02 -suffix 01 -version 701 -lifeline 248 -checkpoint 15 -gpu 0
08:05:25:WU02:FS00:Started FahCore on PID 4056
08:05:25:WU02:FS00:Core PID:2272
08:05:25:WU02:FS00:FahCore 0x11 started
08:05:25:WU00:FS01:Starting
08:05:25:WU00:FS01:Running FahCore: E:\FAHClient/FAHCoreWrapper.exe "C:/Documents and Settings/name/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/Core_a4.fah/FahCore_a4.exe" -dir 00 -suffix 01 -version 701 -lifeline 248 -checkpoint 15 -np 2
08:05:25:WU00:FS01:Started FahCore on PID 3740
08:05:25:WU02:FS00:0x11:
08:05:25:WU02:FS00:0x11:*------------------------------*
08:05:25:WU02:FS00:0x11:Folding@Home GPU Core
08:05:25:WU02:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
08:05:25:WU02:FS00:0x11:
08:05:25:WU02:FS00:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
08:05:25:WU02:FS00:0x11:Build host: amoeba
08:05:25:WU02:FS00:0x11:Board Type: Nvidia
08:05:25:WU02:FS00:0x11:Core      : 
08:05:25:WU02:FS00:0x11:Preparing to commence simulation
08:05:25:WU02:FS00:0x11:- Looking at optimizations...
08:05:25:WU02:FS00:0x11:- Files status OK
08:05:25:WU02:FS00:0x11:- Expanded 62974 -> 336763 (decompressed 534.7 percent)
08:05:25:WU02:FS00:0x11:Called DecompressByteArray: compressed_data_size=62974 data_size=336763, decompressed_data_size=336763 diff=0
08:05:25:WU02:FS00:0x11:- Digital signature verified
08:05:25:WU02:FS00:0x11:
08:05:25:WU02:FS00:0x11:Project: 10502 (Run 489, Clone 0, Gen 427)
08:05:25:WU02:FS00:0x11:
08:05:25:WU02:FS00:0x11:Assembly optimizations on if available.
08:05:25:WU02:FS00:0x11:Entering M.D.
08:05:26:WU00:FS01:Core PID:3648
08:05:26:WU00:FS01:FahCore 0xa4 started
08:05:26:WU00:FS01:0xa4:
08:05:26:WU00:FS01:0xa4:*------------------------------*
08:05:26:WU00:FS01:0xa4:Folding@Home Gromacs GB Core
08:05:26:WU00:FS01:0xa4:Version 2.27 (Dec. 15, 2010)
08:05:26:WU00:FS01:0xa4:
08:05:26:WU00:FS01:0xa4:Preparing to commence simulation
08:05:26:WU00:FS01:0xa4:- Looking at optimizations...
08:05:26:WU00:FS01:0xa4:- Files status OK
08:05:26:WU00:FS01:0xa4:- Expanded 2079246 -> 5386224 (decompressed 259.0 percent)
08:05:26:WU00:FS01:0xa4:Called DecompressByteArray: compressed_data_size=2079246 data_size=5386224, decompressed_data_size=5386224 diff=0
08:05:26:WU00:FS01:0xa4:- Digital signature verified
08:05:26:WU00:FS01:0xa4:
08:05:26:WU00:FS01:0xa4:Project: 7809 (Run 5, Clone 106, Gen 113)
08:05:26:WU00:FS01:0xa4:
08:05:27:WU00:FS01:0xa4:Assembly optimizations on if available.
08:05:27:WU00:FS01:0xa4:Entering M.D.
08:05:31:WU02:FS00:0x11:Will resume from checkpoint file
08:05:31:WU02:FS00:0x11:Tpr hash 02/wudata_01.tpr:  2124241214 1914994775 1056910139 1102866594 2690281205
08:05:31:WU02:FS00:0x11:
08:05:31:WU02:FS00:0x11:Calling fah_main args: 14 usage=100
08:05:31:WU02:FS00:0x11:
08:05:32:WU02:FS00:0x11:Working on Protein
08:05:32:WU00:FS01:0xa4:Using Gromacs checkpoints
08:05:32:WU00:FS01:0xa4:Mapping NT from 2 to 2 
08:05:33:WU00:FS01:0xa4:Resuming from checkpoint
08:05:33:WU00:FS01:0xa4:Verified 00/wudata_01.log
08:05:33:WU00:FS01:0xa4:Verified 00/wudata_01.trr
08:05:33:WU00:FS01:0xa4:Verified 00/wudata_01.xtc
08:05:33:WU00:FS01:0xa4:Verified 00/wudata_01.edr
08:05:34:WU00:FS01:0xa4:Completed 710290 out of 1500000 steps  (47%)
08:05:34:WU02:FS00:0x11:Client config unavailable.
08:05:35:WU02:FS00:0x11:Starting GUI Server
08:05:35:WU02:FS00:0x11:Resuming from checkpoint
08:05:35:WU02:FS00:0x11:fcCheckPointResume: retreived and current tpr file hash:
08:05:35:WU02:FS00:0x11:   0   2124241214   2124241214
08:05:35:WU02:FS00:0x11:   1   1914994775   1914994775
08:05:35:WU02:FS00:0x11:   2   1056910139   1056910139
08:05:35:WU02:FS00:0x11:   3   1102866594   1102866594
08:05:35:WU02:FS00:0x11:   4   2690281205   2690281205
08:05:35:WU02:FS00:0x11:fcCheckPointResume: file hashes same.
08:05:35:WU02:FS00:0x11:fcCheckPointResume: state restored.
08:05:35:WU02:FS00:0x11:Verified 02/wudata_01.log
08:05:35:WU02:FS00:0x11:Verified 02/wudata_01.edr
08:05:35:WU02:FS00:0x11:Verified 02/wudata_01.xtc
08:05:35:WU02:FS00:0x11:Completed 92%
08:05:35:WU02:FS00:0x11:mdrun_gpu returned 
08:05:35:WU02:FS00:0x11:Calculated & specified T inconsisitent
08:05:35:WU02:FS00:0x11:
08:05:35:WU02:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
08:05:35:WU02:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
08:07:02:WU02:FS00:Starting
08:07:02:WU02:FS00:Running FahCore: E:\FAHClient/FAHCoreWrapper.exe "C:/Documents and Settings/name/Application Data/FAHClient/cores/www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah/FahCore_11.exe" -dir 02 -suffix 01 -version 701 -lifeline 248 -checkpoint 15 -gpu 0
08:07:02:WU02:FS00:Started FahCore on PID 5996
08:07:02:WU02:FS00:Core PID:6064
08:07:02:WU02:FS00:FahCore 0x11 started
08:07:02:WU02:FS00:0x11:
08:07:02:WU02:FS00:0x11:*------------------------------*
08:07:02:WU02:FS00:0x11:Folding@Home GPU Core
08:07:02:WU02:FS00:0x11:Version 1.31 (Tue Sep 15 10:57:42 PDT 2009)
08:07:02:WU02:FS00:0x11:
08:07:02:WU02:FS00:0x11:Compiler  : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86 
08:07:02:WU02:FS00:0x11:Build host: amoeba
08:07:02:WU02:FS00:0x11:Board Type: Nvidia
08:07:02:WU02:FS00:0x11:Core      : 
08:07:02:WU02:FS00:0x11:Preparing to commence simulation
08:07:02:WU02:FS00:0x11:- Looking at optimizations...
08:07:02:WU02:FS00:0x11:DeleteFrameFiles: successfully deleted file=02/wudata_01.ckp
08:07:02:WU02:FS00:0x11:- Created dyn
08:07:02:WU02:FS00:0x11:- Files status OK
08:07:02:WU02:FS00:0x11:- Expanded 62974 -> 336763 (decompressed 534.7 percent)
08:07:02:WU02:FS00:0x11:Called DecompressByteArray: compressed_data_size=62974 data_size=336763, decompressed_data_size=336763 diff=0
08:07:03:WU02:FS00:0x11:- Digital signature verified
08:07:03:WU02:FS00:0x11:
08:07:03:WU02:FS00:0x11:Project: 10502 (Run 489, Clone 0, Gen 427)
08:07:03:WU02:FS00:0x11:
08:07:03:WU02:FS00:0x11:Assembly optimizations on if available.
08:07:03:WU02:FS00:0x11:Entering M.D.
08:07:08:WU02:FS00:0x11:Tpr hash 02/wudata_01.tpr:  2124241214 1914994775 1056910139 1102866594 2690281205
08:07:08:WU02:FS00:0x11:
08:07:08:WU02:FS00:0x11:Calling fah_main args: 14 usage=100
08:07:08:WU02:FS00:0x11:
08:07:09:WU02:FS00:0x11:Working on Protein
08:07:12:WU02:FS00:0x11:Client config unavailable.
08:07:12:WU02:FS00:0x11:Starting GUI Server
Last edited by bruce on Tue Jul 10, 2012 3:23 am, edited 1 time in total.
Reason: Added [code] tags.
compdewd
Posts: 165
Joined: Sat Jun 09, 2012 6:56 am
Hardware configuration: [1] Debian 8 64-bit: EVGA NVIDIA GTX 650 Ti, MSI NVIDIA GTX 460, AMD FX-8120
[2] Windows 7 64-bit: MSI NVIDIA GTX 460, AMD Phenom II X4
Location: Cincinnati, Ohio, USA

Re: HELP! Did I just lose 5 days of folding?

Post by compdewd »

It looks like the same work unit you had completed 92% on. The log does not show it starting over yet, though, so it isn't clear whether you lost the last 5 days or not. If FAHControl shows a 0% progress bar, then you most likely lost it :/
The unstable machine error is something you should look into. Your GPU may be getting too hot.
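
One way to check this from the log rather than from the progress bar is sketched below in Python. It pulls the most recent "Completed" percentage for each work unit, assuming the line layout seen in the log above (`HH:MM:SS:WUnn:FSnn:0xNN:Completed ...`). `PROGRESS` and `latest_progress` are illustrative names, not part of FAH.

```python
import re

# Matches core progress lines such as
#   08:05:35:WU02:FS00:0x11:Completed 92%
#   08:05:34:WU00:FS01:0xa4:Completed 710290 out of 1500000 steps  (47%)
# The layout is an assumption taken from the log posted in this thread.
PROGRESS = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):(FS\d+):0x\w+:Completed .*?(\d+)%")


def latest_progress(log_lines):
    """Return {(work unit, slot): last reported percent} for the given log lines."""
    progress = {}
    for line in log_lines:
        match = PROGRESS.match(line)
        if match:
            wu, slot, percent = match.groups()
            # Later lines overwrite earlier ones, so the last report wins.
            progress[(wu, slot)] = int(percent)
    return progress


# Two lines copied from the log in the first post.
SAMPLE = [
    "08:05:34:WU00:FS01:0xa4:Completed 710290 out of 1500000 steps  (47%)",
    "08:05:35:WU02:FS00:0x11:Completed 92%",
]

print(latest_progress(SAMPLE))
# {('WU00', 'FS01'): 47, ('WU02', 'FS00'): 92}
```

If a work unit that reported 92% later reports a much lower figure, it was restarted from the beginning.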
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: HELP! Did I just lose 5 days of folding?

Post by bruce »

I see the error

Code: Select all

08:05:35:WU02:FS00:0x11:Calculated & specified T inconsisitent
08:05:35:WU02:FS00:0x11:
08:05:35:WU02:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE
08:05:35:WU02:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
08:07:02:WU02:FS00:Starting

FahCore_11 is trying to restart, but it has detected that the simulated time T in the checkpoint doesn't match what it should be. You can't resume work from a checkpoint that contains inconsistencies, so the WU was restarted. If it took you 5 days to get to this point, then the answer is yes, you lost 5 days.

There might be a clue to what happened at the end of the previous log (whenever the WU was paused) but there may not be. In the past I had trouble pausing/resuming work with FahCore_11 and if it's a bug, it's not likely to be updated. Current development work seems to be going into FahCores which are used by newer GPUs.
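
To look for the same pattern in an older log, a rough Python sketch like the one below can list every "FahCore returned" line together with the last message the core printed before it. It assumes the line layout shown in the excerpt above. `CORE_MESSAGE`, `CORE_RETURN` and `find_core_returns` are made-up names for illustration, not anything shipped with FAH.

```python
import re

# Layout assumed from the log excerpt in this thread:
#   08:05:35:WU02:FS00:0x11:Calculated & specified T inconsisitent
#   08:05:35:WU02:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
CORE_MESSAGE = re.compile(r"^\d\d:\d\d:\d\d:(WU\d+):FS\d+:0x\w+:(.*)$")
CORE_RETURN = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):FahCore returned: (\w+) \((\d+) = 0x[0-9a-fA-F]+\)"
)


def find_core_returns(log_lines):
    """List each core exit with the last non-blank core message seen before it."""
    last_message = {}  # work unit -> most recent core message
    results = []
    for line in log_lines:
        returned = CORE_RETURN.match(line)
        if returned:
            time, wu, slot, status, code = returned.groups()
            results.append({
                "time": time,
                "wu": wu,
                "slot": slot,
                "status": status,
                "code": int(code),
                "last_message": last_message.get(wu, ""),
            })
            continue
        message = CORE_MESSAGE.match(line)
        if message:
            text = message.group(2).strip()
            # Skip blank lines and the shutdown banner so the real cause is kept.
            if text and not text.startswith("Folding@home Core Shutdown"):
                last_message[message.group(1)] = text
    return results


# Lines copied from the log in the first post.
SAMPLE = [
    "08:05:35:WU02:FS00:0x11:Completed 92%",
    "08:05:35:WU02:FS00:0x11:mdrun_gpu returned ",
    "08:05:35:WU02:FS00:0x11:Calculated & specified T inconsisitent",
    "08:05:35:WU02:FS00:0x11:",
    "08:05:35:WU02:FS00:0x11:Folding@home Core Shutdown: UNSTABLE_MACHINE",
    "08:05:35:WU02:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)",
]

for event in find_core_returns(SAMPLE):
    print(event["time"], event["wu"], event["status"], event["code"], "|", event["last_message"])
# 08:05:35 WU02 UNSTABLE_MACHINE 122 | Calculated & specified T inconsisitent
```

Running it over the log from the session in which the WU was paused may show whether anything unusual was printed before the pause.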
RMouse
Posts: 146
Joined: Wed Jun 13, 2012 6:15 am

Re: HELP! Did I just lose 5 days of folding?

Post by RMouse »

Thank you everyone. Yes it did start over after this error so I did lose the 5 days of folding. Oh well, back to the drawing board.

I never had this happen before. Is pausing a bad idea? I do that a lot since sometimes I need to use my computer and folding consumes a lot of performance.

Thanks for the help.
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: HELP! Did I just lose 5 days of folding?

Post by bruce »

I don't see your hardware profile so I'd have to guess what can be done about "...folding consumes a lot of performance." but maybe there's something that can be done about that if we knew more. In many cases, fah can be convinced to yield performance when you want to run something else without completely pausing the WUs. Do you leave the CPU/SMP slot processing and just pause the GPU or the other way around?
RMouse
Posts: 146
Joined: Wed Jun 13, 2012 6:15 am

Re: HELP! Did I just lose 5 days of folding?

Post by RMouse »

bruce wrote:I don't see your hardware profile so I'd have to guess what can be done about "...folding consumes a lot of performance." but maybe there's something that can be done about that if we knew more. In many cases, fah can be convinced to yield performance when you want to run something else without completely pausing the WUs. Do you leave the CPU/SMP slot processing and just pause the GPU or the other way around?
I just hit the pause button at the top of the client and that kills everything. Both cores shut down and the GPU.

Is that the best way to stop folding?
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: HELP! Did I just lose 5 days of folding?

Post by bruce »

Yes, but you can also pause individual slots. Right-click on them. I do suggest you experiment with pausing only the GPU or only the SMP slot (assuming you have both running) and then we can work on how to get your foreground performance high enough to make you happy while still allowing FAH to continue processing at a slower rate when you're using the computer -- or just keep using pause if you don't want to experiment with FAH on your system.
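
For pausing a single slot without the menu, the v7 client can, as far as I know, also be driven by text commands over a local socket. The Python sketch below rests on assumptions: that the command interface is enabled on 127.0.0.1 port 36330 and that it accepts `pause` and `unpause` with an optional slot number. Check your own client's documentation before relying on it. `slot_command` and `send_command` are hypothetical helper names.

```python
import socket


def slot_command(action, slot=None):
    """Build a pause/unpause command line; slot=None means all slots.

    The command names are an assumption about the v7 client's text interface.
    """
    if action not in ("pause", "unpause"):
        raise ValueError("action must be 'pause' or 'unpause'")
    if slot is None:
        return f"{action}\n"
    return f"{action} {int(slot)}\n"


def send_command(command, host="127.0.0.1", port=36330, timeout=5.0):
    """Send one command to a locally running client.

    Host and port are assumed defaults; adjust them to your configuration.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))


# Example (not run here): pause only the GPU slot, shown as FS00 in the log above.
#   send_command(slot_command("pause", 0))
# and later resume it:
#   send_command(slot_command("unpause", 0))
```

Keeping the command builder separate from the network call makes it easy to see exactly what would be sent before trying it against a live client.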
Jesse_V
Site Moderator
Posts: 2850
Joined: Mon Jul 18, 2011 4:44 am
Hardware configuration: OS: Windows 10, Kubuntu 19.04
CPU: i7-6700k
GPU: GTX 970, GTX 1080 TI
RAM: 24 GB DDR4
Location: Western Washington

Re: HELP! Did I just lose 5 days of folding?

Post by Jesse_V »

Btw, the right-click-on-the-slot thing only works in Advanced and Expert modes. Use the drop-down menu in the upper right-hand corner to switch modes.
F@h is now the top computing platform on the planet and nothing unites people like a dedicated fight against a common enemy. This virus affects all of us. Let's end it together.
RMouse
Posts: 146
Joined: Wed Jun 13, 2012 6:15 am

Re: HELP! Did I just lose 5 days of folding?

Post by RMouse »

bruce wrote:Yes, but you can also pause individual slots. Right-click on them. I do suggest you experiment with pausing only the GPU or only the SMP slot (assuming you have both running) and then we can work on how to get your foreground performance high enough to make you happy while still allowing FAH to continue processing at a slower rate when you're using the computer -- or just keep using pause if you don't want to experiment with FAH on your system.
Interesting. When I shut down the SMP core, my CPU usage does not drop to 0%. It stays at 50%: FAH is still reported as using 50% of my CPU power with only the GPU running. It looks like only one core of my Core 2 Duo is shut down that way. Does that seem right?

Also, I get an error box when I shut down the GPU by right-clicking. It says to click OK to terminate the program. FAH does not close when I click OK, and I lose no folding when I restart the GPU. But it does look a bit scary to see an error message.
P5-133XL
Posts: 2948
Joined: Sun Dec 02, 2007 4:36 am
Hardware configuration: Machine #1:

Intel Q9450; 2x2GB=8GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460; Windows Server 2008 X64 (SP1).

Machine #2:

Intel Q6600; 2x2GB=4GB Ram; Gigabyte GA-X48-DS4 Motherboard; PC Power and Cooling Q750 PS; 2x GTX 460 video card; Windows 7 X64.

Machine 3:

Dell Dimension 8400, 3.2GHz P4 4x512MB Ram, Video card GTX 460, Windows 7 X32

I am currently folding just on the 5x GTX 460s for approx. 70K PPD
Location: Salem. OR USA

Re: HELP! Did I just lose 5 days of folding?

Post by P5-133XL »

No HW profile means I'm just guessing. I'm assuming that you have an AMD GPU and a dual-core processor. AMD GPUs typically need a full CPU core in addition to the GPU. At that point you will only have a single core for the SMP slot, so I suggest that you go uniprocessor + GPU or SMP without the GPU, whichever gives you the highest PPD (points per day). If my read of your hardware is incorrect then feel free to disregard this, because my suggestion only really applies to that set of HW assumptions.

The SMP or Uniprocessor slots should not affect performance of your machine when it is on. The Windows priority system will execute the application in preference to folding because the application will invariably have a higher priority since folding is very low.

GPU folding however has no method of getting out of the way, so it will cause problems if you want to run something else. So if you are having performance issues, suspend the GPU slot as your first choice and see if that helps.
RMouse
Posts: 146
Joined: Wed Jun 13, 2012 6:15 am

Re: HELP! Did I just lose 5 days of folding?

Post by RMouse »

P5-133XL wrote:No HW profile means I'm just guessing. I'm assuming that you have an AMD GPU and a dual-core processor. AMD GPUs typically need a full CPU core in addition to the GPU. At that point you will only have a single core for the SMP slot, so I suggest that you go uniprocessor + GPU or SMP without the GPU, whichever gives you the highest PPD (points per day). If my read of your hardware is incorrect then feel free to disregard this, because my suggestion only really applies to that set of HW assumptions.

The SMP or Uniprocessor slots should not affect performance of your machine when it is on. The Windows priority system will execute the application in preference to folding because the application will invariably have a higher priority since folding is very low.

GPU folding however has no method of getting out of the way, so it will cause problems if you want to run something else. So if you are having performance issues, suspend the GPU slot as your first choice and see if that helps.
Thank you. I understand the issues with the GPU now.