9704 (Run 10, Clone 13, Gen 119) (102 = 0x66)

Moderators: Site Moderators, FAHC Science Team

parkut
Posts: 366
Joined: Tue Feb 12, 2008 7:33 am
Hardware configuration: Running exclusively Linux headless blades. All are dedicated crunching machines.
Location: SE Michigan, USA

9704 (Run 10, Clone 13, Gen 119) (102 = 0x66)

Post by parkut »

Model Name: NVIDIA:5 GM206 [GeForce GTX 960] - CentOS Linux system
Driver Version: 352.41 - Gpu temp: 68C - Client Version: 7.3.6

Project: 9704 (Run 10, Clone 13, Gen 119) was INTERRUPTED (102 = 0x66) at 25%.
The core restarted, but found a problem with the checkpoint file:
ERROR:Guru Meditation #76e83436e7d7dcd.bb54b21bd5dbdf80 (41594500.41598750) '01/01/checkpointState.xml'
and then FahCore returned: BAD_WORK_UNIT (114 = 0x72)

Code:

13:31:49:WU01:FS01:0x21:Project: 9704 (Run 10, Clone 13, Gen 119)
13:31:49:WU01:FS01:0x21:Unit: 0x000000a7ab404162553ebd50969d9738
13:31:49:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
13:31:49:WU01:FS01:0x21:Machine: 1
13:31:49:WU01:FS01:0x21:Reading tar file core.xml
13:31:49:WU01:FS01:0x21:Reading tar file system.xml
13:31:50:WU01:FS01:0x21:Reading tar file integrator.xml
13:31:50:WU01:FS01:0x21:Reading tar file state.xml
13:31:52:WU01:FS01:0x21:Digital signatures verified
13:31:52:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
13:31:52:WU01:FS01:0x21:Version 0.0.11
13:33:02:WU01:FS01:0x21:Completed 0 out of 640000 steps (0%)
13:33:02:WU01:FS01:0x21:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
13:36:48:WU01:FS01:0x21:Completed 6400 out of 640000 steps (1%)
13:40:23:WU01:FS01:0x21:Completed 12800 out of 640000 steps (2%)
13:43:58:WU01:FS01:0x21:Completed 19200 out of 640000 steps (3%)
13:47:33:WU01:FS01:0x21:Completed 25600 out of 640000 steps (4%)
13:51:08:WU01:FS01:0x21:Completed 32000 out of 640000 steps (5%)
13:54:43:WU01:FS01:0x21:Completed 38400 out of 640000 steps (6%)
13:58:18:WU01:FS01:0x21:Completed 44800 out of 640000 steps (7%)
14:01:53:WU01:FS01:0x21:Completed 51200 out of 640000 steps (8%)
14:05:28:WU01:FS01:0x21:Completed 57600 out of 640000 steps (9%)
14:09:02:WU01:FS01:0x21:Completed 64000 out of 640000 steps (10%)
14:12:37:WU01:FS01:0x21:Completed 70400 out of 640000 steps (11%)
14:16:13:WU01:FS01:0x21:Completed 76800 out of 640000 steps (12%)
14:20:02:WU01:FS01:0x21:Completed 83200 out of 640000 steps (13%)
14:23:38:WU01:FS01:0x21:Completed 89600 out of 640000 steps (14%)
14:27:12:WU01:FS01:0x21:Completed 96000 out of 640000 steps (15%)
14:30:47:WU01:FS01:0x21:Completed 102400 out of 640000 steps (16%)
14:34:22:WU01:FS01:0x21:Completed 108800 out of 640000 steps (17%)
14:37:57:WU01:FS01:0x21:Completed 115200 out of 640000 steps (18%)
14:41:32:WU01:FS01:0x21:Completed 121600 out of 640000 steps (19%)
14:45:07:WU01:FS01:0x21:Completed 128000 out of 640000 steps (20%)
14:48:42:WU01:FS01:0x21:Completed 134400 out of 640000 steps (21%)
14:52:17:WU01:FS01:0x21:Completed 140800 out of 640000 steps (22%)
14:55:52:WU01:FS01:0x21:Completed 147200 out of 640000 steps (23%)
14:59:27:WU01:FS01:0x21:Completed 153600 out of 640000 steps (24%)
15:03:02:WU01:FS01:0x21:Completed 160000 out of 640000 steps (25%)
15:03:15:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)
15:03:15:WU01:FS01:Starting
15:03:15:WU01:FS01:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/web.stanford.edu/~pande/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 703 -lifeline 1482 -checkpoint 15 -gpu 0 -gpu-vendor nvidia
15:03:15:WU01:FS01:Started FahCore on PID 7101
15:03:15:WU01:FS01:Core PID:7105
15:03:15:WU01:FS01:FahCore 0x21 started
15:03:15:WU01:FS01:0x21:*********************** Log Started 2015-10-29T15:03:15Z ***********************
15:03:15:WU01:FS01:0x21:Project: 9704 (Run 10, Clone 13, Gen 119)
15:03:15:WU01:FS01:0x21:Unit: 0x000000a7ab404162553ebd50969d9738
15:03:15:WU01:FS01:0x21:CPU: 0x00000000000000000000000000000000
15:03:15:WU01:FS01:0x21:Machine: 1
15:03:15:WU01:FS01:0x21:Digital signatures verified
15:03:15:WU01:FS01:0x21:Folding@home GPU Core21 Folding@home Core
15:03:15:WU01:FS01:0x21:Version 0.0.11
15:03:15:WU01:FS01:0x21:  Found a checkpoint file
15:03:27:WU01:FS01:0x21:ERROR:Guru Meditation #76e83436e7d7dcd.bb54b21bd5dbdf80 (41594500.41598750) '01/01/checkpointState.xml'
15:03:27:WU01:FS01:0x21:WARNING:Unexpected exit() call
15:03:27:WU01:FS01:0x21:WARNING:Unexpected exit from science code
15:03:27:WU01:FS01:0x21:Saving result file logfile_01.txt
15:03:27:WU01:FS01:0x21:Saving result file checkpt.crc
15:03:27:WU01:FS01:0x21:Saving result file log.txt
15:03:27:WU01:FS01:0x21:Folding@home Core Shutdown: BAD_WORK_UNIT
15:03:27:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)
15:03:27:WU01:FS01:Sending unit results: id:01 state:SEND error:FAULTY project:9704 run:10 clone:13 gen:119 core:0x21 unit:0x000000a7ab404162553ebd50969d9738
15:03:27:WU01:FS01:Uploading 3.75KiB to 171.64.65.98
15:03:27:WU01:FS01:Connecting to 171.64.65.98:8080
15:03:27:WU01:FS01:Upload complete
15:03:27:WU01:FS01:Server responded WORK_ACK (400)
15:03:27:WU01:FS01:Cleaning up
toTOW
Site Moderator
Posts: 6497
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 9704 (Run 10, Clone 13, Gen 119) (102 = 0x66)

Post by toTOW »

Someone has been able to complete this WU ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Joe_H
Site Admin
Posts: 8226
Joined: Tue Apr 21, 2009 4:41 pm
Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
Location: W. MA

Re: 9704 (Run 10, Clone 13, Gen 119) (102 = 0x66)

Post by Joe_H »

You have now reported two WUs. Both logs show the processing of the WU being interrupted and then restarted immediately. The immediate restart does not leave enough time for files to be closed, which is probably why the checkpoint files were not usable. If you can figure out what is interrupting WUs like this, that may help you avoid the problem. It might also be a problem with the software.
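
For anyone who wants to check their own log for this interrupt-then-immediate-restart pattern, here is a minimal sketch in Python. It assumes the log lines look like the ones in the first post; the function name `find_quick_restarts` and the 5-second threshold are made up for illustration and are not part of the client. It only reports how quickly the restart followed the exit, not what caused the interruption, and it ignores runs that cross midnight.

```python
import re
from datetime import datetime

# Matches e.g. "15:03:15:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)"
# (an optional "WARNING:" tag may sit between the time and the WU id).
RETURNED = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(?:WARNING:)?(WU\d+):(FS\d+):"
    r"FahCore returned: (\w+) \((\d+) = 0x([0-9a-fA-F]+)\)"
)
# Matches e.g. "15:03:15:WU01:FS01:Starting"
STARTING = re.compile(r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):Starting\s*$")


def parse_time(stamp):
    """Parse the HH:MM:SS prefix; the log lines carry no date."""
    return datetime.strptime(stamp, "%H:%M:%S")


def find_quick_restarts(lines, max_gap_seconds=5):
    """Return (wu, status, code, gap_seconds) for every FahCore exit that is
    followed by a restart of the same WU/slot within max_gap_seconds."""
    pending = {}  # (wu, slot) -> (exit time, status name, numeric code)
    hits = []
    for line in lines:
        m = RETURNED.match(line)
        if m:
            stamp, wu, slot, status, code, _hex = m.groups()
            pending[(wu, slot)] = (parse_time(stamp), status, int(code))
            continue
        m = STARTING.match(line)
        if m:
            stamp, wu, slot = m.groups()
            prev = pending.pop((wu, slot), None)
            if prev is not None:
                gap = (parse_time(stamp) - prev[0]).total_seconds()
                if 0 <= gap <= max_gap_seconds:
                    hits.append((wu, prev[1], prev[2], gap))
    return hits


if __name__ == "__main__":
    sample = [
        "15:03:02:WU01:FS01:0x21:Completed 160000 out of 640000 steps (25%)",
        "15:03:15:WU01:FS01:FahCore returned: INTERRUPTED (102 = 0x66)",
        "15:03:15:WU01:FS01:Starting",
        "15:03:27:WARNING:WU01:FS01:FahCore returned: BAD_WORK_UNIT (114 = 0x72)",
    ]
    for wu, status, code, gap in find_quick_restarts(sample):
        print(f"{wu}: {status} ({code}) restarted after {gap:.0f} s")
```

Run against the lines from the first post, this flags the INTERRUPTED exit at 15:03:15 because the "Starting" line carries the same timestamp.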
toTOW
Site Moderator
Posts: 6497
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: 9704 (Run 10, Clone 13, Gen 119) (102 = 0x66)

Post by toTOW »

It might also be a good idea to update the client to 7.4.4 for better error handling ...

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.