7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Moderators: Site Moderators, FAHC Science Team

shunter
Posts: 84
Joined: Sun Apr 06, 2008 8:22 am
Location: Hertfordshire, United Kingdom

7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by shunter »

Two PCs downloaded new cores this morning, started on 7610s, and both crashed with EUEs (see log files below). Both PCs have run without fault for months, so I don't think it's the PCs and suspect bad units. Both PCs were shut down and restarted and are now running 7149 (14% completed) and 6950 (43% completed). I expected the 7610s to be picked up again, but both PCs went for new units. Can these two units be removed and examined for faults, please?
Thanks
Shunter

Logfile 7610 (151, 0, 0)

Code: Select all

[03:16:38] Verifying core Core_a4.fah...
[03:16:38] Signature is VALID
[03:16:38] 
[03:16:38] Trying to unzip core FahCore_a4.exe
[03:16:39] Decompressed FahCore_a4.exe (10057216 bytes) successfully
[03:16:44] + Core successfully engaged
[03:16:49] 
[03:16:49] + Processing work unit
[03:16:49] Work type a4 not eligible for variable processors
[03:16:49] Core required: FahCore_a4.exe
[03:16:49] Core found.
[03:16:49] Working on queue slot 05 [June 15 03:16:49 UTC]
[03:16:49] + Working ...
[03:16:49] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a4.exe -dir work/ -suffix 05 -checkpoint 10 -forceasm -verbose -lifeline 3272 -version 629'

[03:16:49] 
[03:16:49] *------------------------------*
[03:16:49] Folding@Home Gromacs GB Core
[03:16:49] Version 2.27 (Dec. 15, 2010)
[03:16:49] 
[03:16:49] Preparing to commence simulation
[03:16:49] - Ensuring status. Please wait.
[03:16:49] Called DecompressByteArray: compressed_data_size=270125 data_size=644556, decompressed_data_size=644556 diff=0
[03:16:49] - Digital signature verified
[03:16:49] 
[03:16:49] Project: 7610 (Run 151, Clone 0, Gen 0)
[03:16:49] 
[03:16:49] Assembly optimizations on if available.
[03:16:49] Entering M.D.
[03:16:55] Mapping NT from 1 to 1 
[03:16:55] Completed 0 out of 2000000 steps  (0%)
[03:17:05] ed 0 out of 2000000 steps  (0%)
[03:40:57] teps  (1%)
[03:41:15] 0000 out of 2000000 steps  (1%)
[04:04:56] teps  (2%)
[04:05:16] 0000 out of 2000000 steps  (2%)
[04:28:48] teps  (3%)
[04:29:15] 0000 out of 2000000 steps  (3%)
[04:52:43] teps  (4%)
[04:53:17] 0000 out of 2000000 steps  (4%)
[04:56:00] - Autosending finished units... [June 15 04:56:00 UTC]
[04:56:00] Trying to send all finished work units
[04:56:00] + No unsent completed units remaining.
[04:56:00] - Autosend completed
[05:16:57] steps  (5%)
[05:17:34] 0000 out of 2000000 steps  (5%)
[05:40:44] steps  (6%)
[05:41:27] 0000 out of 2000000 steps  (6%)
[06:04:44] steps  (7%)
[06:05:32] 0000 out of 2000000 steps  (7%)
[06:07:06] ave done -- stepsTotalG=2000000
[06:07:06] Work fraction=0.0697 steps=2000000.
[06:07:10] logfile size=12019 infoLength=12019 edr=0 trr=25
[06:07:10] logfile size: 12019 info=12019 bed=0 hdr=25
[06:07:10] - Writing 12557 bytes of core data to disk...
[06:07:10] Done: 12045 -> 4013 (compressed to 33.3 percent)
[06:07:10]   ... Done.
[06:07:10] 
[06:07:10] Folding@home Core Shutdown: UNSTABLE_MACHINE
[06:37:10]  (compressed to 32.0 percent)
[06:37:10]   ... Done.
[06:37:10] 
[06:37:10] Folding@home Core Shutdown: EARLY_UNIT_END
[07:50:40] Killing all core threads
[07:50:40] Could not get process id information.  Please kill core process manually

Folding@Home Client Shutdown at user request.
[07:50:40] ***** Got a SIGTERM signal (2)
[07:50:40] Killing all core threads
[07:50:40] Could not get process id information.  Please kill core process manually

Folding@Home Client Shutdown.
Logfile 7610 (147, 0, 0)

Code: Select all

[03:02:02] Verifying core Core_a4.fah...
[03:02:02] Signature is VALID
[03:02:02] 
[03:02:02] Trying to unzip core FahCore_a4.exe
[03:02:04] Decompressed FahCore_a4.exe (10057216 bytes) successfully
[03:02:09] + Core successfully engaged
[03:02:14] 
[03:02:14] + Processing work unit
[03:02:14] Work type a4 not eligible for variable processors
[03:02:14] Core required: FahCore_a4.exe
[03:02:14] Core found.
[03:02:14] Working on queue slot 09 [June 15 03:02:14 UTC]
[03:02:14] + Working ...
[03:02:14] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a4.exe -dir work/ -suffix 09 -checkpoint 15 -forceasm -verbose -lifeline 3104 -version 629'

[03:02:15] 
[03:02:15] *------------------------------*
[03:02:15] Folding@Home Gromacs GB Core
[03:02:15] Version 2.27 (Dec. 15, 2010)
[03:02:15] 
[03:02:15] Preparing to commence simulation
[03:02:15] - Assembly optimizations manually forced on.
[03:02:15] - Not checking prior termination.
[03:02:15] - Expanded 270054 -> 644556 (decompressed 238.6 percent)
[03:02:15] Called DecompressByteArray: compressed_data_size=270054 data_size=644556, decompressed_data_size=644556 diff=0
[03:02:15] - Digital signature verified
[03:02:15] 
[03:02:15] Project: 7610 (Run 147, Clone 0, Gen 0)
[03:02:15] 
[03:02:15] Assembly optimizations on if available.
[03:02:15] Entering M.D.
[03:02:21] Mapping NT from 1 to 1 
[03:02:22] Completed 0 out of 2000000 steps  (0%)
[03:44:17] Completed 20000 out of 2000000 steps  (1%)
[04:26:08] Completed 40000 out of 2000000 steps  (2%)
[04:32:24] mdrun returned 255
[04:32:24] Going to send back what have done -- stepsTotalG=2000000
[04:32:24] Work fraction=0.0214 steps=2000000.
[04:32:28] logfile size=10736 infoLength=10736 edr=0 trr=25
[04:32:28] logfile size: 10736 info=10736 bed=0 hdr=25
[04:32:28] - Writing 11274 bytes of core data to disk...
[04:32:28] Done: 10762 -> 3789 (compressed to 35.2 percent)
[04:32:28]   ... Done.
[04:32:28] 
[04:32:28] Folding@home Core Shutdown: EARLY_UNIT_END
[08:10:33] Killing all core threads
[08:10:33] Killing 2 cores
[08:10:33] Killing core 0
[08:10:33] Killing core 1

Folding@Home Client Shutdown at user request.
[08:10:33] ***** Got a SIGTERM signal (2)
[08:10:33] Killing all core threads
[08:10:33] Killing 2 cores
[08:10:33] Killing core 0
[08:10:33] Killing core 1

Folding@Home Client Shutdown.
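For anyone comparing several logs like the two above, here is a rough Python sketch that pulls out the project/run/clone/gen and the core shutdown codes. The patterns are only assumed from the log lines quoted in this thread (other client or core versions may word them differently), and `summarize_log` is just an illustrative name, not part of any FAH tool.

```python
import re

# Patterns assumed from the log lines quoted in this thread only.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def summarize_log(text):
    """Return ((project, run, clone, gen), [shutdown codes]) for one log."""
    unit = None
    shutdowns = []
    for line in text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            unit = tuple(int(group) for group in match.groups())
        match = SHUTDOWN_RE.search(line)
        if match:
            shutdowns.append(match.group(1))
    return unit, shutdowns


sample = """\
[03:02:15] Project: 7610 (Run 147, Clone 0, Gen 0)
[04:32:24] mdrun returned 255
[04:32:28] Folding@home Core Shutdown: EARLY_UNIT_END
"""
print(summarize_log(sample))
```

A log with more than one shutdown line, like the first one above, would simply return both codes in order.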
Kougar
Posts: 61
Joined: Fri Apr 11, 2008 2:39 am
Hardware configuration: Core i7 920 @ 4.3GHz 1.42v (HT on)
Gigabyte GA-X58-UD5 (F10)
3 x 2GB OCZ Platinum 16400MHz 8-8-8-24 1T
EVGA GTX 260 w/ D-Tek Fuzion 2 GFX
ASUS Xonar DX | Cooler Master UCP 1kW
Intel X25-M 80GB SSD | Windows 7 x64
Swiftech Apogee GTZ + MCP655 Pump & Thermochill PA120.3 Radiator
Location: Texas

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by Kougar »

Yeah, looks like the right flags are not set server/project side as 7610 / core a4 is designed for single-core use. Also found a thread where it was assigned to OS X users, but the a4 core is not eligible to run on them either.

My client has folded for over six months under Win 7 on a six-core processor and is configured to spawn 12 threads for bigadv folding, which might be why a project using the a4 core, designed for single-threaded folding, would error. Nothing in my config has changed in weeks that would cause it to download this WU or the a4 core.

Project: 7610 (Run 236, Clone 0, Gen 0)

Code: Select all

[08:46:14] + 3020800 bytes downloaded
[08:46:15] + 3028899 bytes downloaded
[08:46:15] Verifying core Core_a4.fah...
[08:46:15] Signature is VALID
[08:46:15] 
[08:46:15] Trying to unzip core FahCore_a4.exe
[08:46:15] Decompressed FahCore_a4.exe (10057216 bytes) successfully
[08:46:20] + Core successfully engaged
[08:46:25] 
[08:46:25] + Processing work unit
[08:46:25] Work type a4 not eligible for variable processors
[08:46:25] Core required: FahCore_a4.exe
[08:46:25] Core found.
[08:46:25] Working on queue slot 03 [June 15 08:46:25 UTC]
[08:46:25] + Working ...
[08:46:25] 
[08:46:25] *------------------------------*
[08:46:25] Folding@Home Gromacs GB Core
[08:46:25] Version 2.27 (Dec. 15, 2010)
[08:46:25] 
[08:46:25] Preparing to commence simulation
[08:46:25] - Ensuring status. Please wait.
[08:46:25] Called DecompressByteArray: compressed_data_size=270361 data_size=644556, decompressed_data_size=644556 diff=0
[08:46:25] - Digital signature verified
[08:46:25] 
[08:46:25] Project: 7610 (Run 236, Clone 0, Gen 0)
[08:46:25] 
[08:46:25] Assembly optimizations on if available.
[08:46:25] Entering M.D.
[08:46:31] Mapping NT from 1 to 1 
[08:46:32] Completed 0 out of 2000000 steps  (0%)
[08:46:35] ing M.D.
[08:46:35] Clone 0, Gen 0)
[08:46:35] 
[08:46:35] Entering M.D.
[08:46:41] ed 0 out of 2000000 stepCompleted 0 out of 2000000 steps  (0%)
[09:01:42] mdrun returned 255
[09:01:42] Going to send back what have done -- stepsTotalG=2000000
[09:01:42] Work fraction=0.0065 steps=2000000.
[09:01:46] logfile size=10431 infoLength=10431 edr=0 trr=25
[09:01:46] logfile size: 10431 info=10431 bed=0 hdr=25
[09:01:46] - Writing 10969 bytes of core data to disk...
[09:01:46] Done: 10457 -> 3718 (compressed to 35.5 percent)
[09:01:46]   ... Done.
[09:01:46] 
[09:01:46] Folding@home Core Shutdown: UNSTABLE_MACHINE
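The `Work fraction=` line in these logs gives a quick way to estimate how far a unit got before the core gave up: multiply the fraction by the total step count. A small Python sketch, assuming the line format shown in the logs in this thread (`steps_done` is an illustrative helper name):

```python
import re

# Format assumed from lines such as "Work fraction=0.0065 steps=2000000."
WORK_RE = re.compile(r"Work fraction=([0-9.]+) steps=(\d+)")


def steps_done(line):
    """Estimate completed steps from a 'Work fraction' log line, or None."""
    match = WORK_RE.search(line)
    if match is None:
        return None
    fraction = float(match.group(1))
    total_steps = int(match.group(2))
    # Round to the nearest whole step to hide floating-point noise.
    return round(fraction * total_steps)


for line in (
    "[06:07:06] Work fraction=0.0697 steps=2000000.",
    "[04:32:24] Work fraction=0.0214 steps=2000000.",
    "[09:01:42] Work fraction=0.0065 steps=2000000.",
):
    print(steps_done(line))
```

By this estimate the three units stopped at roughly 139400, 42800 and 13000 steps, which lines up with the last percentage each log reported.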
kiore
Posts: 925
Joined: Fri Jan 16, 2009 5:45 pm
Location: USA

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by kiore »

Both my SMP machines are running 7610 uniprocessors, after running SMPs for over a year.

Code: Select all

Launch directory: C:\SMP
Executable: C:\SMP\SMP.exe
Arguments: -smp -verbosity 9 

[14:02:34] - Ask before connecting: No
[14:02:34] - User name: kiore (Team 182116)
[14:02:34] - 
[14:02:34] - Machine ID: 1
[14:02:34] 
[14:02:34] Loaded queue successfully.
[14:02:34] 
[14:02:34] - Autosending finished units... [June 15 14:02:34 UTC]
[14:02:34] + Processing work unit
[14:02:34] Trying to send all finished work units
[14:02:34] Work type a4 not eligible for variable processors
[14:02:34] + No unsent completed units remaining.
[14:02:34] Core required: FahCore_a4.exe
[14:02:34] - Autosend completed
[14:02:34] Core found.
[14:02:34] Working on queue slot 09 [June 15 14:02:34 UTC]
[14:02:34] + Working ...
[14:02:34] - Calling '.\FahCore_a4.exe -dir work/ -suffix 09 -nice 19 -checkpoint 15 -verbose -lifeline 4908 -version 630'

[14:02:34] 
[14:02:34] *------------------------------*
[14:02:34] Folding@Home Gromacs GB Core
[14:02:34] Version 2.27 (Dec. 15, 2010)
[14:02:34] 
[14:02:34] Preparing to commence simulation
[14:02:34] - Ensuring status. Please wait.
[14:02:43] - Looking at optimizations...
[14:02:43] - Working with standard loops on this execution.
[14:02:43] - Previous termination of core was improper.
[14:02:43] - Files status OK
[14:02:43] - Expanded 270285 -> 644556 (decompressed 238.4 percent)
[14:02:43] Called DecompressByteArray: compressed_data_size=270285 data_size=644556, decompressed_data_size=644556 diff=0
[14:02:43] - Digital signature verified
[14:02:43] 
[14:02:43] Project: 7610 (Run 187, Clone 0, Gen 0)
[14:02:43] 
[14:02:43] Entering M.D.
[14:02:49] Using Gromacs checkpoints
[14:02:49] Mapping NT from 1 to 1 
[14:02:50] Resuming from checkpoint
[14:02:50] Verified work/wudata_09.log
[14:02:50] Verified work/wudata_09.trr
[14:02:50] Verified work/wudata_09.xtc
[14:02:50] Verified work/wudata_09.edr
[14:02:50] Completed 290200 out of 2000000 steps  (14%)
[14:19:36] Completed 300000 out of 2000000 steps  (15%)
i7 7800x RTX 3070 OS= win10. AMD 3700x RTX 2080ti OS= win10 .

Team page: https://www.rationalskepticism.org/viewtopic.php?t=616
shunter
Posts: 84
Joined: Sun Apr 06, 2008 8:22 am
Location: Hertfordshire, United Kingdom

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by shunter »

Well I don't profess to be a techie on either Folding or hardware; however a 3rd pc has now fallen foul of whatever is going on.

Code: Select all

[13:12:04] Initial: EBF8; + 3028899 bytes downloaded
[13:12:04] Verifying core Core_a4.fah...
[13:12:05] Signature is VALID
[13:12:05] 
[13:12:05] Trying to unzip core FahCore_a4.exe
[13:12:06] Decompressed FahCore_a4.exe (10057216 bytes) successfully
[13:12:11] + Core successfully engaged
[13:12:16] 
[13:12:16] + Processing work unit
[13:12:16] Work type a4 not eligible for variable processors
[13:12:16] Core required: FahCore_a4.exe
[13:12:16] Core found.
[13:12:16] Working on queue slot 07 [June 15 13:12:16 UTC]
[13:12:16] + Working ...
[13:12:16] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a4.exe -dir work/ -suffix 07 -checkpoint 15 -forceasm -verbose -lifeline 4068 -version 629'

[13:12:18] 
[13:12:18] *------------------------------*
[13:12:18] Folding@Home Gromacs GB Core
[13:12:18] Version 2.27 (Dec. 15, 2010)
[13:12:19] 
[13:12:19] Preparing to commence simulation
[13:12:19] - Assembly optimizations manually forced on.
[13:12:19] - Not checking prior termination.
[13:12:19] - Expanded 270650 -> 644556 (decompressed 238.1 percent)
[13:12:19] Called DecompressByteArray: compressed_data_size=270650 data_size=644556, decompressed_data_size=644556 diff=0
[13:12:19] - Digital signature verified
[13:12:19] 
[13:12:19] Project: 7611 (Run 4, Clone 57, Gen 0)
[13:12:19] 
[13:12:19] Assembly optimizations on if available.
[13:12:19] Entering M.D.
[13:12:25] Mapping NT from 1 to 1 
[13:12:26] Completed 0 out of 2000000 steps  (0%)
[13:27:28] mdrun returned 255
[13:27:28] Going to send back what have done -- stepsTotalG=2000000
[13:27:28] Work fraction=0.0019 steps=2000000.
[13:27:32] logfile size=10432 infoLength=10432 edr=0 trr=25
[13:27:32] logfile size: 10432 info=10432 bed=0 hdr=25
[13:27:33] - Writing 10970 bytes of core data to disk...
[13:27:33] Done: 10458 -> 3720 (compressed to 35.5 percent)
[13:27:33]   ... Done.
[13:27:39] 
[13:27:39] Folding@home Core Shutdown: UNSTABLE_MACHINE
[14:18:52] - Autosending finished units... [June 15 14:18:52 UTC]
[14:18:52] Trying to send all finished work units
[14:18:52] + No unsent completed units remaining.
[14:18:52] - Autosend completed

Kougar
Posts: 61
Joined: Fri Apr 11, 2008 2:39 am
Location: Texas

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by Kougar »

Shunter, are ya using -smp # or just -smp for your clients? Your first log stated 2 threads were spawned, not sure about the other two clients you mentioned though.
kiore wrote: Both my SMP machines are running 7610 uniprocessors, after running SMPs for over a year.

Code: Select all

Arguments: -smp -verbosity 9
I see you use "-smp". At least for mine I specify "-smp 12", so I am guessing that it forces 12 threads even though it's a single-threaded core. Quite a few bigadv users do manually specify thread counts as it used to be a required flag for initial bigadv clients, or they want to leave 1-2 threads spare.

Either way, unless these projects have some sort of extreme priority, it makes no sense to send 4-12 core, 8-24 thread systems a slow single-threaded project.
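One way to check how many processes a client actually launched is to read the `- Calling '...'` line in its log. Below is a rough Python sketch based only on the two command shapes quoted in this thread: an `mpiexec -np N ...` launch and a direct `FahCore_a4.exe` launch. Treating the direct launch as a single process is my assumption, and `launched_processes` is an illustrative name.

```python
import re

CALL_RE = re.compile(r"- Calling '(.*)'")
NP_RE = re.compile(r"-np (\d+)")


def launched_processes(line):
    """Return the process count implied by a '- Calling' log line, or None."""
    match = CALL_RE.search(line)
    if match is None:
        return None
    command = match.group(1)
    if command.startswith("mpiexec"):
        np_match = NP_RE.search(command)
        return int(np_match.group(1)) if np_match else None
    # Assumption: a core started directly (no mpiexec) runs as one process.
    return 1


mpi_line = (
    "[03:16:49] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 "
    "FahCore_a4.exe -dir work/ -suffix 05 -checkpoint 10 -forceasm "
    "-verbose -lifeline 3272 -version 629'"
)
direct_line = (
    r"[14:02:34] - Calling '.\FahCore_a4.exe -dir work/ -suffix 09 "
    r"-nice 19 -checkpoint 15 -verbose -lifeline 4908 -version 630'"
)
print(launched_processes(mpi_line), launched_processes(direct_line))
```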
kiore
Posts: 925
Joined: Fri Jan 16, 2009 5:45 pm
Location: USA

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by kiore »

Well, perhaps it is a server work allocation error. I use just -smp, but have a quad and a quad with HT (recognized as 8), each running a uniprocessor; I have never been allocated a uniprocessor on these setups before.
Will let them run and see what loads next.
tjlane
Pande Group Member
Posts: 161
Joined: Wed Jun 01, 2011 11:19 pm
Location: Stanford, CA

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by tjlane »

Hi all -

We've been experiencing some difficulties with the P76** series. These units use the A4 core which should be able to run on single or multi-threaded machines, but there is some issue with the core. They've been rolled back to closed beta until we can sort this out.

Thanks for posting!
kiore
Posts: 925
Joined: Fri Jan 16, 2009 5:45 pm
Location: USA

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by kiore »

OK thanks for letting us know. These things happen.
So when picked up by a multicore machine, should they run on all cores? Is that the plan?
tjlane
Pande Group Member
Posts: 161
Joined: Wed Jun 01, 2011 11:19 pm
Location: Stanford, CA

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by tjlane »

Correct. The A4 core *should* be able to run on an SMP client using all cores, or simply go into single-core mode on a smaller machine. For some reason setting the -smp flag without specifying the number of threads appears to be the issue, but I'm not 100% sure at this point! Your logs have been very helpful :).
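To sort reports by the case described here (plain `-smp` versus `-smp` with a thread count), a short Python sketch can classify the `Arguments:` string from a client log. This is only an illustration: `smp_setting` and the `"auto"` label are made-up names, and it assumes the count, when present, is the token right after `-smp`.

```python
def smp_setting(arguments):
    """Classify the -smp flag in a client argument string.

    Returns None if -smp is absent, "auto" if -smp has no thread count,
    or the integer thread count if one follows the flag.
    """
    tokens = arguments.split()
    if "-smp" not in tokens:
        return None
    index = tokens.index("-smp")
    # Assumption: an explicit count is the token immediately after -smp.
    if index + 1 < len(tokens) and tokens[index + 1].isdigit():
        return int(tokens[index + 1])
    return "auto"


print(smp_setting("-smp -verbosity 9"))
print(smp_setting("-smp 12"))
```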
shunter
Posts: 84
Joined: Sun Apr 06, 2008 8:22 am
Location: Hertfordshire, United Kingdom

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by shunter »

Thanks for the responses. I've had 5 now, so hopefully the problem will go away.

Kougar
Kougar wrote:Shunter, are ya using -smp # or just -smp for your clients? Your first log stated 2 threads were spawned, not sure about the other two clients you mentioned though.
All mine are set up with -smp only - 2 are dual core, 1 is a Q6600 and the other an i7 930
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by bruce »

@shunter:
Which client version are you running?
Kougar
Posts: 61
Joined: Fri Apr 11, 2008 2:39 am
Location: Texas

Re: 7610 (151, 0, 0) and 7610 (147, 0, 0) Both EUEs

Post by Kougar »

Ah, it's not a problem. Thanks for the clarification guys!

Was going by the wiki source that core a4 was single-threaded only, which the non-variable processor comment in the logs seemed to support.
tjlane wrote: For some reason setting the -smp flag without specifying the number of threads appears to be the issue, but I'm not 100% sure at this point! Your logs have been very helpful :).
Just to note I did specify -smp 12 in my original log report. Let me know if ya need any further info!