EUEs, multiple, on Project 7500 and 7501 work units

Posted: Tue Jul 05, 2011 7:34 pm
by Leonardo
I did not think it efficient to start a separate thread on each failed WU, so here is a compilation of a disaster earlier today with a series of Project 7500 and 7501 work units.

Instant EUEs, 24 times, in a row, same machine, for the following:

Project: 7501 (Run 0, Clone 114, Gen 31)
Project: 7500 (Run 0, Clone 25, Gen 132)
Project: 7500 (Run 0, Clone 75, Gen 120)
Project: 7501 (Run 0, Clone 135, Gen 30)
Project: 7500 (Run 0, Clone 86, Gen 117)
Project: 7501 (Run 0, Clone 131, Gen 30)
Project: 7500 (Run 0, Clone 100, Gen 112)
Project: 7501 (Run 0, Clone 156, Gen 29)
Project: 7500 (Run 0, Clone 36, Gen 98)
Project: 7501 (Run 0, Clone 157, Gen 29)
Project: 7500 (Run 0, Clone 94, Gen 97)
Project: 7500 (Run 0, Clone 33, Gen 105)
Project: 7500 (Run 0, Clone 37, Gen 96)
Project: 7501 (Run 0, Clone 176, Gen 29)
Project: 7500 (Run 0, Clone 68, Gen 89)
Project: 7501 (Run 0, Clone 261, Gen 29)
Project: 7500 (Run 0, Clone 157, Gen 76)
....and more.... log is below

Computer: Win7/64, Client 6.34
The machine has been running SMP2 and SMP -bigadv for about nine months reliably at the same settings. The client was completing all units perfectly before this group of instant EUEs, and resumed perfect performance when a 6XXX was downloaded. I have no explanation. Could it be a hardware issue? Perhaps, but it doesn't seem logical, as the computer crunched work units perfectly before and after these failures with no hardware or client settings changes. The computer was running relatively cool throughout.

Does this count against my ratio of successfully completed SMP units for -bigadv bonuses? If so, I might be in a world of hurt, here. :lol: :shock: :|

Code:

[07:59:02] Completed 500000 out of 500000 steps  (100%)
[07:59:04] DynamicWrapper: Finished Work Unit: sleep=10000
[07:59:14] 
[07:59:14] Finished Work Unit:
[07:59:14] - Reading up to 3700368 from "work/wudata_06.trr": Read 3700368
[07:59:14] trr file hash check passed.
[07:59:14] edr file hash check passed.
[07:59:14] logfile size: 55486
[07:59:14] Leaving Run
[07:59:14] - Writing 3791814 bytes of core data to disk...
[07:59:14]   ... Done.
[07:59:15] - Shutting down core
[07:59:15] 
[07:59:15] Folding@home Core Shutdown: FINISHED_UNIT
[07:59:18] CoreStatus = 64 (100)
[07:59:18] Unit 6 finished with 96 percent of time to deadline remaining.
[07:59:18] Updated performance fraction: 0.857175
[07:59:18] Sending work to server
[07:59:18] Project: 6061 (Run 1, Clone 102, Gen 221)


[07:59:18] + Attempting to send results [July 5 07:59:18 UTC]
[07:59:18] - Reading file work/wuresults_06.dat from core
[07:59:18]   (Read 3791814 bytes from disk)
[07:59:18] Connecting to http://171.64.65.54:8080/
[08:00:00] Posted data.
[08:00:00] Initial: 0000; - Uploaded at ~88 kB/s
[08:00:00] - Averaged speed for that direction ~87 kB/s
[08:00:00] + Results successfully sent
[08:00:00] Thank you for your contribution to Folding@Home.
[08:00:00] + Number of Units Completed: 622

[08:00:04] Trying to send all finished work units
[08:00:04] + No unsent completed units remaining.
[08:00:04] - Preparing to get new work unit...
[08:00:04] Cleaning up work directory
[08:00:10] + Attempting to get work packet
[08:00:10] Passkey found
[08:00:10] - Will indicate memory of 4086 MB
[08:00:10] - Connecting to assignment server
[08:00:10] Connecting to http://assign.stanford.edu:8080/
[08:00:11] Posted data.
[08:00:11] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:00:11] + News From Folding@Home: Welcome to Folding@Home
[08:00:11] Loaded queue successfully.
[08:00:11] Sent data
[08:00:11] Connecting to http://128.143.199.97:8080/
[08:00:12] Posted data.
[08:00:12] Initial: 0000; - Receiving payload (expected size: 1248809)
[08:00:16] - Downloaded at ~304 kB/s
[08:00:16] - Averaged speed for that direction ~516 kB/s
[08:00:16] + Received work.
[08:00:16] Trying to send all finished work units
[08:00:16] + No unsent completed units remaining.
[08:00:16] + Closed connections
[08:00:16] 
[08:00:16] + Processing work unit
[08:00:16] Core required: FahCore_a3.exe
[08:00:16] Core found.
[08:00:16] Working on queue slot 07 [July 5 08:00:16 UTC]
[08:00:16] + Working ...
[08:00:16] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 07 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:00:16] 
[08:00:16] *------------------------------*
[08:00:16] Folding@Home Gromacs SMP Core
[08:00:16] Version 2.27 (Dec. 15, 2010)
[08:00:16] 
[08:00:16] Preparing to commence simulation
[08:00:16] - Assembly optimizations manually forced on.
[08:00:16] - Not checking prior termination.
[08:00:16] - Expanded 1248297 -> 2077012 (decompressed 166.3 percent)
[08:00:16] Called DecompressByteArray: compressed_data_size=1248297 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:00:16] - Digital signature verified
[08:00:16] 
[08:00:16] Project: 7500 (Run 0, Clone 25, Gen 132)
[08:00:16] 
[08:00:16] Assembly optimizations on if available.
[08:00:16] Entering M.D.
[08:00:22] Mapping NT from 7 to 7 
[08:00:22] mdrun returned 255
[08:00:22] Going to send back what have done -- stepsTotalG=500000
[08:00:22] Work fraction=0.0000 steps=500000.
[08:00:26] logfile size=0 infoLength=0 edr=0 trr=25
[08:00:26] logfile size: 0 info=0 bed=0 hdr=25
[08:00:26] - Writing 642 bytes of core data to disk...
[08:00:26] Done: 130 -> 147 (compressed to 113.0 percent)
[08:00:26]   ... Done.
[08:00:26] 
[08:00:26] Folding@home Core Shutdown: EARLY_UNIT_END
[08:00:30] CoreStatus = 72 (114)
[08:00:30] Sending work to server
[08:00:30] Project: 7500 (Run 0, Clone 25, Gen 132)


[08:00:30] + Attempting to send results [July 5 08:00:30 UTC]
[08:00:30] - Reading file work/wuresults_07.dat from core
[08:00:30]   (Read 659 bytes from disk)
[08:00:30] Connecting to http://128.143.199.97:8080/
[08:00:30] Posted data.
[08:00:30] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[08:00:30] - Uploaded at ~3 kB/s
[08:00:30] - Averaged speed for that direction ~77 kB/s
[08:00:30] + Results successfully sent
[08:00:30] Thank you for your contribution to Folding@Home.
[08:00:34] Trying to send all finished work units
[08:00:34] + No unsent completed units remaining.
[08:00:34] - Preparing to get new work unit...
[08:00:34] Cleaning up work directory
[08:00:40] + Attempting to get work packet
[08:00:40] Passkey found
[08:00:40] - Will indicate memory of 4086 MB
[08:00:40] - Connecting to assignment server
[08:00:40] Connecting to http://assign.stanford.edu:8080/
[08:00:41] Posted data.
[08:00:41] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:00:41] + News From Folding@Home: Welcome to Folding@Home
[08:00:41] Loaded queue successfully.
[08:00:41] Sent data
[08:00:41] Connecting to http://128.143.199.97:8080/
[08:00:42] Posted data.
[08:00:42] Initial: 0000; - Receiving payload (expected size: 1248912)
[08:00:45] - Downloaded at ~406 kB/s
[08:00:45] - Averaged speed for that direction ~494 kB/s
[08:00:45] + Received work.
[08:00:45] Trying to send all finished work units
[08:00:45] + No unsent completed units remaining.
[08:00:45] + Closed connections
[08:00:50] 
[08:00:50] + Processing work unit
[08:00:50] Core required: FahCore_a3.exe
[08:00:50] Core found.
[08:00:50] Working on queue slot 08 [July 5 08:00:50 UTC]
[08:00:50] + Working ...
[08:00:50] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 08 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:00:50] 
[08:00:50] *------------------------------*
[08:00:50] Folding@Home Gromacs SMP Core
[08:00:50] Version 2.27 (Dec. 15, 2010)
[08:00:50] 
[08:00:50] Preparing to commence simulation
[08:00:50] - Assembly optimizations manually forced on.
[08:00:50] - Not checking prior termination.
[08:00:51] - Expanded 1248400 -> 2077012 (decompressed 166.3 percent)
[08:00:51] Called DecompressByteArray: compressed_data_size=1248400 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:00:51] - Digital signature verified
[08:00:51] 
[08:00:51] Project: 7500 (Run 0, Clone 75, Gen 120)
[08:00:51] 
[08:00:51] Assembly optimizations on if available.
[08:00:51] Entering M.D.
[08:00:57] Mapping NT from 7 to 7 
[08:00:57] mdrun returned 255
[08:00:57] Going to send back what have done -- stepsTotalG=500000
[08:00:57] Work fraction=0.0000 steps=500000.
[08:01:01] logfile size=0 infoLength=0 edr=0 trr=25
[08:01:01] logfile size: 0 info=0 bed=0 hdr=25
[08:01:01] - Writing 642 bytes of core data to disk...
[08:01:01] Done: 130 -> 147 (compressed to 113.0 percent)
[08:01:01]   ... Done.
[08:01:01] 
[08:01:01] Folding@home Core Shutdown: EARLY_UNIT_END
[08:01:04] CoreStatus = 72 (114)
[08:01:04] Sending work to server
[08:01:04] Project: 7500 (Run 0, Clone 75, Gen 120)


[08:01:04] + Attempting to send results [July 5 08:01:04 UTC]
[08:01:04] - Reading file work/wuresults_08.dat from core
[08:01:04]   (Read 659 bytes from disk)
[08:01:04] Connecting to http://128.143.199.97:8080/
[08:01:05] Posted data.
[08:01:05] Initial: 0000; - Uploaded at ~1 kB/s
[08:01:05] - Averaged speed for that direction ~62 kB/s
[08:01:05] + Results successfully sent
[08:01:05] Thank you for your contribution to Folding@Home.
[08:01:09] Trying to send all finished work units
[08:01:09] + No unsent completed units remaining.
[08:01:09] - Preparing to get new work unit...
[08:01:09] Cleaning up work directory
[08:01:14] + Attempting to get work packet
[08:01:14] Passkey found
[08:01:14] - Will indicate memory of 4086 MB
[08:01:14] - Connecting to assignment server
[08:01:14] Connecting to http://assign.stanford.edu:8080/
[08:01:15] Posted data.
[08:01:15] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:01:15] + News From Folding@Home: Welcome to Folding@Home
[08:01:15] Loaded queue successfully.
[08:01:15] Sent data
[08:01:15] Connecting to http://128.143.199.97:8080/
[08:01:16] Posted data.
[08:01:16] Initial: 0000; - Receiving payload (expected size: 1254934)
[08:01:19] - Downloaded at ~408 kB/s
[08:01:19] - Averaged speed for that direction ~477 kB/s
[08:01:19] + Received work.
[08:01:19] Trying to send all finished work units
[08:01:19] + No unsent completed units remaining.
[08:01:19] + Closed connections
[08:01:24] 
[08:01:24] + Processing work unit
[08:01:24] Core required: FahCore_a3.exe
[08:01:24] Core found.
[08:01:24] Working on queue slot 09 [July 5 08:01:24 UTC]
[08:01:24] + Working ...
[08:01:24] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 09 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:01:24] 
[08:01:24] *------------------------------*
[08:01:24] Folding@Home Gromacs SMP Core
[08:01:24] Version 2.27 (Dec. 15, 2010)
[08:01:24] 
[08:01:24] Preparing to commence simulation
[08:01:24] - Assembly optimizations manually forced on.
[08:01:24] - Not checking prior termination.
[08:01:25] - Expanded 1254422 -> 2077012 (decompressed 165.5 percent)
[08:01:25] Called DecompressByteArray: compressed_data_size=1254422 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:01:25] - Digital signature verified
[08:01:25] 
[08:01:25] Project: 7501 (Run 0, Clone 114, Gen 31)
[08:01:25] 
[08:01:25] Assembly optimizations on if available.
[08:01:25] Entering M.D.
[08:01:31] Mapping NT from 7 to 7 
[08:01:31] mdrun returned 255
[08:01:31] Going to send back what have done -- stepsTotalG=500000
[08:01:31] Work fraction=0.0000 steps=500000.
[08:01:35] logfile size=0 infoLength=0 edr=0 trr=25
[08:01:35] logfile size: 0 info=0 bed=0 hdr=25
[08:01:35] - Writing 642 bytes of core data to disk...
[08:01:35] Done: 130 -> 147 (compressed to 113.0 percent)
[08:01:35]   ... Done.
[08:01:35] 
[08:01:35] Folding@home Core Shutdown: EARLY_UNIT_END
[08:01:39] CoreStatus = 72 (114)
[08:01:39] Sending work to server
[08:01:39] Project: 7501 (Run 0, Clone 114, Gen 31)


[08:01:39] + Attempting to send results [July 5 08:01:39 UTC]
[08:01:39] - Reading file work/wuresults_09.dat from core
[08:01:39]   (Read 659 bytes from disk)
[08:01:39] Connecting to http://128.143.199.97:8080/
[08:01:39] Posted data.
[08:01:39] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[08:01:39] - Uploaded at ~3 kB/s
[08:01:39] - Averaged speed for that direction ~56 kB/s
[08:01:39] + Results successfully sent
[08:01:39] Thank you for your contribution to Folding@Home.
[08:01:43] Trying to send all finished work units
[08:01:43] + No unsent completed units remaining.
[08:01:43] - Preparing to get new work unit...
[08:01:43] Cleaning up work directory
[08:01:44] + Attempting to get work packet
[08:01:44] Passkey found
[08:01:44] - Will indicate memory of 4086 MB
[08:01:44] - Connecting to assignment server
[08:01:44] Connecting to http://assign.stanford.edu:8080/
[08:01:44] Posted data.
[08:01:44] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:01:44] + News From Folding@Home: Welcome to Folding@Home
[08:01:44] Loaded queue successfully.
[08:01:44] Sent data
[08:01:44] Connecting to http://128.143.199.97:8080/
[08:01:45] Posted data.
[08:01:45] Initial: 0000; - Receiving payload (expected size: 1253800)
[08:01:48] - Downloaded at ~408 kB/s
[08:01:48] - Averaged speed for that direction ~463 kB/s
[08:01:48] + Received work.
[08:01:48] Trying to send all finished work units
[08:01:48] + No unsent completed units remaining.
[08:01:48] + Closed connections
[08:01:53] 
[08:01:53] + Processing work unit
[08:01:53] Core required: FahCore_a3.exe
[08:01:53] Core found.
[08:01:53] Working on queue slot 00 [July 5 08:01:53 UTC]
[08:01:53] + Working ...
[08:01:53] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 00 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:01:53] 
[08:01:53] *------------------------------*
[08:01:53] Folding@Home Gromacs SMP Core
[08:01:53] Version 2.27 (Dec. 15, 2010)
[08:01:53] 
[08:01:53] Preparing to commence simulation
[08:01:53] - Assembly optimizations manually forced on.
[08:01:53] - Not checking prior termination.
[08:01:54] - Expanded 1253288 -> 2077012 (decompressed 165.7 percent)
[08:01:54] Called DecompressByteArray: compressed_data_size=1253288 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:01:54] - Digital signature verified
[08:01:54] 
[08:01:54] Project: 7501 (Run 0, Clone 135, Gen 30)
[08:01:54] 
[08:01:54] Assembly optimizations on if available.
[08:01:54] Entering M.D.
[08:02:00] Mapping NT from 7 to 7 
[08:02:00] mdrun returned 255
[08:02:00] Going to send back what have done -- stepsTotalG=500000
[08:02:00] Work fraction=0.0000 steps=500000.
[08:02:04] logfile size=0 infoLength=0 edr=0 trr=25
[08:02:04] logfile size: 0 info=0 bed=0 hdr=25
[08:02:04] - Writing 642 bytes of core data to disk...
[08:02:04] Done: 130 -> 151 (compressed to 116.1 percent)
[08:02:04]   ... Done.
[08:02:04] 
[08:02:04] Folding@home Core Shutdown: EARLY_UNIT_END
[08:02:07] CoreStatus = 72 (114)
[08:02:07] Sending work to server
[08:02:07] Project: 7501 (Run 0, Clone 135, Gen 30)


[08:02:07] + Attempting to send results [July 5 08:02:07 UTC]
[08:02:07] - Reading file work/wuresults_00.dat from core
[08:02:07]   (Read 663 bytes from disk)
[08:02:07] Connecting to http://128.143.199.97:8080/
[08:02:08] Posted data.
[08:02:08] Initial: 0000; - Uploaded at ~1 kB/s
[08:02:08] - Averaged speed for that direction ~45 kB/s
[08:02:08] + Results successfully sent
[08:02:08] Thank you for your contribution to Folding@Home.
[08:02:12] Trying to send all finished work units
[08:02:12] + No unsent completed units remaining.
[08:02:12] - Preparing to get new work unit...
[08:02:12] Cleaning up work directory
[08:02:12] + Attempting to get work packet
[08:02:12] Passkey found
[08:02:12] - Will indicate memory of 4086 MB
[08:02:12] - Connecting to assignment server
[08:02:12] Connecting to http://assign.stanford.edu:8080/
[08:02:13] Posted data.
[08:02:13] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:02:13] + News From Folding@Home: Welcome to Folding@Home
[08:02:13] Loaded queue successfully.
[08:02:13] Sent data
[08:02:13] Connecting to http://128.143.199.97:8080/
[08:02:14] Posted data.
[08:02:14] Initial: 0000; - Receiving payload (expected size: 1247807)
[08:02:17] - Downloaded at ~406 kB/s
[08:02:17] - Averaged speed for that direction ~452 kB/s
[08:02:17] + Received work.
[08:02:17] Trying to send all finished work units
[08:02:17] + No unsent completed units remaining.
[08:02:17] + Closed connections
[08:02:22] 
[08:02:22] + Processing work unit
[08:02:22] Core required: FahCore_a3.exe
[08:02:22] Core found.
[08:02:22] Working on queue slot 01 [July 5 08:02:22 UTC]
[08:02:22] + Working ...
[08:02:22] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:02:22] 
[08:02:22] *------------------------------*
[08:02:22] Folding@Home Gromacs SMP Core
[08:02:22] Version 2.27 (Dec. 15, 2010)
[08:02:22] 
[08:02:22] Preparing to commence simulation
[08:02:22] - Assembly optimizations manually forced on.
[08:02:22] - Not checking prior termination.
[08:02:22] - Expanded 1247295 -> 2077012 (decompressed 166.5 percent)
[08:02:22] Called DecompressByteArray: compressed_data_size=1247295 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:02:22] - Digital signature verified
[08:02:22] 
[08:02:22] Project: 7500 (Run 0, Clone 86, Gen 117)
[08:02:22] 
[08:02:22] Assembly optimizations on if available.
[08:02:22] Entering M.D.
[08:02:28] Mapping NT from 7 to 7 
[08:02:28] mdrun returned 255
[08:02:28] Going to send back what have done -- stepsTotalG=500000
[08:02:28] Work fraction=0.0000 steps=500000.
[08:02:32] logfile size=0 infoLength=0 edr=0 trr=25
[08:02:32] logfile size: 0 info=0 bed=0 hdr=25
[08:02:32] - Writing 642 bytes of core data to disk...
[08:02:32] Done: 130 -> 148 (compressed to 113.8 percent)
[08:02:32]   ... Done.
[08:02:32] 
[08:02:32] Folding@home Core Shutdown: EARLY_UNIT_END
[08:02:36] CoreStatus = 72 (114)
[08:02:36] Sending work to server
[08:02:36] Project: 7500 (Run 0, Clone 86, Gen 117)
If more of the log is needed, just let me know.
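
For anyone reading a long log like the one above, a small script can tally how each unit ended and which -np value the core was launched with. This is only a minimal sketch: the function name summarize_log is hypothetical, and the patterns are assumptions based on the line shapes in this excerpt, so other client versions may need different ones.

```python
import re
from collections import Counter

# These patterns mirror line shapes visible in the log excerpt above;
# they are assumptions and may not match other client versions.
CALL_RE = re.compile(r"Calling '.*-np (\d+)")
PROJECT_RE = re.compile(r"Project: (\d+) \(Run \d+, Clone \d+, Gen \d+\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def summarize_log(lines):
    """Count core shutdown statuses per project and collect -np values.

    Returns (outcomes, np_values), where outcomes maps
    (project_number, shutdown_status) to a count.
    """
    outcomes = Counter()
    np_values = set()
    project = None  # project of the unit currently being processed
    for line in lines:
        call = CALL_RE.search(line)
        if call:
            # A new core launch starts a new unit.
            np_values.add(int(call.group(1)))
            project = None
            continue
        found = PROJECT_RE.search(line)
        if found and project is None:
            # First "Project:" line after a launch names the unit.
            project = int(found.group(1))
            continue
        shutdown = SHUTDOWN_RE.search(line)
        if shutdown and project is not None:
            outcomes[(project, shutdown.group(1))] += 1
    return outcomes, np_values
```

Passing the lines of the client log file to summarize_log gives a quick view of whether every EARLY_UNIT_END belongs to the same projects and the same -np setting, which is the pattern visible in this thread.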

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Tue Jul 05, 2011 8:27 pm
by Grandpa_01
7500 and 7501 do not work with -smp 7; you should be using -smp 8 or -smp 6. There are several WUs that will not work with -smp 7.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Wed Jul 06, 2011 7:22 am
by toTOW
Indeed ©

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Thu Jul 07, 2011 10:33 pm
by Leonardo
Well, that explains it. Hmm, I thought the forced 8-thread matter was limited to V7. Obviously I was wrong.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 12:41 am
by Grandpa_01
No, it is not limited to any version of FAH, and it affects non-prime core counts on cores a3, a4, and a5.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 12:55 am
by Leonardo
Are projects 7500 and 7501 the only projects that lock out odd-number SMP flags, a la -SMP 7?

I knew that the odd-number lockout (even-number enforcement) was in place for client V7, but as you can see, it was a surprise that it had reached 6.34.

I may have to scramble for some solutions, as I'm also running multiple GPU clients with each SMP/bigadv machine.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 1:02 am
by Grandpa_01
No, there are several others, but I cannot remember which ones they were. They were a3's, and some of the bigadv units cannot run -smp 11 or 23.

The solution is -smp 6

There have been quite a few posts about it; you can find them here: search.php?keywords=Instant+EUE

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 1:45 am
by Leonardo
Thanks. Unfortunate about this. Were I to run -SMP 6, I would have one core in each machine at idle.
I'll check out the link.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 1:53 am
by bruce
Thank you for your report, Leonardo.
Leonardo wrote:Well, that explains it. Hmm, I thought the forced 8-thread matter was limited to V7. Obviously I was wrong.
You're talking about two different things. One is whether FAH enforces certain limitations on your choices and the other is the limitations that are inherent in the current Gromacs code when running certain proteins.

The GROMACS code is an open-source development of a group at gromacs.org. It's specifically designed for researchers who dedicate their own computer to their research. Even though the Pande Group has adapted it to be used in a distributed computing environment (and has made other contributions to that development), its primary focus is on dedicated machines.

There's no such thing as a dedicated computer with 7 cores, so we need to recognize the fact that donors are taking it on themselves to use it in untested ways when they run it as -smp 7, and it's always possible to run into unexpected limitations such as an increased failure rate. Such a configuration has never been recommended by either gromacs.org or by Stanford, and it may or may not work.

It's a very useful analysis method when using easily factorable numbers that occur in real hardware, like 4, 8, 12, 16, 24, 32, etc. Many other values may or may not work, depending on the atomic structure being analyzed. When donors choose to run with less thoroughly tested numbers and make reports like yours, Stanford can exclude those problems by limiting donor choices to configurations that are known to work or to ones for which few reports have been collected. The fact that this problem didn't get discovered when 7500 was being beta tested probably just means that the beta testers were running with other values.

Stanford did start imposing limitations in V7 but since the gromacs code is identical in both V6 and V7 it made more sense to put the exclusions in the core rather than in the client(s). If Stanford has not excluded 7 yet, they probably will soon.

The FAH research is still very much cutting-edge technology and they can only fix the problems that they know about.
Leonardo wrote:Thanks. Unfortunate about this. Were I to run -SMP 6, I would have one core in each machine at idle.
That's only true if you choose to run it that way. You can always run a uniprocessor client concurrently with a smp 6 client or run a smp 8 client without whatever you're running in the 8th core -- or you can continue to run with smp 7 (at least until it's excluded) and accept the fact that it's going to be less reliable, at times, even though that would be a poor choice.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 2:13 am
by Leonardo
Yes, Bruce. Perhaps I was being cryptic. I understood all that; I was just sloppy in my wording. Also, I just didn't know that the non-prime core count limitations had already come into effect for some projects processed by 4-core/8-thread processors.
bruce wrote:If Stanford has not excluded 7 yet, they probably will soon.
Again, yes, I knew that, but it still took me by surprise. :lol: :shock:

I'll find a solution that complies with best practices. Always have, always will. :D

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 2:52 am
by Grandpa_01
Actually it was discovered in beta testing by me. :oops: viewtopic.php?f=66&t=18549#p185850

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 10:52 am
by bruce
Grandpa_01 wrote:Actually it was discovered in beta testing by me. :oops: viewtopic.php?f=66&t=18549#p185850
True, but I still regard the beta forum as something that not everybody reads. Stanford is aware of the issue, but I don't know what their plan is to deal with it. I didn't see an official warning about smp 7 on those projects. I suspect that in their mind, they've already warned donors against using numbers like smp 7 and it doesn't need to be said again, in spite of the fact that a lot of people have run smp 7 for a long time and generally have had few difficulties.

When the Mods designate a particular WU to be a "bad WU" and suspend future generations of that trajectory, we do not know how many threads were used when the errors occurred. (E.g., were all of the bad ones 7 and all of the successes 8?) I don't see an easy way to solve this dilemma.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 2:31 pm
by Grandpa_01
I never have figured out why there never has been an official announcement about running -smp 7. I would say the majority of people that run -smp and GPU at the same time use the -smp 7 flag. My guess is they are going to just implement the fix in v7 that does not allow the use of it.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Fri Jul 08, 2011 9:14 pm
by Mactin
Project 7500 also instantly EUEs with -smp 10,
and 10 is not a prime number as far as I know.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Posted: Sat Jul 09, 2011 8:09 am
by kasson
No, but 10 = 5x2. So you're probably having a 5 issue. 12 and 8 have better factorizations, so you're much safer there.

In our opinion, -smp 7 is generally a risky idea (not recommended). 10 is usually safer, but 6, 8, 12, 16, 20, 24 are better. (You might wonder about 20 = 5x4, but some of the -smp sensitive projects such as 7500 don't assign that high.)
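
As a rough illustration of the factorization argument, the sketch below lists the prime factors of the thread counts mentioned in this thread. It is plain arithmetic only: the helper name prime_factors is hypothetical, and nothing here reproduces how the Gromacs core actually splits work across threads.

```python
def prime_factors(n):
    """Return the prime factorization of n (n >= 2) as an ascending list."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    divisor = 2
    # Trial division: strip out each divisor as many times as it divides n.
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever remains is itself prime.
        factors.append(n)
    return factors


# Thread counts discussed in this thread.
for count in (6, 7, 8, 10, 11, 12, 16, 20, 23, 24):
    factors = prime_factors(count)
    print(f"-smp {count}: {' x '.join(str(f) for f in factors)}")
```

Counts such as 7, 11, and 23 have no small factors at all, and 10 and 20 carry a factor of 5, while 6, 8, 12, 16, and 24 break down entirely into 2s and 3s.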