EUEs, multiple, on Project 7500 and 7501 work units

Moderators: Site Moderators, FAHC Science Team

Leonardo
Posts: 260
Joined: Tue Dec 04, 2007 5:09 am
Hardware configuration: GPU slots on home-built, purpose-built PCs.
Location: Eagle River, Alaska

EUEs, multiple, on Project 7500 and 7501 work units

Post by Leonardo »

I did not think it efficient to start a separate thread on each failed WU, so here is a compilation of a disaster earlier today with a series of Project 7500 and 7501 work units.

Instant EUEs, 24 times, in a row, same machine, for the following:

Project: 7501 (Run 0, Clone 114, Gen 31)
Project: 7500 (Run 0, Clone 25, Gen 132)
Project: 7500 (Run 0, Clone 75, Gen 120)
Project: 7501 (Run 0, Clone 135, Gen 30)
Project: 7500 (Run 0, Clone 86, Gen 117)
Project: 7501 (Run 0, Clone 131, Gen 30)
Project: 7500 (Run 0, Clone 100, Gen 112)
Project: 7501 (Run 0, Clone 156, Gen 29)
Project: 7500 (Run 0, Clone 36, Gen 98)
Project: 7501 (Run 0, Clone 157, Gen 29)
Project: 7500 (Run 0, Clone 94, Gen 97)
Project: 7500 (Run 0, Clone 33, Gen 105)
Project: 7500 (Run 0, Clone 37, Gen 96)
Project: 7501 (Run 0, Clone 176, Gen 29)
Project: 7500 (Run 0, Clone 68, Gen 89)
Project: 7501 (Run 0, Clone 261, Gen 29)
Project: 7500 (Run 0, Clone 157, Gen 76)
...and more. The log is below.

Computer: Win7/64, Client 6.34
The machine has been running SMP2 and SMP -bigadv for about nine months reliably at the same settings. The client was completing all units perfectly before this group of instant EUEs, and resumed perfect performance when a 6XXX was downloaded. I have no explanation. Could it be a hardware issue? Perhaps, but it doesn't seem logical, as the computer crunched work units perfectly before and after these failures with no hardware or client settings changes. The computer was running relatively cool throughout.

Does this count against my ratio of successfully completed SMP units for -bigadv bonuses? If so, I might be in a world of hurt, here. :lol: :shock: :|

[07:59:02] Completed 500000 out of 500000 steps  (100%)
[07:59:04] DynamicWrapper: Finished Work Unit: sleep=10000
[07:59:14] 
[07:59:14] Finished Work Unit:
[07:59:14] - Reading up to 3700368 from "work/wudata_06.trr": Read 3700368
[07:59:14] trr file hash check passed.
[07:59:14] edr file hash check passed.
[07:59:14] logfile size: 55486
[07:59:14] Leaving Run
[07:59:14] - Writing 3791814 bytes of core data to disk...
[07:59:14]   ... Done.
[07:59:15] - Shutting down core
[07:59:15] 
[07:59:15] Folding@home Core Shutdown: FINISHED_UNIT
[07:59:18] CoreStatus = 64 (100)
[07:59:18] Unit 6 finished with 96 percent of time to deadline remaining.
[07:59:18] Updated performance fraction: 0.857175
[07:59:18] Sending work to server
[07:59:18] Project: 6061 (Run 1, Clone 102, Gen 221)


[07:59:18] + Attempting to send results [July 5 07:59:18 UTC]
[07:59:18] - Reading file work/wuresults_06.dat from core
[07:59:18]   (Read 3791814 bytes from disk)
[07:59:18] Connecting to http://171.64.65.54:8080/
[08:00:00] Posted data.
[08:00:00] Initial: 0000; - Uploaded at ~88 kB/s
[08:00:00] - Averaged speed for that direction ~87 kB/s
[08:00:00] + Results successfully sent
[08:00:00] Thank you for your contribution to Folding@Home.
[08:00:00] + Number of Units Completed: 622

[08:00:04] Trying to send all finished work units
[08:00:04] + No unsent completed units remaining.
[08:00:04] - Preparing to get new work unit...
[08:00:04] Cleaning up work directory
[08:00:10] + Attempting to get work packet
[08:00:10] Passkey found
[08:00:10] - Will indicate memory of 4086 MB
[08:00:10] - Connecting to assignment server
[08:00:10] Connecting to http://assign.stanford.edu:8080/
[08:00:11] Posted data.
[08:00:11] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:00:11] + News From Folding@Home: Welcome to Folding@Home
[08:00:11] Loaded queue successfully.
[08:00:11] Sent data
[08:00:11] Connecting to http://128.143.199.97:8080/
[08:00:12] Posted data.
[08:00:12] Initial: 0000; - Receiving payload (expected size: 1248809)
[08:00:16] - Downloaded at ~304 kB/s
[08:00:16] - Averaged speed for that direction ~516 kB/s
[08:00:16] + Received work.
[08:00:16] Trying to send all finished work units
[08:00:16] + No unsent completed units remaining.
[08:00:16] + Closed connections
[08:00:16] 
[08:00:16] + Processing work unit
[08:00:16] Core required: FahCore_a3.exe
[08:00:16] Core found.
[08:00:16] Working on queue slot 07 [July 5 08:00:16 UTC]
[08:00:16] + Working ...
[08:00:16] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 07 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:00:16] 
[08:00:16] *------------------------------*
[08:00:16] Folding@Home Gromacs SMP Core
[08:00:16] Version 2.27 (Dec. 15, 2010)
[08:00:16] 
[08:00:16] Preparing to commence simulation
[08:00:16] - Assembly optimizations manually forced on.
[08:00:16] - Not checking prior termination.
[08:00:16] - Expanded 1248297 -> 2077012 (decompressed 166.3 percent)
[08:00:16] Called DecompressByteArray: compressed_data_size=1248297 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:00:16] - Digital signature verified
[08:00:16] 
[08:00:16] Project: 7500 (Run 0, Clone 25, Gen 132)
[08:00:16] 
[08:00:16] Assembly optimizations on if available.
[08:00:16] Entering M.D.
[08:00:22] Mapping NT from 7 to 7 
[08:00:22] mdrun returned 255
[08:00:22] Going to send back what have done -- stepsTotalG=500000
[08:00:22] Work fraction=0.0000 steps=500000.
[08:00:26] logfile size=0 infoLength=0 edr=0 trr=25
[08:00:26] logfile size: 0 info=0 bed=0 hdr=25
[08:00:26] - Writing 642 bytes of core data to disk...
[08:00:26] Done: 130 -> 147 (compressed to 113.0 percent)
[08:00:26]   ... Done.
[08:00:26] 
[08:00:26] Folding@home Core Shutdown: EARLY_UNIT_END
[08:00:30] CoreStatus = 72 (114)
[08:00:30] Sending work to server
[08:00:30] Project: 7500 (Run 0, Clone 25, Gen 132)


[08:00:30] + Attempting to send results [July 5 08:00:30 UTC]
[08:00:30] - Reading file work/wuresults_07.dat from core
[08:00:30]   (Read 659 bytes from disk)
[08:00:30] Connecting to http://128.143.199.97:8080/
[08:00:30] Posted data.
[08:00:30] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[08:00:30] - Uploaded at ~3 kB/s
[08:00:30] - Averaged speed for that direction ~77 kB/s
[08:00:30] + Results successfully sent
[08:00:30] Thank you for your contribution to Folding@Home.
[08:00:34] Trying to send all finished work units
[08:00:34] + No unsent completed units remaining.
[08:00:34] - Preparing to get new work unit...
[08:00:34] Cleaning up work directory
[08:00:40] + Attempting to get work packet
[08:00:40] Passkey found
[08:00:40] - Will indicate memory of 4086 MB
[08:00:40] - Connecting to assignment server
[08:00:40] Connecting to http://assign.stanford.edu:8080/
[08:00:41] Posted data.
[08:00:41] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:00:41] + News From Folding@Home: Welcome to Folding@Home
[08:00:41] Loaded queue successfully.
[08:00:41] Sent data
[08:00:41] Connecting to http://128.143.199.97:8080/
[08:00:42] Posted data.
[08:00:42] Initial: 0000; - Receiving payload (expected size: 1248912)
[08:00:45] - Downloaded at ~406 kB/s
[08:00:45] - Averaged speed for that direction ~494 kB/s
[08:00:45] + Received work.
[08:00:45] Trying to send all finished work units
[08:00:45] + No unsent completed units remaining.
[08:00:45] + Closed connections
[08:00:50] 
[08:00:50] + Processing work unit
[08:00:50] Core required: FahCore_a3.exe
[08:00:50] Core found.
[08:00:50] Working on queue slot 08 [July 5 08:00:50 UTC]
[08:00:50] + Working ...
[08:00:50] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 08 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:00:50] 
[08:00:50] *------------------------------*
[08:00:50] Folding@Home Gromacs SMP Core
[08:00:50] Version 2.27 (Dec. 15, 2010)
[08:00:50] 
[08:00:50] Preparing to commence simulation
[08:00:50] - Assembly optimizations manually forced on.
[08:00:50] - Not checking prior termination.
[08:00:51] - Expanded 1248400 -> 2077012 (decompressed 166.3 percent)
[08:00:51] Called DecompressByteArray: compressed_data_size=1248400 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:00:51] - Digital signature verified
[08:00:51] 
[08:00:51] Project: 7500 (Run 0, Clone 75, Gen 120)
[08:00:51] 
[08:00:51] Assembly optimizations on if available.
[08:00:51] Entering M.D.
[08:00:57] Mapping NT from 7 to 7 
[08:00:57] mdrun returned 255
[08:00:57] Going to send back what have done -- stepsTotalG=500000
[08:00:57] Work fraction=0.0000 steps=500000.
[08:01:01] logfile size=0 infoLength=0 edr=0 trr=25
[08:01:01] logfile size: 0 info=0 bed=0 hdr=25
[08:01:01] - Writing 642 bytes of core data to disk...
[08:01:01] Done: 130 -> 147 (compressed to 113.0 percent)
[08:01:01]   ... Done.
[08:01:01] 
[08:01:01] Folding@home Core Shutdown: EARLY_UNIT_END
[08:01:04] CoreStatus = 72 (114)
[08:01:04] Sending work to server
[08:01:04] Project: 7500 (Run 0, Clone 75, Gen 120)


[08:01:04] + Attempting to send results [July 5 08:01:04 UTC]
[08:01:04] - Reading file work/wuresults_08.dat from core
[08:01:04]   (Read 659 bytes from disk)
[08:01:04] Connecting to http://128.143.199.97:8080/
[08:01:05] Posted data.
[08:01:05] Initial: 0000; - Uploaded at ~1 kB/s
[08:01:05] - Averaged speed for that direction ~62 kB/s
[08:01:05] + Results successfully sent
[08:01:05] Thank you for your contribution to Folding@Home.
[08:01:09] Trying to send all finished work units
[08:01:09] + No unsent completed units remaining.
[08:01:09] - Preparing to get new work unit...
[08:01:09] Cleaning up work directory
[08:01:14] + Attempting to get work packet
[08:01:14] Passkey found
[08:01:14] - Will indicate memory of 4086 MB
[08:01:14] - Connecting to assignment server
[08:01:14] Connecting to http://assign.stanford.edu:8080/
[08:01:15] Posted data.
[08:01:15] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:01:15] + News From Folding@Home: Welcome to Folding@Home
[08:01:15] Loaded queue successfully.
[08:01:15] Sent data
[08:01:15] Connecting to http://128.143.199.97:8080/
[08:01:16] Posted data.
[08:01:16] Initial: 0000; - Receiving payload (expected size: 1254934)
[08:01:19] - Downloaded at ~408 kB/s
[08:01:19] - Averaged speed for that direction ~477 kB/s
[08:01:19] + Received work.
[08:01:19] Trying to send all finished work units
[08:01:19] + No unsent completed units remaining.
[08:01:19] + Closed connections
[08:01:24] 
[08:01:24] + Processing work unit
[08:01:24] Core required: FahCore_a3.exe
[08:01:24] Core found.
[08:01:24] Working on queue slot 09 [July 5 08:01:24 UTC]
[08:01:24] + Working ...
[08:01:24] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 09 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:01:24] 
[08:01:24] *------------------------------*
[08:01:24] Folding@Home Gromacs SMP Core
[08:01:24] Version 2.27 (Dec. 15, 2010)
[08:01:24] 
[08:01:24] Preparing to commence simulation
[08:01:24] - Assembly optimizations manually forced on.
[08:01:24] - Not checking prior termination.
[08:01:25] - Expanded 1254422 -> 2077012 (decompressed 165.5 percent)
[08:01:25] Called DecompressByteArray: compressed_data_size=1254422 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:01:25] - Digital signature verified
[08:01:25] 
[08:01:25] Project: 7501 (Run 0, Clone 114, Gen 31)
[08:01:25] 
[08:01:25] Assembly optimizations on if available.
[08:01:25] Entering M.D.
[08:01:31] Mapping NT from 7 to 7 
[08:01:31] mdrun returned 255
[08:01:31] Going to send back what have done -- stepsTotalG=500000
[08:01:31] Work fraction=0.0000 steps=500000.
[08:01:35] logfile size=0 infoLength=0 edr=0 trr=25
[08:01:35] logfile size: 0 info=0 bed=0 hdr=25
[08:01:35] - Writing 642 bytes of core data to disk...
[08:01:35] Done: 130 -> 147 (compressed to 113.0 percent)
[08:01:35]   ... Done.
[08:01:35] 
[08:01:35] Folding@home Core Shutdown: EARLY_UNIT_END
[08:01:39] CoreStatus = 72 (114)
[08:01:39] Sending work to server
[08:01:39] Project: 7501 (Run 0, Clone 114, Gen 31)


[08:01:39] + Attempting to send results [July 5 08:01:39 UTC]
[08:01:39] - Reading file work/wuresults_09.dat from core
[08:01:39]   (Read 659 bytes from disk)
[08:01:39] Connecting to http://128.143.199.97:8080/
[08:01:39] Posted data.
[08:01:39] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[08:01:39] - Uploaded at ~3 kB/s
[08:01:39] - Averaged speed for that direction ~56 kB/s
[08:01:39] + Results successfully sent
[08:01:39] Thank you for your contribution to Folding@Home.
[08:01:43] Trying to send all finished work units
[08:01:43] + No unsent completed units remaining.
[08:01:43] - Preparing to get new work unit...
[08:01:43] Cleaning up work directory
[08:01:44] + Attempting to get work packet
[08:01:44] Passkey found
[08:01:44] - Will indicate memory of 4086 MB
[08:01:44] - Connecting to assignment server
[08:01:44] Connecting to http://assign.stanford.edu:8080/
[08:01:44] Posted data.
[08:01:44] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:01:44] + News From Folding@Home: Welcome to Folding@Home
[08:01:44] Loaded queue successfully.
[08:01:44] Sent data
[08:01:44] Connecting to http://128.143.199.97:8080/
[08:01:45] Posted data.
[08:01:45] Initial: 0000; - Receiving payload (expected size: 1253800)
[08:01:48] - Downloaded at ~408 kB/s
[08:01:48] - Averaged speed for that direction ~463 kB/s
[08:01:48] + Received work.
[08:01:48] Trying to send all finished work units
[08:01:48] + No unsent completed units remaining.
[08:01:48] + Closed connections
[08:01:53] 
[08:01:53] + Processing work unit
[08:01:53] Core required: FahCore_a3.exe
[08:01:53] Core found.
[08:01:53] Working on queue slot 00 [July 5 08:01:53 UTC]
[08:01:53] + Working ...
[08:01:53] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 00 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:01:53] 
[08:01:53] *------------------------------*
[08:01:53] Folding@Home Gromacs SMP Core
[08:01:53] Version 2.27 (Dec. 15, 2010)
[08:01:53] 
[08:01:53] Preparing to commence simulation
[08:01:53] - Assembly optimizations manually forced on.
[08:01:53] - Not checking prior termination.
[08:01:54] - Expanded 1253288 -> 2077012 (decompressed 165.7 percent)
[08:01:54] Called DecompressByteArray: compressed_data_size=1253288 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:01:54] - Digital signature verified
[08:01:54] 
[08:01:54] Project: 7501 (Run 0, Clone 135, Gen 30)
[08:01:54] 
[08:01:54] Assembly optimizations on if available.
[08:01:54] Entering M.D.
[08:02:00] Mapping NT from 7 to 7 
[08:02:00] mdrun returned 255
[08:02:00] Going to send back what have done -- stepsTotalG=500000
[08:02:00] Work fraction=0.0000 steps=500000.
[08:02:04] logfile size=0 infoLength=0 edr=0 trr=25
[08:02:04] logfile size: 0 info=0 bed=0 hdr=25
[08:02:04] - Writing 642 bytes of core data to disk...
[08:02:04] Done: 130 -> 151 (compressed to 116.1 percent)
[08:02:04]   ... Done.
[08:02:04] 
[08:02:04] Folding@home Core Shutdown: EARLY_UNIT_END
[08:02:07] CoreStatus = 72 (114)
[08:02:07] Sending work to server
[08:02:07] Project: 7501 (Run 0, Clone 135, Gen 30)


[08:02:07] + Attempting to send results [July 5 08:02:07 UTC]
[08:02:07] - Reading file work/wuresults_00.dat from core
[08:02:07]   (Read 663 bytes from disk)
[08:02:07] Connecting to http://128.143.199.97:8080/
[08:02:08] Posted data.
[08:02:08] Initial: 0000; - Uploaded at ~1 kB/s
[08:02:08] - Averaged speed for that direction ~45 kB/s
[08:02:08] + Results successfully sent
[08:02:08] Thank you for your contribution to Folding@Home.
[08:02:12] Trying to send all finished work units
[08:02:12] + No unsent completed units remaining.
[08:02:12] - Preparing to get new work unit...
[08:02:12] Cleaning up work directory
[08:02:12] + Attempting to get work packet
[08:02:12] Passkey found
[08:02:12] - Will indicate memory of 4086 MB
[08:02:12] - Connecting to assignment server
[08:02:12] Connecting to http://assign.stanford.edu:8080/
[08:02:13] Posted data.
[08:02:13] Initial: 8F80; - Successful: assigned to (128.143.199.97).
[08:02:13] + News From Folding@Home: Welcome to Folding@Home
[08:02:13] Loaded queue successfully.
[08:02:13] Sent data
[08:02:13] Connecting to http://128.143.199.97:8080/
[08:02:14] Posted data.
[08:02:14] Initial: 0000; - Receiving payload (expected size: 1247807)
[08:02:17] - Downloaded at ~406 kB/s
[08:02:17] - Averaged speed for that direction ~452 kB/s
[08:02:17] + Received work.
[08:02:17] Trying to send all finished work units
[08:02:17] + No unsent completed units remaining.
[08:02:17] + Closed connections
[08:02:22] 
[08:02:22] + Processing work unit
[08:02:22] Core required: FahCore_a3.exe
[08:02:22] Core found.
[08:02:22] Working on queue slot 01 [July 5 08:02:22 UTC]
[08:02:22] + Working ...
[08:02:22] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 7 -checkpoint 30 -forceasm -verbose -lifeline 4756 -version 634'

[08:02:22] 
[08:02:22] *------------------------------*
[08:02:22] Folding@Home Gromacs SMP Core
[08:02:22] Version 2.27 (Dec. 15, 2010)
[08:02:22] 
[08:02:22] Preparing to commence simulation
[08:02:22] - Assembly optimizations manually forced on.
[08:02:22] - Not checking prior termination.
[08:02:22] - Expanded 1247295 -> 2077012 (decompressed 166.5 percent)
[08:02:22] Called DecompressByteArray: compressed_data_size=1247295 data_size=2077012, decompressed_data_size=2077012 diff=0
[08:02:22] - Digital signature verified
[08:02:22] 
[08:02:22] Project: 7500 (Run 0, Clone 86, Gen 117)
[08:02:22] 
[08:02:22] Assembly optimizations on if available.
[08:02:22] Entering M.D.
[08:02:28] Mapping NT from 7 to 7 
[08:02:28] mdrun returned 255
[08:02:28] Going to send back what have done -- stepsTotalG=500000
[08:02:28] Work fraction=0.0000 steps=500000.
[08:02:32] logfile size=0 infoLength=0 edr=0 trr=25
[08:02:32] logfile size: 0 info=0 bed=0 hdr=25
[08:02:32] - Writing 642 bytes of core data to disk...
[08:02:32] Done: 130 -> 148 (compressed to 113.8 percent)
[08:02:32]   ... Done.
[08:02:32] 
[08:02:32] Folding@home Core Shutdown: EARLY_UNIT_END
[08:02:36] CoreStatus = 72 (114)
[08:02:36] Sending work to server
[08:02:36] Project: 7500 (Run 0, Clone 86, Gen 117)
If more of the log is needed, just let me know.
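
For anyone who wants to tally failures like these from their own log rather than by hand, here is a minimal Python sketch. It assumes the v6 log layout shown above, where a `Project: N (Run r, Clone c, Gen g)` line appears before the matching `Folding@home Core Shutdown: ...` line. The function name `tally_shutdowns` is hypothetical and not part of any FAH tool.

```python
import re
from collections import Counter

# Matches lines such as "[08:00:16] Project: 7500 (Run 0, Clone 25, Gen 132)".
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
# Matches lines such as "[08:00:26] Folding@home Core Shutdown: EARLY_UNIT_END".
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")


def tally_shutdowns(log_text):
    """Count core shutdown statuses per project number.

    Each shutdown line is attributed to the most recent "Project:" line seen
    before it. Shutdown lines that appear before any "Project:" line (for
    example, at the top of a truncated excerpt) are skipped.
    """
    counts = Counter()
    current_project = None
    for line in log_text.splitlines():
        project = PROJECT_RE.search(line)
        if project:
            current_project = int(project.group(1))
            continue
        shutdown = SHUTDOWN_RE.search(line)
        if shutdown and current_project is not None:
            counts[(current_project, shutdown.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = """\
[08:00:16] Project: 7500 (Run 0, Clone 25, Gen 132)
[08:00:22] mdrun returned 255
[08:00:26] Folding@home Core Shutdown: EARLY_UNIT_END
[08:00:30] Project: 7500 (Run 0, Clone 25, Gen 132)
[08:01:25] Project: 7501 (Run 0, Clone 114, Gen 31)
[08:01:35] Folding@home Core Shutdown: EARLY_UNIT_END
"""
    for (project, status), n in sorted(tally_shutdowns(sample).items()):
        print(project, status, n)
```

Feeding it the full FAHlog.txt contents instead of the sample should give one count per project and shutdown status, which makes it easier to see whether the failures are confined to particular projects.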
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Grandpa_01 »

7500 and 7501 do not work with -smp 7; you should be using -smp 8 or -smp 6. There are several WUs that will not work with -smp 7.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
toTOW
Site Moderator
Posts: 6455
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by toTOW »

Indeed.

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Leonardo

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Leonardo »

Well, that explains it. Hmm, I thought the forced 8-thread matter was limited to V7. Obviously I was wrong.
Grandpa_01

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Grandpa_01 »

No, it is not limited to any version of FAH, and it affects core a3, a4, and a5 non-prime core counts.
Leonardo

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Leonardo »

Are projects 7500 and 7501 the only projects that lock out odd-number SMP flags, a la -SMP 7?

I knew that the odd number lock out (even number enforcement) was in place for client V7, but as you can see, it was a surprise that it had reached 6.34.

I may have to scramble for some solutions, as I'm also running multiple GPU clients with each SMP/bigadv machine.
Grandpa_01

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Grandpa_01 »

No, there are several others, but I cannot remember which ones they were. They were a3s, and some of the bigadv units cannot run -smp 11 or 23.

The solution is -smp 6

There have been quite a few posts about it; you can find them here: search.php?keywords=Instant+EUE
Leonardo

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Leonardo »

Thanks. Unfortunate about this. Were I to run -SMP 6, I would have one core in each machine at idle.
I'll check out the link.
bruce
Posts: 20822
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by bruce »

Thank you for your report, Leonardo.
Leonardo wrote:Well, that explains it. Hmm, I thought the forced 8-thread matter was limited to V7. Obviously I was wrong.
You're talking about two different things. One is whether FAH enforces certain limitations on your choices and the other is the limitations that are inherent in the current Gromacs code when running certain proteins.

The GROMACS code is an open-source development of a group at gromacs.org. It's specifically designed for researchers who dedicate their own computers to their research. Even though the Pande Group has adapted it to be used in a distributed computing environment (and has made other contributions to that development), its primary focus is on dedicated machines.

There's no such thing as a dedicated computer with 7 cores, so we need to recognize that donors are taking it on themselves to use it in untested ways when they run it as -smp 7, and it's always possible to run into unexpected limitations such as an increased failure rate. Such a configuration has never been recommended by either gromacs.org or by Stanford, and it may or may not work.

It's a very useful analysis method when using easily factorable numbers that occur in real hardware like 4, 8, 12, 16, 24, 32 etc. Many other values may or may not work, depending on the atomic structure being analyzed. When donors choose to run with less thoroughly tested numbers and make reports like yours Stanford can exclude those problems by limiting donor choices to configurations that are known to work or to ones for which few reports have been collected. The fact that this problem didn't get discovered when 7500 was being beta tested probably just means that the beta testers were running with other values.

Stanford did start imposing limitations in V7 but since the gromacs code is identical in both V6 and V7 it made more sense to put the exclusions in the core rather than in the client(s). If Stanford has not excluded 7 yet, they probably will soon.

The FAH research is still very much cutting-edge technology and they can only fix the problems that they know about.
Leonardo wrote:Thanks. Unfortunate about this. Were I to run -SMP 6, I would have one core in each machine at idle.
That's only true if you choose to run it that way. You can always run a uniprocessor client concurrently with a smp 6 client or run a smp 8 client without whatever you're running in the 8th core -- or you can continue to run with smp 7 (at least until it's excluded) and accept the fact that it's going to be less reliable, at times, even though that would be a poor choice.
Leonardo

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Leonardo »

Yes, Bruce. Perhaps I was being cryptic. I understood all that; I was just sloppy in my wording. Also, I just didn't know that the non-prime core count limitations had already come into effect for some projects processed by 4-core/8-thread processors.
bruce wrote:If Stanford has not excluded 7 yet, they probably will soon.
Again, yes, I knew that, but it still took me by surprise. :lol: :shock:

I'll find a solution that complies with best practices. Always have, always will. :D
Grandpa_01

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Grandpa_01 »

Actually it was discovered in beta testing by me. :oops: viewtopic.php?f=66&t=18549#p185850
bruce

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by bruce »

Grandpa_01 wrote:Actually it was discovered in beta testing by me. :oops: viewtopic.php?f=66&t=18549#p185850
True, but I still regard the beta forum as something that not everybody reads. Stanford is aware of the issue, but I don't know what their plan is to deal with it. I didn't see an official warning about smp 7 on those projects. I suspect that in their mind, they've already warned donors against using numbers like smp 7 and it doesn't need to be said again, in spite of the fact that a lot of people have run smp 7 for a long time and generally have had few difficulties.

When the Mods designate a particular WU to be a "bad WU" and suspend future generations of that trajectory, we do not know how many threads were used when the errors occurred (e.g., were all of the bad ones 7 and all of the successes 8?). I don't see an easy way to solve this dilemma.
Grandpa_01

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Grandpa_01 »

I never have figured out why there has never been an official announcement about running -smp 7. I would say the majority of people that run -smp and GPU at the same time use the -smp 7 flag. My guess is they are going to just implement the fix in v7 that does not allow the use of it.
Mactin
Posts: 231
Joined: Sun Dec 02, 2007 1:08 pm
Location: Outremont, Montréal, Québec

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by Mactin »

Project 7500 also instantly EUEs with -smp 10,
and '10' is not a prime number as far as I know.
kasson
Pande Group Member
Posts: 1459
Joined: Thu Nov 29, 2007 9:37 pm

Re: EUEs, multiple, on Project 7500 and 7501 work units

Post by kasson »

No, but 10 = 5x2. So you're probably having a 5 issue. 12 and 8 have better factorizations, so you're much safer there.

In our opinion, -smp 7 is generally a risky idea (not recommended). 10 is usually safer, but 6, 8, 12, 16, 20, 24 are better. (You might wonder about 20 = 5x4, but some of the -smp sensitive projects such as 7500 don't assign that high.)
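
The factorization argument can be checked mechanically. The Python sketch below is only an illustration: it lists the prime factors of a thread count and flags counts whose largest prime factor is 5 or more. That threshold is an assumption chosen to match the examples in this thread (7, 10, 11 and 23 reported as failing or risky; 6, 8, 12, 16 and 24 as better), not a rule published by Stanford or gromacs.org, and the function names are hypothetical. Note that 20 = 5x4 would be flagged by this threshold as well.

```python
def prime_factors(n):
    """Return the prime factors of n (n >= 2) in ascending order, with repeats."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever remains after trial division is itself prime.
        factors.append(n)
    return factors


def has_large_prime_factor(threads, limit=5):
    """True if the largest prime factor of `threads` is >= limit.

    The default limit of 5 is an assumed threshold for illustration only.
    """
    return max(prime_factors(threads)) >= limit


if __name__ == "__main__":
    for threads in (6, 7, 8, 10, 11, 12, 16, 23, 24):
        print(threads, prime_factors(threads), has_large_prime_factor(threads))
```

Under this assumed threshold, 7, 10, 11 and 23 are flagged while 6, 8, 12, 16 and 24 are not, which lines up with the reports above. Whether a given count actually fails still depends on the project.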