Projects 10113, 10114 and 10115, immediate EUEs

Posted: Tue Mar 08, 2011 11:43 pm
by Craptor
I've been folding with the SMP client (via FAH GPU Tracker V2) for a little over a month now, and the "new" project 10113, 10114 and 10115 assignments give me "EARLY_UNIT_END" errors and the WUs fail.

Here's an example from one of the error logs:

Code:

Arguments: -oneunit -forceasm -smp 5 

[20:33:56] Work directory not found. Creating...
[20:33:56] Could not open work queue, generating new queue...
[20:33:56] - Preparing to get new work unit...
[20:33:56] Cleaning up work directory
[20:33:56] + Attempting to get work packet
[20:33:56] Passkey found
[20:33:56] - Connecting to assignment server
[20:34:00] - Successful: assigned to (171.64.65.75).
[20:34:00] + News From Folding@Home: Welcome to Folding@Home
[20:34:00] Loaded queue successfully.
[20:34:09] + Closed connections
[20:34:09] 
[20:34:09] + Processing work unit
[20:34:09] Core required: FahCore_a3.exe
[20:34:09] Core found.
[20:34:09] Working on queue slot 01 [March 8 20:34:09 UTC]
[20:34:09] + Working ...
[20:34:09] 
[20:34:09] *------------------------------*
[20:34:09] Folding@Home Gromacs SMP Core
[20:34:09] Version 2.27 (Dec. 15, 2010)
[20:34:09] 
[20:34:09] Preparing to commence simulation
[20:34:09] - Assembly optimizations manually forced on.
[20:34:09] - Not checking prior termination.
[20:34:09] - Expanded 538173 -> 1267180 (decompressed 235.4 percent)
[20:34:09] Called DecompressByteArray: compressed_data_size=538173 data_size=1267180, decompressed_data_size=1267180 diff=0
[20:34:10] - Digital signature verified
[20:34:10] 
[20:34:10] Project: 10113 (Run 46, Clone 6, Gen 0)
[20:34:10] 
[20:34:10] Assembly optimizations on if available.
[20:34:10] Entering M.D.
[20:34:16] Mapping NT from 5 to 5 
[20:34:16] mdrun returned 255
[20:34:16] Going to send back what have done -- stepsTotalG=4000000
[20:34:16] Work fraction=0.0000 steps=4000000.
[20:34:20] logfile size=0 infoLength=0 edr=0 trr=25
[20:34:20] logfile size: 0 info=0 bed=0 hdr=25
[20:34:20] - Writing 643 bytes of core data to disk...
[20:34:20] Done: 131 -> 148 (compressed to 112.9 percent)
[20:34:20]   ... Done.
[20:34:20] 
[20:34:20] Folding@home Core Shutdown: EARLY_UNIT_END
[20:34:24] CoreStatus = 72 (114)
[20:34:24] Sending work to server
[20:34:24] Project: 10113 (Run 46, Clone 6, Gen 0)


[20:34:24] + Attempting to send results [March 8 20:34:24 UTC]
[20:34:27] + Results successfully sent
[20:34:27] Thank you for your contribution to Folding@Home.
[20:34:31] + -oneunit flag given and have now finished a unit. Exiting.
Folding@Home Client Shutdown.

The other projects were:
- 10113 (Run 28, Clone 9, Gen 0)
- 10113 (Run 30, Clone 9, Gen 0)
- 10113 (Run 33, Clone 9, Gen 0)
- 10114 (Run 73, Clone 8, Gen 0)
- 10114 (Run 75, Clone 8, Gen 0)
- 10114 (Run 76, Clone 8, Gen 0)
- 10115 (Run 65, Clone 7, Gen 0)
- 10115 (Run 66, Clone 7, Gen 0)
- 10115 (Run 67, Clone 7, Gen 0)
- 10115 (Run 69, Clone 7, Gen 0)

I use the latest FAH GPU Tracker V2 (3.52) with the latest clients. I had zero failed WUs before this.

CPU: AMD Phenom II X6 1055T
GPU: XFX Radeon HD 5850
Clock speeds are at stock and temps are not too high.

I tried re-installing FAH GPU Tracker, but that didn't help. I also tried changing the -smp 5 flag to -smp 4, but still no luck. What next?

Ask if you need any additional info, and thank you!

(Sorry if this is in the wrong area or there's already a topic about this :oops: )

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Wed Mar 09, 2011 12:58 am
by bruce
I'm not sure, but this may be because you're using an odd number of SMP cores. SMP has some problems with odd numbers of cores on certain WUs. See viewtopic.php?f=58&t=17835, although they're talking about -smp 7, not -smp 5.

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Wed Mar 09, 2011 1:33 am
by Rick
I'm having a similar issue with WU 10113 (and others). Log shows:

< . . . >
[03:05:21] *------------------------------*
[03:05:21] Folding@Home Gromacs SMP Core
[03:05:21] Version 2.27 (Dec. 15, 2010)
[03:05:21]
[03:05:21] Preparing to commence simulation
[03:05:21] - Looking at optimizations...
[03:05:21] - Created dyn
[03:05:21] - Files status OK
[03:05:21] - Expanded 530958 -> 1255672 (decompressed 236.4 percent)
[03:05:21] Called DecompressByteArray: compressed_data_size=530958 data_size=1255672, decompressed_data_size=1255672 diff=0
[03:05:21] - Digital signature verified
[03:05:21]
[03:05:21] Project: 10113 (Run 9, Clone 7, Gen 0)
[03:05:21]
[03:05:21] Assembly optimizations on if available.
[03:05:21] Entering M.D.
[03:05:27] Mapping NT from 6 to 6
[03:05:27] Completed 0 out of 4000000 steps (0%)
[03:20:27] mdrun returned 255
[03:20:27] Going to send back what have done -- stepsTotalG=4000000
[03:20:27] Work fraction=0.0082 steps=4000000.
[03:20:31] logfile size=13364 infoLength=13364 edr=0 trr=25
[03:20:31] logfile size: 13364 info=13364 bed=0 hdr=25
[03:20:31] - Writing 13902 bytes of core data to disk...
[03:20:31] Done: 13390 -> 4703 (compressed to 35.1 percent)
[03:20:31] ... Done.
[03:20:31]
[03:20:31] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:20:35] CoreStatus = 7A (122)
[03:20:35] Sending work to server
[03:20:35] Project: 10113 (Run 9, Clone 7, Gen 0)
< . . . >

After several failures like this it gives:
< . . . >
[03:51:37] + Attempting to send results [March 8 03:51:37 UTC]
[03:51:38] + Results successfully sent
[03:51:38] Thank you for your contribution to Folding@Home.
[03:51:42] EUE limit exceeded. Pausing 24 hours.
< . . . >

I also get similar errors with these WUs:
[03:20:56] Project: 6057 (Run 0, Clone 85, Gen 309)
[02:49:53] Project: 6058 (Run 0, Clone 142, Gen 196)
[02:34:19] Project: 10114 (Run 86, Clone 6, Gen 0)
[02:18:49] Project: 6068 (Run 1, Clone 192, Gen 124)

I am using the SMP client with 6 cores on an AMD 1090 processor with 4 GB RAM under Windows 7 64-bit. I've only had problems with one other WU, and it had already been flagged as "bad".

Suggestions?

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Wed Mar 09, 2011 2:37 am
by PantherX
Welcome to the F@H Forum Rick,

Do you mean that you get this error when using the -smp flag? Are you using the -smp X flag? If so, what is the value of X? If you're unsure of the value, please post the first section of your FAHlog as shown here (viewtopic.php?p=160884#p160884).

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Wed Mar 09, 2011 6:58 am
by HendricksSA
Craptor, this is probably the same problem with prime numbers of threads. See the topic Bruce mentioned above.

Rick, your problem with the 605x and 606x projects may be different. I suggest you follow PantherX's suggestion to post logs, but I would recommend a new topic for them. Looking forward to your logs.

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Wed Mar 09, 2011 10:39 pm
by Craptor
Well, I tried again with -smp 4 and now it works, so the problem was clearly with prime numbers of threads.

Thank you, bruce and HendricksSA.
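
For anyone who wants to script the workaround that worked here, this is a minimal Python sketch that picks the largest non-prime thread count not exceeding the one you wanted (so 5 becomes 4 and 7 becomes 6). The function names are made up for illustration, and the "avoid primes" rule is only the heuristic reported in this thread, not documented client behavior.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime (trial division is plenty for core counts)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def largest_non_prime_at_most(threads: int) -> int:
    """Hypothetical helper: largest non-prime value <= threads to use as -smp X."""
    if threads < 1:
        raise ValueError("thread count must be at least 1")
    candidate = threads
    while is_prime(candidate):
        candidate -= 1  # step down until the value is no longer prime
    return candidate


# Show the suggested replacement for a few thread counts seen in this thread.
for wanted in (5, 6, 7, 8):
    print(f"-smp {wanted} -> -smp {largest_non_prime_at_most(wanted)}")
```

A non-prime count is not a guarantee: Rick's log above ran with 6 threads and still failed, so treat this as a first thing to try rather than a fix.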

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Sun Mar 13, 2011 7:43 pm
by Nathan_P
I've just had a couple as well. Logs provided for troubleshooting:

Code:

# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\folding@home\F@H SMP
Executable: C:\Folding@home\F@H SMP\Fah6.exe
Arguments: -smp -smp 6 -verbosity 9 

[18:41:05] - Ask before connecting: No
[18:41:05] - User name: Nathan_P (Team 56895)
[18:41:05] - User ID: 40CC439C6C440CD1
[18:41:05] - Machine ID: 1
[18:41:05] 
[18:41:05] Loaded queue successfully.
[18:41:05] 
[18:41:05] - Autosending finished units... [March 13 18:41:05 UTC]
[18:41:05] + Processing work unit
[18:41:05] Trying to send all finished work units
[18:41:05] Core required: FahCore_a3.exe
[18:41:05] + No unsent completed units remaining.
[18:41:05] - Autosend completed
[18:41:05] Core found.
[18:41:05] Working on queue slot 04 [March 13 18:41:05 UTC]
[18:41:05] + Working ...
[18:41:05] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 04 -np 6 -checkpoint 15 -verbose -lifeline 3996 -version 634'

[18:41:05] 
[18:41:05] *------------------------------*
[18:41:05] Folding@Home Gromacs SMP Core
[18:41:05] Version 2.27 (Dec. 15, 2010)
[18:41:05] 
[18:41:05] Preparing to commence simulation
[18:41:05] - Ensuring status. Please wait.
[18:41:15] - Looking at optimizations...
[18:41:15] - Working with standard loops on this execution.
[18:41:15] - Previous termination of core was improper.
[18:41:15] - Going to use standard loops.
[18:41:15] - Files status OK
[18:41:15] - Expanded 984531 -> 2040300 (decompressed 207.2 percent)
[18:41:15] Called DecompressByteArray: compressed_data_size=984531 data_size=2040300, decompressed_data_size=2040300 diff=0
[18:41:15] - Digital signature verified
[18:41:15] 
[18:41:15] Project: 10115 (Run 77, Clone 2, Gen 4)
[18:41:15] 
[18:41:15] Entering M.D.
[18:41:21] Using Gromacs checkpoints
[18:41:21] Mapping NT from 6 to 6 
[18:41:22] Resuming from checkpoint
[18:41:22] Verified work/wudata_04.log
[18:41:22] Verified work/wudata_04.trr
[18:41:22] Verified work/wudata_04.xtc
[18:41:23] Verified work/wudata_04.edr
[18:41:23] Completed 1319148 out of 2000000 steps  (65%)
[18:42:16] Completed 1320000 out of 2000000 steps  (66%)
[19:08:22] Completed 1340000 out of 2000000 steps  (67%)
[19:31:28] Completed 1360000 out of 2000000 steps  (68%)
[19:37:19] Gromacs cannot continue further.
[19:37:19] Going to send back what have done -- stepsTotalG=2000000
[19:37:19] Work fraction=0.6833 steps=2000000.
[19:37:23] logfile size=18174 infoLength=18174 edr=0 trr=23
[19:37:23] logfile size: 18174 info=18174 bed=0 hdr=23
[19:37:23] - Writing 18710 bytes of core data to disk...
[19:37:23] Done: 18198 -> 5313 (compressed to 29.  ... Done.
[19:37:23] 
[19:37:23] Folding@home Core Shutdown: EARLY_UNIT_END
[19:37:43] CoreStatus = 72 (114)
[19:37:43] Sending work to server
[19:37:43] Project: 10115 (Run 77, Clone 2, Gen 4)


[19:37:43] + Attempting to send results [March 13 19:37:43 UTC]
[19:37:43] - Reading file work/wuresults_04.dat from core
[19:37:43]   (Read 5825 bytes from disk)
[19:37:43] Connecting to http://171.64.65.75:8080/
[19:37:44] Posted data.
[19:37:44] Initial: 0000; - Uploaded at ~6 kB/s
[19:37:44] - Averaged speed for that direction ~30 kB/s
[19:37:44] + Results successfully sent
[19:37:44] Thank you for your contribution to Folding@Home.
[19:37:48] Trying to send all finished work units
[19:37:48] + No unsent completed units remaining.
[19:37:48] - Preparing to get new work unit...
[19:37:48] Cleaning up work directory
[19:37:48] + Attempting to get work packet
[19:37:48] Passkey found
[19:37:48] - Will indicate memory of 2047 MB
[19:37:48] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 10, Stepping: 0
[19:37:48] - Connecting to assignment server
[19:37:48] Connecting to http://assign.stanford.edu:8080/
[19:37:49] Posted data.
[19:37:49] Initial: 40AB; - Successful: assigned to (171.64.65.75).
[19:37:49] + News From Folding@Home: Welcome to Folding@Home
[19:37:50] Loaded queue successfully.
[19:37:50] Sent data
[19:37:50] Connecting to http://171.64.65.75:8080/
[19:37:51] Posted data.
[19:37:51] Initial: 0000; - Receiving payload (expected size: 984223)
[19:37:56] - Downloaded at ~192 kB/s
[19:37:56] - Averaged speed for that direction ~173 kB/s
[19:37:56] + Received work.
[19:37:56] Trying to send all finished work units
[19:37:56] + No unsent completed units remaining.
[19:37:56] + Closed connections
[19:38:01] 
[19:38:05] + Processing work unit
[19:38:05] Core required: FahCore_a3.exe
[19:38:05] Core found.
[19:38:05] Working on queue slot 05 [March 13 19:38:05 UTC]
[19:38:05] + Working ...
[19:38:05] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 6 -checkpoint 15 -verbose -lifeline 3996 -version 634'

[19:38:05] 
[19:38:05] *------------------------------*
[19:38:05] Folding@Home Gromacs SMP Core
[19:38:05] Version 2.Preparing to commence simulation
[19:38:05] - Looking at optimizations...
[19:38:05] - Created dyn
[19:38:05] - Files status OK
[19:38:06] - Expanded 983711 -> 2040300 (decompressed 207.4 percent)
[19:38:06] Called DecompressByteArray: compressed_data_size=983711 data_size=2040300, decompressed_data_size=2040300 diff=0
[19:38:06] - Digital signature verified
[19:38:06] 
[19:38:06] Project: 10115 (Run 39, Clone 7, Gen 3)
[19:38:06] 
[19:38:06] Assembly optimizations on if available.
[19:38:06] Entering M.D.
[19:38:12] Mapping NT from 6 to 6 
[19:38:15] mdrun returned 255
[19:38:15] Going to send back what have done -- stepsTotalG=2000000
[19:38:15] Work fraction=0.0000 steps=2000000.
[19:38:16] logfile size=6819 infoLength=6819 edr=0 trr=25
[19:38:16] logfile size: 6819 info=6819 bed=0 hdr=25
[19:38:16] - Writing 7357 bytes of core data to disk...
[19:38:16] Done: 6845 -> 2402 (compressed to 35.0 percent)
[19:38:16]   ... Done.
[19:38:16] 
[19:38:16] Folding@home Core Shutdown: EARLY_UNIT_END
[19:38:19] CoreStatus = 72 (114)
[19:38:22] Sending work to server
[19:38:22] Project: 10115 (Run 39, Clone 7, Gen 3)


[19:38:22] + Attempting to send results [March 13 19:38:22 UTC]
[19:38:22] - Reading file work/wuresults_05.dat from core
[19:38:22]   (Read 2914 bytes from disk)
[19:38:22] Connecting to http://171.64.65.75:8080/
[19:38:22] Posted data.
[19:38:22] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[19:38:22] - Uploaded at ~7 kB/s
[19:38:22] - Averaged speed for that direction ~28 kB/s
[19:38:22] + Results successfully sent
[19:38:22] Thank you for your contribution to Folding@Home.
[19:38:27] Trying to send all finished work units
[19:38:27] + No unsent completed units remaining.
[19:38:27] - Preparing to get new work unit...
[19:38:27] Cleaning up work directory
[19:38:27] + Attempting to get work packet
[19:38:27] Passkey found
[19:38:27] - Will indicate memory of 2047 MB
[19:38:27] - Connecting to assignment server
[19:38:27] Connecting to http://assign.stanford.edu:8080/
[19:38:27] Posted data.
[19:38:27] Initial: 40AB; - Successful: assigned to (171.64.65.75).
[19:38:27] + News From Folding@Home: Welcome to Folding@Home
[19:38:28] Loaded queue successfully.
[19:38:28] Sent data
[19:38:28] Connecting to http://171.64.65.75:8080/
[19:38:29] Posted data.
[19:38:29] Initial: 0000; - Receiving payload (expected size: 987657)
[19:38:29] Killing all core threads
[19:38:29] Could not get process id information.  Please kill core process manually

Folding@Home Client Shutdown at user request.
[19:38:29] ***** Got a SIGTERM signal (2)
[19:38:29] Killing all core threads
[19:38:29] Could not get process id information.  Please kill core process manually

Folding@Home Client Shutdown.

Hardware as follows: AMD 1090T @ 3.5 GHz, M4N82 motherboard, 2 GB XMS2 @ 800 (stock), Asus GTS250, 2 x WD 320 GB HDD, Corsair TX950 PSU, and plenty of cooling for all components.

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Sun Mar 13, 2011 9:14 pm
by PantherX
Nathan_P -> Could it be possible that having both the -smp and -smp 6 flags is causing an issue? Have you tried -smp 5 or -smp 4 or -smp 3 or -smp 2 on the rig?
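
A quick way to spot a doubled switch like this is to count flags on the "Arguments:" line at the top of the log. The sketch below is only an illustration in Python with a made-up helper name. It says nothing about whether the duplicate actually causes the failures, which is unconfirmed in this thread.

```python
def count_flag(arguments_line: str, flag: str = "-smp") -> int:
    """Hypothetical helper: count how often `flag` appears on an 'Arguments:' log line."""
    prefix = "Arguments:"
    text = arguments_line.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    # Compare whole tokens so that the value after the flag (e.g. "6") is ignored.
    return sum(1 for token in text.split() if token == flag)


# The two argument lines posted in this thread.
for line in ("Arguments: -smp -smp 6 -verbosity 9 ",
             "Arguments: -oneunit -forceasm -smp 5 "):
    times = count_flag(line)
    note = "duplicated" if times > 1 else "ok"
    print(f"{line.strip()!r}: -smp appears {times} time(s), {note}")
```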

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Mon Mar 14, 2011 8:54 pm
by Nathan_P
PantherX wrote:Nathan_P -> Could it be possible that having both the -smp and -smp 6 flags is causing an issue? Have you tried -smp 5 or -smp 4 or -smp 3 or -smp 2 on the rig?
Well spotted, but if that's the case then it must be specific to these WUs, as it has folded everything else just fine. It's back up and running now with a 6041, so I will see how it goes. I've put it back to stock clocks (3.2) for now and I will reapply the overclock Thursday. I'll also remove the double -smp switch. The error code indicated a memory problem, but the memory has never been overclocked.

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Mon Mar 28, 2011 4:15 pm
by LiLChris
Yeah, I am having the same problem with -smp 7; when I switch to -smp 8 it folds without any issues.

Code:

[02:07:37] + Processing work unit
[02:07:37] Core required: FahCore_a3.exe
[02:07:37] Core found.
[02:07:37] Working on queue slot 02 [March 27 02:07:37 UTC]
[02:07:37] + Working ...
[02:07:37] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 02 -np 7 -checkpoint 3 -verbose -lifeline 3360 -version 630'

[02:07:37] 
[02:07:37] *------------------------------*
[02:07:37] Folding@Home Gromacs SMP Core
[02:07:37] Version 2.27 (Dec. 15, 2010)
[02:07:37] 
[02:07:37] Preparing to commence simulation
[02:07:37] - Looking at optimizations...
[02:07:37] - Created dyn
[02:07:37] - Files status OK
[02:07:37] - Expanded 984166 -> 2040300 (decompressed 207.3 percent)
[02:07:37] Called DecompressByteArray: compressed_data_size=984166 data_size=2040300, decompressed_data_size=2040300 diff=0
[02:07:37] - Digital signature verified
[02:07:37] 
[02:07:37] Project: 10115 (Run 13, Clone 5, Gen 8)
[02:07:37] 
[02:07:37] Assembly optimizations on if available.
[02:07:37] Entering M.D.
[02:07:43] Mapping NT from 7 to 7 
[02:07:43] mdrun returned 255
[02:07:43] Going to send back what have done -- stepsTotalG=2000000
[02:07:43] Work fraction=0.0000 steps=2000000.
[02:07:47] logfile size=0 infoLength=0 edr=0 trr=25
[02:07:47] logfile size: 0 info=0 bed=0 hdr=25
[02:07:47] - Writing 643 bytes of core data to disk...
[02:07:47] Done: 131 -> 151 (compressed to 115.2 percent)
[02:07:47]   ... Done.
[02:07:47] 
[02:07:47] Folding@home Core Shutdown: EARLY_UNIT_END
[02:07:51] CoreStatus = 72 (114)
[02:07:51] Sending work to server
[02:07:51] Project: 10115 (Run 13, Clone 5, Gen 8)

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Mon Mar 28, 2011 4:27 pm
by 7im
The recommendation not to use prime values for -smp X seems like a good one. ;)

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Mon Mar 28, 2011 4:34 pm
by LiLChris
7im wrote:The recommendation not to use prime values for -smp X seems like a good one. ;)
Yeah, I got that, but I'd rather they had a fix for this so I can keep using -smp 7 instead of dropping down to -smp 6. :(

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Mon Mar 28, 2011 4:45 pm
by Grandpa_01
OK, what did I miss? I have not seen a fix for this issue. Do you have a link to the fix, LiLChris?

Re: Projects 10113, 10114 and 10115, immediate EUEs

Posted: Mon Mar 28, 2011 4:49 pm
by LiLChris
Grandpa_01 wrote:OK, what did I miss? I have not seen a fix for this issue. Do you have a link to the fix, LiLChris?
No, that is why I am here hoping a fix can be made. ;)

Maybe even a warning so more people would know, since it's probably not noticeable unless you check a work history log like the one HFM has.
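
Until something like that exists, a small script can surface these failures from a saved log. This is a rough Python sketch that relies only on the line formats visible in the logs posted in this thread ("Project:", "Mapping NT from N to N", "Folding@home Core Shutdown:"). The function and type names are hypothetical, and treating the second number on the "Mapping NT" line as the thread count is an assumption.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
THREADS_RE = re.compile(r"Mapping NT from (\d+) to (\d+)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")

# Only the two failure statuses that appear in this thread are reported by default.
FAILURE_STATUSES = frozenset({"EARLY_UNIT_END", "UNSTABLE_MACHINE"})


class CoreShutdown(NamedTuple):
    project: Optional[str]  # e.g. "10115 (Run 13, Clone 5, Gen 8)"
    threads: Optional[int]  # assumed thread count from the "Mapping NT" line
    status: str             # e.g. "EARLY_UNIT_END"


def find_core_shutdowns(lines: Iterable[str],
                        statuses=FAILURE_STATUSES) -> List[CoreShutdown]:
    """Collect core shutdowns whose status is in `statuses`, with WU context."""
    project: Optional[str] = None
    threads: Optional[int] = None
    events: List[CoreShutdown] = []
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            project = "{} (Run {}, Clone {}, Gen {})".format(*match.groups())
            continue
        match = THREADS_RE.search(line)
        if match:
            threads = int(match.group(2))
            continue
        match = SHUTDOWN_RE.search(line)
        if match:
            if match.group(1) in statuses:
                events.append(CoreShutdown(project, threads, match.group(1)))
            threads = None  # do not carry a stale thread count into the next WU
    return events


# Lines copied from the -smp 7 log posted above.
sample = [
    "[02:07:37] Project: 10115 (Run 13, Clone 5, Gen 8)",
    "[02:07:43] Mapping NT from 7 to 7 ",
    "[02:07:43] mdrun returned 255",
    "[02:07:47] Folding@home Core Shutdown: EARLY_UNIT_END",
]
for event in find_core_shutdowns(sample):
    print(f"{event.status}: project {event.project}, threads={event.threads}")
```

To run it on a real log, pass an open file instead of the sample list, for example `find_core_shutdowns(open("FAHlog.txt"))`, where the file name is whatever your client writes.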