P11020 immediate EUE

Moderators: Site Moderators, FAHC Science Team

vladh4x0r
Posts: 5
Joined: Tue Jul 28, 2009 5:04 am
Hardware configuration: 1) Core i7 860 @ 3.5 GHz, 6GB DDR3
GPUs: Radeon 4850 and 4830 (not folding)
OS: Windows 7 64-bit
SMP2 client

2) QX9650 @ 3.0 GHz, 4GB DDR2
GPU: GT240
OS: Vista 64-bit
SMP2 client
GPU2 client
Location: Folsom, CA, USA

P11020 immediate EUE

Post by vladh4x0r »

I started getting P11020 assignments last night, and every one of them EUEd (EARLY_UNIT_END) within a few seconds of starting:

[02:53:04] + Attempting to get work packet
[02:53:04] Passkey found
[02:53:04] - Will indicate memory of 4087 MB
[02:53:04] - Connecting to assignment server
[02:53:04] Connecting to http://assign.stanford.edu:8080/
[02:53:04] Posted data.
[02:53:04] Initial: 40AB; - Successful: assigned to (171.64.65.55).
[02:53:04] + News From Folding@Home: Welcome to Folding@Home
[02:53:05] Loaded queue successfully.
[02:53:05] Sent data
[02:53:05] Connecting to http://171.64.65.55:8080/
[02:53:05] Posted data.
[02:53:05] Initial: 0000; - Receiving payload (expected size: 659772)
[02:53:06] - Downloaded at ~644 kB/s
[02:53:06] - Averaged speed for that direction ~723 kB/s
[02:53:06] + Received work.
[02:53:06] Trying to send all finished work units
[02:53:06] + No unsent completed units remaining.
[02:53:06] + Closed connections
[02:53:11] 
[02:53:11] + Processing work unit
[02:53:11] Core required: FahCore_a3.exe
[02:53:11] Core found.
[02:53:11] Working on queue slot 01 [March 5 02:53:11 UTC]
[02:53:11] + Working ...
[02:53:11] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 7 -checkpoint 15 -verbose -lifeline 380 -version 634'

[02:53:11] 
[02:53:11] *------------------------------*
[02:53:11] Folding@Home Gromacs SMP Core
[02:53:11] Version 2.27 (Dec. 15, 2010)
[02:53:11] 
[02:53:11] Preparing to commence simulation
[02:53:11] - Looking at optimizations...
[02:53:11] - Created dyn
[02:53:11] - Files status OK
[02:53:11] - Expanded 659260 -> 1092080 (decompressed 165.6 percent)
[02:53:11] Called DecompressByteArray: compressed_data_size=659260 data_size=1092080, decompressed_data_size=1092080 diff=0
[02:53:11] - Digital signature verified
[02:53:11] 
[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)
[02:53:11] 
[02:53:11] Assembly optimizations on if available.
[02:53:11] Entering M.D.
[02:53:17] Mapping NT from 7 to 7 
[02:53:17] mdrun returned 255
[02:53:17] Going to send back what have done -- stepsTotalG=1000000
[02:53:17] Work fraction=0.0000 steps=1000000.
[02:53:21] logfile size=0 infoLength=0 edr=0 trr=25
[02:53:21] logfile size: 0 info=0 bed=0 hdr=25
[02:53:21] - Writing 643 bytes of core data to disk...
[02:53:21] Done: 131 -> 151 (compressed to 115.2 percent)
[02:53:21]   ... Done.
[02:53:21] 
[02:53:21] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:25] CoreStatus = 72 (114)
[02:53:25] Sending work to server
[02:53:25] Project: 11020 (Run 0, Clone 85, Gen 1)


[02:53:25] + Attempting to send results [March 5 02:53:25 UTC]
[02:53:25] - Reading file work/wuresults_01.dat from core
[02:53:25]   (Read 663 bytes from disk)
[02:53:25] Connecting to http://171.64.65.55:8080/
[02:53:25] Posted data.
[02:53:25] Initial: 0000; Conversation time very short, giving reduced weight in bandwidth avg
[02:53:25] - Uploaded at ~3 kB/s
[02:53:25] - Averaged speed for that direction ~6 kB/s
[02:53:25] + Results successfully sent
[02:53:25] Thank you for your contribution to Folding@Home.
[02:53:29] Trying to send all finished work units
[02:53:29] + No unsent completed units remaining.
[02:53:29] - Preparing to get new work unit...
[02:53:29] Cleaning up work directory
Eventually I got a P6023 unit, and that ran OK with the same core:

[02:53:29] + Attempting to get work packet
[02:53:29] Passkey found
[02:53:29] - Will indicate memory of 4087 MB
[02:53:29] - Connecting to assignment server
[02:53:29] Connecting to http://assign.stanford.edu:8080/
[02:53:29] Posted data.
[02:53:29] Initial: 40AB; - Successful: assigned to (171.64.65.54).
[02:53:29] + News From Folding@Home: Welcome to Folding@Home
[02:53:30] Loaded queue successfully.
[02:53:30] Sent data
[02:53:30] Connecting to http://171.64.65.54:8080/
[02:53:30] Posted data.
[02:53:30] Initial: 0000; - Receiving payload (expected size: 1767336)
[02:53:32] - Downloaded at ~862 kB/s
[02:53:32] - Averaged speed for that direction ~751 kB/s
[02:53:32] + Received work.
[02:53:32] Trying to send all finished work units
[02:53:32] + No unsent completed units remaining.
[02:53:32] + Closed connections
[02:53:37] 
[02:53:37] + Processing work unit
[02:53:37] Core required: FahCore_a3.exe
[02:53:37] Core found.
[02:53:37] Working on queue slot 02 [March 5 02:53:37 UTC]
[02:53:37] + Working ...
[02:53:37] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 02 -np 7 -checkpoint 15 -verbose -lifeline 380 -version 634'

[02:53:37] 
[02:53:37] *------------------------------*
[02:53:37] Folding@Home Gromacs SMP Core
[02:53:37] Version 2.27 (Dec. 15, 2010)
[02:53:37] 
[02:53:37] Preparing to commence simulation
[02:53:37] - Looking at optimizations...
[02:53:37] - Created dyn
[02:53:37] - Files status OK
[02:53:37] - Expanded 1766824 -> 1967109 (decompressed 111.3 percent)
[02:53:37] Called DecompressByteArray: compressed_data_size=1766824 data_size=1967109, decompressed_data_size=1967109 diff=0
[02:53:37] - Digital signature verified
[02:53:37] 
[02:53:37] Project: 6023 (Run 0, Clone 6, Gen 479)
[02:53:37] 
[02:53:37] Assembly optimizations on if available.
[02:53:37] Entering M.D.
[02:53:43] Mapping NT from 7 to 7 
[02:53:43] Completed 0 out of 500000 steps  (0%)
[02:57:47] Completed 5000 out of 500000 steps  (1%)
[03:01:51] Completed 10000 out of 500000 steps  (2%)
[03:05:55] Completed 15000 out of 500000 steps  (3%)
The log file shows that the client "ate" a few dozen P11020 units before that:

[19:44:54] Project: 11020 (Run 0, Clone 27, Gen 0)
[19:45:04] Folding@home Core Shutdown: EARLY_UNIT_END
[19:45:08] Project: 11020 (Run 0, Clone 27, Gen 0)
[19:45:18] Project: 11020 (Run 0, Clone 49, Gen 0)
[19:45:29] Folding@home Core Shutdown: EARLY_UNIT_END
[19:45:33] Project: 11020 (Run 0, Clone 49, Gen 0)
[02:41:39] Project: 11020 (Run 0, Clone 154, Gen 0)
[02:41:49] Folding@home Core Shutdown: EARLY_UNIT_END
[02:41:53] Project: 11020 (Run 0, Clone 154, Gen 0)
[02:42:04] Project: 11020 (Run 0, Clone 155, Gen 0)
[02:42:14] Folding@home Core Shutdown: EARLY_UNIT_END
[02:42:18] Project: 11020 (Run 0, Clone 155, Gen 0)
[02:42:29] Project: 11020 (Run 0, Clone 151, Gen 0)
[02:42:39] Folding@home Core Shutdown: EARLY_UNIT_END
[02:42:43] Project: 11020 (Run 0, Clone 151, Gen 0)
[02:42:53] Project: 11020 (Run 0, Clone 152, Gen 0)
[02:43:04] Folding@home Core Shutdown: EARLY_UNIT_END
[02:43:07] Project: 11020 (Run 0, Clone 152, Gen 0)
[02:43:18] Project: 11020 (Run 0, Clone 157, Gen 0)
[02:43:29] Folding@home Core Shutdown: EARLY_UNIT_END
[02:43:32] Project: 11020 (Run 0, Clone 157, Gen 0)
[02:43:43] Project: 11020 (Run 0, Clone 158, Gen 0)
[02:43:53] Folding@home Core Shutdown: EARLY_UNIT_END
[02:43:57] Project: 11020 (Run 0, Clone 158, Gen 0)
[02:44:09] Project: 11020 (Run 0, Clone 156, Gen 0)
[02:44:19] Folding@home Core Shutdown: EARLY_UNIT_END
[02:44:23] Project: 11020 (Run 0, Clone 156, Gen 0)
[02:44:34] Project: 11020 (Run 0, Clone 146, Gen 0)
[02:44:44] Folding@home Core Shutdown: EARLY_UNIT_END
[02:44:48] Project: 11020 (Run 0, Clone 146, Gen 0)
[02:44:58] Project: 11020 (Run 0, Clone 147, Gen 0)
[02:45:09] Folding@home Core Shutdown: EARLY_UNIT_END
[02:45:12] Project: 11020 (Run 0, Clone 147, Gen 0)
[02:45:23] Project: 11020 (Run 0, Clone 144, Gen 0)
[02:45:33] Folding@home Core Shutdown: EARLY_UNIT_END
[02:45:37] Project: 11020 (Run 0, Clone 144, Gen 0)
[02:45:47] Project: 11020 (Run 0, Clone 145, Gen 0)
[02:45:58] Folding@home Core Shutdown: EARLY_UNIT_END
[02:46:01] Project: 11020 (Run 0, Clone 145, Gen 0)
[02:46:12] Project: 11020 (Run 0, Clone 159, Gen 0)
[02:46:22] Folding@home Core Shutdown: EARLY_UNIT_END
[02:46:26] Project: 11020 (Run 0, Clone 159, Gen 0)
[02:46:36] Project: 11020 (Run 0, Clone 143, Gen 0)
[02:46:47] Folding@home Core Shutdown: EARLY_UNIT_END
[02:46:51] Project: 11020 (Run 0, Clone 143, Gen 0)
[02:47:01] Project: 11020 (Run 0, Clone 160, Gen 0)
[02:47:12] Folding@home Core Shutdown: EARLY_UNIT_END
[02:47:15] Project: 11020 (Run 0, Clone 160, Gen 0)
[02:47:26] Project: 11020 (Run 0, Clone 352, Gen 1)
[02:47:36] Folding@home Core Shutdown: EARLY_UNIT_END
[02:47:40] Project: 11020 (Run 0, Clone 352, Gen 1)
[02:47:51] Project: 11020 (Run 0, Clone 161, Gen 0)
[02:48:01] Folding@home Core Shutdown: EARLY_UNIT_END
[02:48:05] Project: 11020 (Run 0, Clone 161, Gen 0)
[02:48:15] Project: 11020 (Run 0, Clone 162, Gen 0)
[02:48:26] Folding@home Core Shutdown: EARLY_UNIT_END
[02:48:29] Project: 11020 (Run 0, Clone 162, Gen 0)
[02:48:40] Project: 11020 (Run 0, Clone 163, Gen 0)
[02:48:50] Folding@home Core Shutdown: EARLY_UNIT_END
[02:48:54] Project: 11020 (Run 0, Clone 163, Gen 0)
[02:49:04] Project: 11020 (Run 0, Clone 407, Gen 1)
[02:49:15] Folding@home Core Shutdown: EARLY_UNIT_END
[02:49:19] Project: 11020 (Run 0, Clone 407, Gen 1)
[02:49:29] Project: 11020 (Run 0, Clone 164, Gen 0)
[02:49:40] Folding@home Core Shutdown: EARLY_UNIT_END
[02:49:43] Project: 11020 (Run 0, Clone 164, Gen 0)
[02:49:54] Project: 11020 (Run 0, Clone 165, Gen 0)
[02:50:04] Folding@home Core Shutdown: EARLY_UNIT_END
[02:50:08] Project: 11020 (Run 0, Clone 165, Gen 0)
[02:50:18] Project: 11020 (Run 0, Clone 167, Gen 0)
[02:50:29] Folding@home Core Shutdown: EARLY_UNIT_END
[02:50:32] Project: 11020 (Run 0, Clone 167, Gen 0)
[02:50:43] Project: 11020 (Run 0, Clone 166, Gen 0)
[02:50:53] Folding@home Core Shutdown: EARLY_UNIT_END
[02:50:57] Project: 11020 (Run 0, Clone 166, Gen 0)
[02:51:07] Project: 11020 (Run 0, Clone 168, Gen 0)
[02:51:18] Folding@home Core Shutdown: EARLY_UNIT_END
[02:51:21] Project: 11020 (Run 0, Clone 168, Gen 0)
[02:51:32] Project: 11020 (Run 0, Clone 170, Gen 0)
[02:51:42] Folding@home Core Shutdown: EARLY_UNIT_END
[02:51:46] Project: 11020 (Run 0, Clone 170, Gen 0)
[02:51:56] Project: 11020 (Run 0, Clone 171, Gen 0)
[02:52:07] Folding@home Core Shutdown: EARLY_UNIT_END
[02:52:11] Project: 11020 (Run 0, Clone 171, Gen 0)
[02:52:21] Project: 11020 (Run 0, Clone 172, Gen 1)
[02:52:32] Folding@home Core Shutdown: EARLY_UNIT_END
[02:52:35] Project: 11020 (Run 0, Clone 172, Gen 1)
[02:52:46] Project: 11020 (Run 0, Clone 173, Gen 0)
[02:52:56] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:00] Project: 11020 (Run 0, Clone 173, Gen 0)
[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)
[02:53:21] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:25] Project: 11020 (Run 0, Clone 85, Gen 1)
I have no good ideas on where to start troubleshooting. This box (i7 860) has been folding everything else with no problems for over a year and just finished a bigadv WU.

Question for the experts: does the number of threads (-smp n) affect different WUs in different ways? I've run quite a few WUs over the last few weeks with "-smp 7"; could P11020 not "like" this? I don't have a unit to test with, of course, since they deleted themselves.

Edit: Found this thread, but apparently "-smp 7" worked well for everyone, at least back then:

viewtopic.php?f=58&t=14423&start=75#p165234
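
For anyone who wants to tally failures like the ones in the log excerpt above, here is a minimal Python sketch. It assumes the log lines look exactly like those quoted in this post (a `Project:` line followed by an `EARLY_UNIT_END` shutdown line). The function name `count_eues` and the sample data are illustrative and not part of any Folding@Home tool.

```python
import re
from collections import Counter

# Matches client log lines such as:
#   [02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)
PROJECT_RE = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\] Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)
EUE_MARKER = "Folding@home Core Shutdown: EARLY_UNIT_END"


def count_eues(lines):
    """Return a Counter mapping project number -> number of EUE shutdowns."""
    eues = Counter()
    current_project = None
    for line in lines:
        match = PROJECT_RE.match(line.strip())
        if match:
            # Remember which project the following shutdown line belongs to.
            current_project = int(match.group(1))
        elif EUE_MARKER in line and current_project is not None:
            eues[current_project] += 1
    return eues


# A few lines copied from the log above, plus the P6023 unit that ran fine.
sample = """\
[02:52:46] Project: 11020 (Run 0, Clone 173, Gen 0)
[02:52:56] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:00] Project: 11020 (Run 0, Clone 173, Gen 0)
[02:53:11] Project: 11020 (Run 0, Clone 85, Gen 1)
[02:53:21] Folding@home Core Shutdown: EARLY_UNIT_END
[02:53:25] Project: 11020 (Run 0, Clone 85, Gen 1)
[02:53:37] Project: 6023 (Run 0, Clone 6, Gen 479)
""".splitlines()

print(count_eues(sample))
```

To run it against a real log, pass the lines of the log file in place of `sample`, for example `count_eues(open(path))`, where `path` is wherever your client writes its log.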
Slash_2CPU
Posts: 57
Joined: Sat Apr 19, 2008 5:15 pm

Re: P11020 immediate EUE

Post by Slash_2CPU »

Update your client. I think that may be a new a5 core unit, and your client needs an update to run that new core.
ASRock X99 WS i7-5930K @ 4.4GHz /2x GTX 970 @ 1.46GHz /4x4GB DDR4-2666
Phenom II X6 @ 3.7GHz /2x2GB DDR3-1680 /GTX 970 @ 1.40GHz
450-600K PPD @ ~850W
vladh4x0r

Re: P11020 immediate EUE

Post by vladh4x0r »

Thanks Slash - I'm already running the 6.34 client, and finished one bigadv unit with it using the new A5 core. When P11020 first got assigned, it downloaded the new 2.27 A3 core, which is successfully running P6023 now.
Jeannie
Posts: 49
Joined: Sun Dec 02, 2007 3:07 am
Location: Central New Jersey

Re: P11020 immediate EUE

Post by Jeannie »

You're running with -smp 7. I had the same problem with this project 11020. It can't handle -smp 7 - you have to use -smp 8 or -smp 6.
vladh4x0r

Re: P11020 immediate EUE

Post by vladh4x0r »

Thanks Jeannie - looks like I'll be optimizing this box for bigadv with -smp 7 then. I run two GPU clients on dual GTX 460 on it as well, so -smp 8 is much slower, and -smp 6 would likely lose another round of performance.
Arnette
Posts: 25
Joined: Wed Jan 27, 2010 1:30 pm
Location: Ontario, Canada
Contact:

Re: P11020 immediate EUE

Post by Arnette »

Yeah, I am having the exact same problem. We shouldn't have to change from -smp 7 for this to work.

Does anyone from Stanford have input on this?
Our Folding@Home Teampage --> http://www.lbsfolding.info
dvanatta
Pande Group Member
Posts: 62
Joined: Tue Sep 14, 2010 7:00 pm

Re: P11020 immediate EUE

Post by dvanatta »

Hi,

You shouldn't be getting these if you're running with -smp 7; it's a bug. We're looking into it. Temporarily, we've disabled using more than 6 for this project, but we'll re-enable 8 once we figure out what's going on.

-Dan
toTOW
Site Moderator
Posts: 6349
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France
Contact:

Re: P11020 immediate EUE

Post by toTOW »

Dan> for assignments, the client doesn't take the -smp X into account. It uses the number of detected cores (which is printed at client startup).

So if 8 cores have been detected, it will report 8 cores to the AS, whether it's started with -smp or with another value specified.
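
The mismatch described above can be sketched in a few lines of Python. This is only a toy model of the behaviour as stated in this thread, not the actual client or assignment server code. The class `Client`, the function `as_would_assign`, and the idea of a server-side filter on core counts are all hypothetical names introduced for illustration. The value of 8 detected cores is an assumption for an i7 860 with Hyper-Threading enabled.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Client:
    detected_cores: int       # what the client prints at startup
    smp_flag: Optional[int]   # value given to -smp, or None for plain -smp

    def cores_reported_to_as(self) -> int:
        # Per the post above: the -smp value is not taken into account here.
        return self.detected_cores

    def threads_passed_to_core(self) -> int:
        # The log earlier in the thread shows "-np 7" when started with -smp 7.
        if self.smp_flag is not None:
            return self.smp_flag
        return self.detected_cores


def as_would_assign(client: Client, blocked_core_counts: Set[int]) -> bool:
    """Hypothetical server-side filter keyed on the reported core count."""
    return client.cores_reported_to_as() not in blocked_core_counts


# Assumed setup: 8 logical cores detected, client started with -smp 7.
box = Client(detected_cores=8, smp_flag=7)
print(box.cores_reported_to_as())                     # 8
print(box.threads_passed_to_core())                   # 7
print(as_would_assign(box, blocked_core_counts={7}))  # True
```

Under these assumptions, a filter that blocks only "7 cores" never sees this machine as a 7-thread client, so it would still be handed the work unit and then run it with `-np 7`.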

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
dvanatta

Re: P11020 immediate EUE

Post by dvanatta »

toTOW,

Interesting. If that's the case, I don't think there's anything I can do but restrict this to 6 cores. I've also contacted the people that work on the server code directly, so hopefully this will get resolved at some point.

-Dan