Re: 6900 crashed and dumped (run:11 clone:22 gen:76 core:0xa
Posted: Sat May 26, 2012 1:22 pm
by Joe_H
From previous reports, a WU with a 512 B download is probably bad. They have been having issues with that work server and have been trying to determine the causes and clear them up for a few months.
Re: 6900 crashed and dumped (run:11 clone:22 gen:76 core:0xa
Posted: Sat May 26, 2012 6:46 pm
by bruce
That project does not appear on psummary so I'm surprised you're getting an assignment. I did look up gen:75 of that same trajectory and it was completed in November, also making it difficult to comprehend why the server assigned it to you.
The trajectory project:6900 run:42 clone:15 (gen: *) is still processing fine. [It's not uncommon for an occasional trajectory to end while most continue as long as necessary for the desired scientific results, but when that happens the server should stop assigning it.]
See also recent reports on projects 690x in this forum.
Re: 6900 crashed and dumped (run:11 clone:22 gen:76 core:0xa
Posted: Sun May 27, 2012 5:58 pm
by Joe_H
There have been other posts on Project 6900 still putting out WU's while not showing up on the psummary pages. Apparently there are very few Project 6900 WU's left to assign and they are rarely present on the work server when it is polled to create the project summary lists.
Project: 6904 (Run 2, Clone 19, Gen 84) borked WU
Posted: Sun May 27, 2012 8:15 pm
by Grandpa_01
Just got another memory-hungry WU, 6904 (R2, C19, G84). It is currently using 1 core out of 48 and 22.9 GB of memory and rising. This is #2 for me: viewtopic.php?f=19&t=21681
[19:40:43] *------------------------------*
[19:40:43] Folding@Home Gromacs SMP Core
[19:40:43] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[19:40:43]
[19:40:43] Preparing to commence simulation
[19:40:43] - Looking at optimizations...
[19:40:43] - Created dyn
[19:40:43] - Files status OK
[19:40:47] - Expanded 45658952 -> 70963200 (decompressed 61.3 percent)
[19:40:47] Called DecompressByteArray: compressed_data_size=45658952 data_size=70963200, decompressed_data_size=70963200 diff=0
[19:40:48] - Digital signature verified
[19:40:48]
[19:40:48] Project: 6904 (Run 2, Clone 19, Gen 84)
[19:40:48]
[19:40:48] Assembly optimizations on if available.
[19:40:48] Entering M.D.
[19:40:56] Mapping NT from 48 to 48
Folding@Home Client Shutdown.
Re: 6900 crashed and dumped (run:11 clone:22 gen:76 core:0xa
Posted: Sun May 27, 2012 9:03 pm
by ChelseaOilman
08:41:59:WU03:FS00:Requesting new work unit for slot 00: RUNNING smp:8 from 130.237.232.141
08:41:59:WU03:FS00:Connecting to 130.237.232.141:8080
08:42:01:WU03:FS00:Downloading 512B
A 512-byte WU is a bad WU. Early Saturday morning I was getting them on one of my bigadv machines, which is running the v6 client. I emailed Dr. Kasson asking to have this fixed. He responded a while later that he had cleared out the bad WUs again. It keeps happening periodically, though, so you have to keep watching for them.
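For anyone who wants to watch for these automatically, here is a minimal Python sketch that scans v7-style log lines for tiny downloads like the one quoted above. The function name, the regular expression, and the 512-byte threshold are my own assumptions, based only on the three log lines in this post; none of it is part of the client.

```python
import re

# Assumed v7 log shape, taken from the line quoted above:
#   "08:42:01:WU03:FS00:Downloading 512B"
# Only plain byte counts ("<digits>B") match; larger sizes printed with
# other units are ignored on purpose.
DOWNLOAD_RE = re.compile(
    r"^(\d\d:\d\d:\d\d):(WU\d+):(FS\d+):Downloading (\d+)B\s*$"
)


def find_tiny_downloads(lines, max_bytes=512):
    """Return (time, wu, slot, size) for each download of max_bytes or less."""
    hits = []
    for line in lines:
        match = DOWNLOAD_RE.match(line.strip())
        if match and int(match.group(4)) <= max_bytes:
            hits.append(
                (match.group(1), match.group(2), match.group(3), int(match.group(4)))
            )
    return hits
```

Pass it the lines of the client log (for example, the result of reading the log file and calling splitlines()); any tuple it returns is a WU worth reporting here.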
Re: 6900 crashed and dumped (run:11 clone:22 gen:76 core:0xa
Posted: Mon May 28, 2012 12:48 am
by Ripper36
Thanks for all the responses.
Re: Merged problems with projects 6903/6904
Posted: Mon May 28, 2012 3:28 pm
by d00dz
I just had issues with P6904 (R2 C19 G84).
It wouldn't move past the Mapping stage.
One of those bad WUs?
Re: Project: 6904 (Run 2, Clone 19, Gen 84) borked WU
Posted: Mon May 28, 2012 3:30 pm
by d00dz
I got the same one too:
P6904 (R2 C19 G84)
Using top, I noticed the core was only using 1 thread.
Re: Project: 6904 (Run 2, Clone 19, Gen 84) borked WU
Posted: Mon May 28, 2012 9:10 pm
by toTOW
I marked the WU as bad, along with Project: 6904 (Run 2, Clone 26, Gen 86), which has been reported by two teammates.
Re: Merged problems with projects 6903/6904
Posted: Mon May 28, 2012 11:25 pm
by bruce
I have been in contact with the owner of these projects. Within the past few days there have been 6 or 7 of these WUs reported as problems, which probably means there are others that will soon be noticed. I've merged them all into a single topic. You'll also note that a similar problem happened a couple months ago and after considerable manual work, the corrupt WUs were regenerated even though the original cause was not found. We can hope that it's found this time.
Merged problems with projects 6903/6904, Part 2
Posted: Tue May 29, 2012 4:35 pm
by Patriot
Finished 1 unit, uploaded it and grabbed another, which failed 9 times and then hung with an out-of-memory error.
Unit expanded size is ~7 MB smaller than the previous 6904...
System: SB-E 32/64, 32 GB RAM
[04:56:47] - Shutting down core
[04:56:47]
[04:56:47] Folding@home Core Shutdown: FINISHED_UNIT
[05:02:10] CoreStatus = 64 (100)
[05:02:10] Sending work to server
[05:02:10] Project: 6904 (Run 1, Clone 26, Gen 64)
[05:02:10] + Attempting to send results [May 26 05:02:10 UTC]
[05:11:07] + Results successfully sent
[05:11:07] Thank you for your contribution to Folding@Home.
[05:11:07] + Number of Units Completed: 137
[05:31:47] - Preparing to get new work unit...
[05:31:47] Cleaning up work directory
[05:31:47] + Attempting to get work packet
[05:31:47] Passkey found
[05:31:47] - Connecting to assignment server
[05:31:48] - Successful: assigned to (130.237.232.237).
[05:31:48] + News From Folding@Home: Welcome to Folding@Home
[05:31:48] Loaded queue successfully.
[05:34:04] + Closed connections
[05:34:04]
[05:34:04] + Processing work unit
[05:34:04] Core required: FahCore_a5.exe
[05:34:04] Core found.
[05:34:04] Working on queue slot 07 [May 26 05:34:04 UTC]
[05:34:04] + Working ...
[05:34:04]
[05:34:04] *------------------------------*
[05:34:04] Folding@Home Gromacs SMP Core
[05:34:04] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[05:34:04]
[05:34:04] Preparing to commence simulation
[05:34:04] - Looking at optimizations...
[05:34:04] - Created dyn
[05:34:04] - Files status OK
[05:34:08] - Expanded 50703600 -> 65093632 (decompressed 43.6 percent)
[05:34:08] Called DecompressByteArray: compressed_data_size=50703600 data_size=65093632, decompressed_data_size=65093632 diff=0
[05:34:08] - Digital signature verified
[05:34:08]
[05:34:08] Project: 6904 (Run 0, Clone 2, Gen 99)
[05:34:08]
[05:34:08] Assembly optimizations on if available.
[05:34:08] Entering M.D.
[05:34:15] Mapping NT from 64 to 64
[05:49:54] CoreStatus = 0 (0)
[05:49:54] Sending work to server
[05:49:54] Project: 6904 (Run 0, Clone 2, Gen 99)
[05:49:54] - Error: Could not get length of results file work/wuresults_07.dat
[05:49:54] - Error: Could not read unit 07 file. Removing from queue.
[05:49:54] - Preparing to get new work unit...
[05:49:54] Cleaning up work directory
[05:49:54] + Attempting to get work packet
[05:49:54] Passkey found
[05:49:54] - Connecting to assignment server
[05:49:55] - Successful: assigned to (130.237.232.237).
[05:49:55] + News From Folding@Home: Welcome to Folding@Home
[05:49:55] Loaded queue successfully.
[05:52:14] + Closed connections
[05:52:19]
[05:52:19] + Processing work unit
[05:52:19] Core required: FahCore_a5.exe
[05:52:19] Core found.
[05:52:19] Working on queue slot 08 [May 26 05:52:19 UTC]
[05:52:19] + Working ...
[05:52:19]
[05:52:19] *------------------------------*
[05:52:19] Folding@Home Gromacs SMP Core
[05:52:19] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[05:52:19]
[05:52:19] Preparing to commence simulation
[05:52:19] - Ensuring status. Please wait.
[05:52:28] - Looking at optimizations...
[05:52:28] - Working with standard loops on this execution.
[05:52:28] - Created dyn
[05:52:28] - Files status OK
[05:52:32] - Expanded 50703600 -> 65093632 (decompressed 43.6 percent)
[05:52:32] Called DecompressByteArray: compressed_data_size=50703600 data_size=65093632, decompressed_data_size=65093632 diff=0
[05:52:32] - Digital signature verified
[05:52:32]
[05:52:32] Project: 6904 (Run 0, Clone 2, Gen 99)
[05:52:32]
[05:52:32] Entering M.D.
[05:52:39] Mapping NT from 64 to 64
[06:08:08] CoreStatus = 0 (0)
[06:08:08] Sending work to server
[06:08:08] Project: 6904 (Run 0, Clone 2, Gen 99)
[06:08:08] - Error: Could not get length of results file work/wuresults_08.dat
[06:08:08] - Error: Could not read unit 08 file. Removing from queue.
[06:08:08] - Preparing to get new work unit...
[06:08:08] Cleaning up work directory
[06:08:08] + Attempting to get work packet
[06:08:08] Passkey found
[06:08:08] - Connecting to assignment server
[06:08:09] - Successful: assigned to (130.237.232.237).
[06:08:09] + News From Folding@Home: Welcome to Folding@Home
[06:08:09] Loaded queue successfully.
[06:10:26] + Closed connections
[06:10:31]
[06:10:31] + Processing work unit
[06:10:31] Core required: FahCore_a5.exe
[06:10:31] Core found.
[06:10:31] Working on queue slot 09 [May 26 06:10:31 UTC]
[06:10:31] + Working ...
[06:10:31]
[06:10:31] *------------------------------*
[06:10:31] Folding@Home Gromacs SMP Core
[06:10:31] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[06:10:31]
[06:10:31] Preparing to commence simulation
[06:10:31] - Ensuring status. Please wait.
[06:10:41] - Looking at optimizations...
[06:10:41] - Working with standard loops on this execution.
[06:10:41] - Created dyn
[06:10:41] - Files status OK
[06:10:45] - Expanded 50703600 -> 65093632 (decompressed 43.6 percent)
[06:10:45] Called DecompressByteArray: compressed_data_size=50703600 data_size=65093632, decompressed_data_size=65093632 diff=0
[06:10:45] - Digital signature verified
[06:10:45]
[06:10:45] Project: 6904 (Run 0, Clone 2, Gen 99)
[06:10:45]
[06:10:45] Entering M.D.
[06:10:52] Mapping NT from 64 to 64
[06:26:21] CoreStatus = 0 (0)
[06:26:21] Sending work to server
[06:26:21] Project: 6904 (Run 0, Clone 2, Gen 99)
[06:26:21] - Error: Could not get length of results file work/wuresults_09.dat
[06:26:21] - Error: Could not read unit 09 file. Removing from queue.
[06:26:21] - Preparing to get new work unit...
[06:26:21] Cleaning up work directory
[06:26:21] + Attempting to get work packet
[06:26:21] Passkey found
[06:26:21] - Connecting to assignment server
[06:26:21] - Successful: assigned to (130.237.232.237).
[06:26:21] + News From Folding@Home: Welcome to Folding@Home
[06:26:21] Loaded queue successfully.
[06:28:44] + Closed connections
[06:28:49]
[06:28:49] + Processing work unit
[06:28:49] Core required: FahCore_a5.exe
[06:28:49] Core found.
[06:28:49] Working on queue slot 00 [May 26 06:28:49 UTC]
[06:28:49] + Working ...
[06:28:49]
[06:28:49] *------------------------------*
[06:28:49] Folding@Home Gromacs SMP Core
[06:28:49] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[06:28:49]
[06:28:49] Preparing to commence simulation
[06:28:49] - Ensuring status. Please wait.
[06:28:58] - Looking at optimizations...
[06:28:58] - Working with standard loops on this execution.
[06:28:58] - Created dyn
[06:28:58] - Files status OK
[06:29:02] - Expanded 50703600 -> 65093632 (decompressed 43.6 percent)
[06:29:02] Called DecompressByteArray: compressed_data_size=50703600 data_size=65093632, decompressed_data_size=65093632 diff=0
[06:29:02] - Digital signature verified
[06:29:02]
[06:29:02] Project: 6904 (Run 0, Clone 2, Gen 99)
[06:29:02]
[06:29:02] Entering M.D.
[06:29:09] Mapping NT from 64 to 64
[06:44:39] CoreStatus = 0 (0)
[06:44:39] Sending work to server
[06:44:39] Project: 6904 (Run 0, Clone 2, Gen 99)
[06:44:39] - Error: Could not get length of results file work/wuresults_00.dat
[06:44:39] - Error: Could not read unit 00 file. Removing from queue.
[06:44:39] - Preparing to get new work unit...
[06:44:39] Cleaning up work directory
[07:11:16] + Attempting to get work packet
[07:11:16] Passkey found
[07:11:16] - Connecting to assignment server
[07:11:16] - Successful: assigned to (130.237.232.237).
[07:11:16] + News From Folding@Home: Welcome to Folding@Home
[07:11:16] Loaded queue successfully.
[07:13:36] + Closed connections
[07:13:41]
[07:13:41] + Processing work unit
[07:13:41] Core required: FahCore_a5.exe
[07:13:41] Core found.
[07:13:41] Working on queue slot 01 [May 26 07:13:41 UTC]
[07:13:41] + Working ...
[07:13:41]
[07:13:41] *------------------------------*
[07:13:41] Folding@Home Gromacs SMP Core
[07:13:41] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[07:13:41]
[07:13:41] Preparing to commence simulation
[07:13:41] - Ensuring status. Please wait.
[07:13:51] - Looking at optimizations...
[07:13:51] - Working with standard loops on this execution.
[07:13:51] - Created dyn
[07:13:51] - Files status OK
[07:13:55] - Expanded 50703600 -> 65093632 (decompressed 43.6 percent)
[07:13:55] Called DecompressByteArray: compressed_data_size=50703600 data_size=65093632, decompressed_data_size=65093632 diff=0
[07:13:55] - Digital signature verified
[07:13:55]
[07:13:55] Project: 6904 (Run 0, Clone 2, Gen 99)
[07:13:55]
[07:13:55] Entering M.D.
[07:14:02] Mapping NT from 64 to 64
Folding@Home Client Shutdown.
Re: Merged problems with projects 6903/6904
Posted: Thu May 31, 2012 3:23 am
by markfw
I have the same problem (sort of), but it just hangs. This is a stable 3930K box that has done many units, and it just started doing this. See the log below:
[02:24:27] Work directory not found. Creating...
[02:24:27] Could not open work queue, generating new queue...
[02:24:27] - Preparing to get new work unit...
[02:24:27] Cleaning up work directory
[02:24:27] + Attempting to get work packet
[02:24:27] Passkey found
[02:24:27] - Connecting to assignment server
[02:24:27] - Successful: assigned to (130.237.232.237).
[02:24:27] + News From Folding@Home: Welcome to Folding@Home
[02:24:28] Loaded queue successfully.
[02:24:29] + Closed connections
[02:24:29]
[02:24:29] + Processing work unit
[02:24:29] Core required: FahCore_a5.exe
[02:24:29] Core found.
[02:24:29] Working on queue slot 01 [May 31 02:24:29 UTC]
[02:24:29] + Working ...
[02:24:29]
[02:24:29] *------------------------------*
[02:24:29] Folding@Home Gromacs SMP Core
[02:24:29] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[02:24:29]
[02:24:29] Preparing to commence simulation
[02:24:29] - Looking at optimizations...
[02:24:29] - Created dyn
[02:24:29] - Files status OK
[02:24:29] - Expanded 8610 -> 4165632 (decompressed 48381.3 percent)
[02:24:29] Called DecompressByteArray: compressed_data_size=8610 data_size=4165632, decompressed_data_size=4165632 diff=0
[02:24:29] - Digital signature verified
[02:24:29]
[02:24:29] Project: 6901 (Run 5, Clone 0, Gen 227)
[02:24:29]
[02:24:29] Assembly optimizations on if available.
[02:24:29] Entering M.D.
[02:24:35] Mapping NT from 12 to 12
[02:39:27] CoreStatus = 0 (0)
[02:39:27] Sending work to server
[02:39:27] Project: 6901 (Run 5, Clone 0, Gen 227)
[02:39:27] - Error: Could not get length of results file work/wuresults_01.dat
[02:39:27] - Error: Could not read unit 01 file. Removing from queue.
[02:39:27] - Preparing to get new work unit...
[02:39:27] Cleaning up work directory
[02:39:27] + Attempting to get work packet
[02:39:27] Passkey found
[02:39:27] - Connecting to assignment server
[02:39:28] - Successful: assigned to (130.237.232.237).
[02:39:28] + News From Folding@Home: Welcome to Folding@Home
[02:39:28] Loaded queue successfully.
[02:39:29] + Closed connections
[02:39:34]
[02:39:34] + Processing work unit
[02:39:34] Core required: FahCore_a5.exe
[02:39:34] Core found.
[02:39:34] Working on queue slot 02 [May 31 02:39:34 UTC]
[02:39:34] + Working ...
[02:39:34]
[02:39:34] *------------------------------*
[02:39:34] Folding@Home Gromacs SMP Core
[02:39:34] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[02:39:34]
[02:39:34] Preparing to commence simulation
[02:39:34] - Ensuring status. Please wait.
[02:39:44] - Looking at optimizations...
[02:39:44] - Working with standard loops on this execution.
[02:39:44] - Created dyn
[02:39:44] - Files status OK
[02:39:44] - Expanded 8610 -> 4165632 (decompressed 48381.3 percent)
[02:39:44] Called DecompressByteArray: compressed_data_size=8610 data_size=4165632, decompressed_data_size=4165632 diff=0
[02:39:44] - Digital signature verified
[02:39:44]
[02:39:44] Project: 6901 (Run 5, Clone 0, Gen 227)
[02:39:44]
[02:39:44] Entering M.D.
[02:39:50] Mapping NT from 12 to 12
[02:54:39] CoreStatus = 0 (0)
[02:54:39] Sending work to server
[02:54:39] Project: 6901 (Run 5, Clone 0, Gen 227)
[02:54:39] - Error: Could not get length of results file work/wuresults_02.dat
[02:54:39] - Error: Could not read unit 02 file. Removing from queue.
[02:54:39] - Preparing to get new work unit...
[02:54:39] Cleaning up work directory
[02:54:39] + Attempting to get work packet
[02:54:39] Passkey found
[02:54:39] - Connecting to assignment server
[02:54:39] - Successful: assigned to (130.237.232.237).
[02:54:39] + News From Folding@Home: Welcome to Folding@Home
[02:54:40] Loaded queue successfully.
[02:54:41] + Closed connections
[02:54:46]
[02:54:46] + Processing work unit
[02:54:46] Core required: FahCore_a5.exe
[02:54:46] Core found.
[02:54:46] Working on queue slot 03 [May 31 02:54:46 UTC]
[02:54:46] + Working ...
[02:54:46]
[02:54:46] *------------------------------*
[02:54:46] Folding@Home Gromacs SMP Core
[02:54:46] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[02:54:46]
[02:54:46] Preparing to commence simulation
[02:54:46] - Ensuring status. Please wait.
[02:54:56] - Looking at optimizations...
[02:54:56] - Working with standard loops on this execution.
[02:54:56] - Created dyn
[02:54:56] - Files status OK
[02:54:56] - Expanded 8610 -> 4165632 (decompressed 48381.3 percent)
[02:54:56] Called DecompressByteArray: compressed_data_size=8610 data_size=4165632, decompressed_data_size=4165632 diff=0
[02:54:56] - Digital signature verified
[02:54:56]
[02:54:56] Project: 6901 (Run 5, Clone 0, Gen 227)
[02:54:56]
[02:54:56] Entering M.D.
[02:55:02] Mapping NT from 12 to 12
[03:09:51] CoreStatus = 0 (0)
[03:09:51] Sending work to server
[03:09:51] Project: 6901 (Run 5, Clone 0, Gen 227)
[03:09:51] - Error: Could not get length of results file work/wuresults_03.dat
[03:09:51] - Error: Could not read unit 03 file. Removing from queue.
[03:09:51] - Preparing to get new work unit...
[03:09:51] Cleaning up work directory
[03:09:51] + Attempting to get work packet
[03:09:51] Passkey found
[03:09:51] - Connecting to assignment server
[03:09:51] - Successful: assigned to (130.237.232.237).
[03:09:51] + News From Folding@Home: Welcome to Folding@Home
[03:09:51] Loaded queue successfully.
[03:09:52] + Closed connections
[03:09:57]
[03:09:57] + Processing work unit
[03:09:57] Core required: FahCore_a5.exe
[03:09:57] Core found.
[03:09:57] Working on queue slot 04 [May 31 03:09:57 UTC]
[03:09:57] + Working ...
[03:09:58]
[03:09:58] *------------------------------*
[03:09:58] Folding@Home Gromacs SMP Core
[03:09:58] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[03:09:58]
[03:09:58] Preparing to commence simulation
[03:09:58] - Ensuring status. Please wait.
[03:10:07] - Looking at optimizations...
[03:10:07] - Working with standard loops on this execution.
[03:10:07] - Created dyn
[03:10:07] - Files status OK
[03:10:07] - Expanded 8610 -> 4165632 (decompressed 48381.3 percent)
[03:10:07] Called DecompressByteArray: compressed_data_size=8610 data_size=4165632, decompressed_data_size=4165632 diff=0
[03:10:07] - Digital signature verified
[03:10:07]
[03:10:07] Project: 6901 (Run 5, Clone 0, Gen 227)
[03:10:07]
[03:10:07] Entering M.D.
[03:10:14] Mapping NT from 12 to 12
Re: Merged problems with projects 6903/6904
Posted: Fri Jun 01, 2012 3:39 pm
by kasson
We've been periodically tagging and removing the 512-byte WU's. We also just ran a cleanup script to remove any of the WU's that have the wrong # of steps in projects 6901/6903/6904. We're running the scripts on 6900 now.
Re: Merged problems with projects 6903/6904
Posted: Fri Jun 01, 2012 6:38 pm
by markfw
When will you get to Project: 6901 (Run 5, Clone 0, Gen 227)? I keep deleting my queue, and it still keeps giving it to me. 3 days of no work, at 125 PPD, is a waste of computer time. I have a 3930K @ 4.3 GHz.
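A rough way to confirm that the same unit keeps coming back is to count failures per project/run/clone/gen in the v6 log. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes the "Project: ..." and "Could not get length of results file" lines look exactly like the ones posted in this thread.

```python
import re
from collections import Counter

# Assumed v6 log shapes, copied from the logs posted in this thread.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
FAIL_MARK = "Could not get length of results file"


def count_failed_units(lines):
    """Count results-file errors per (project, run, clone, gen)."""
    failures = Counter()
    last_prcg = None
    for line in lines:
        match = PRCG_RE.search(line)
        if match:
            # Remember the most recently announced unit.
            last_prcg = tuple(int(group) for group in match.groups())
        elif FAIL_MARK in line and last_prcg is not None:
            # Attribute the error to the unit the client last mentioned.
            failures[last_prcg] += 1
    return failures
```

Calling count_failed_units on the lines of the log and then most_common() on the result lists the units that failed most often, which is the PRCG to quote when reporting a bad WU.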
Re: Merged problems with projects 6903/6904
Posted: Fri Jun 01, 2012 6:55 pm
by kasson
Stopped. BTW the v7 client will report these properly so you get fewer re-assigns.