Project: 2621 (Run 24, Clone 72, Gen 17) Memory allocation

Posted: Mon Nov 10, 2008 5:55 am
by daveb
After this unit started running on my old laptop, nothing happened for over 30 minutes following the initial "Writing local files" message. The computer was also becoming unresponsive before I stopped the client. This computer normally runs p2621 units at <9 minutes/%.

[04:27:03] Loaded queue successfully.
[04:27:03] + Benchmarking ...
[04:27:06] The benchmark result is 6392
[04:27:06] 
[04:27:06] + Processing work unit
[04:27:06] Core required: FahCore_78.exe
[04:27:06] Core found.
[04:27:06] Working on Unit 05 [November 10 04:27:06]
[04:27:06] + Working ...
[04:27:06] - Calling 'FahCore_78.exe -dir work/ -suffix 05 -checkpoint 15 -verbose -lifeline 164 -version 502'

[04:27:06] - Autosending finished units...
[04:27:06] Trying to send all finished work units
[04:27:06] + No unsent completed units remaining.
[04:27:06] - Autosend completed
[04:27:06] 
[04:27:06] *------------------------------*
[04:27:06] Folding@Home Gromacs Core
[04:27:06] Version 1.90 (March 8, 2006)
[04:27:06] 
[04:27:06] Preparing to commence simulation
[04:27:06] - Looking at optimizations...
[04:27:06] - Files status OK
[04:27:14] - Expanded 647924 -> 13934953 (decompressed 2150.7 percent)
[04:27:15] - Starting from initial work packet
[04:27:15] 
[04:27:15] Project: 2621 (Run 24, Clone 72, Gen 17)
[04:27:15] 
[04:27:22] Assembly optimizations on if available.
[04:27:22] Entering M.D.
[04:27:30] Protein: p2621_p1475_tet1_03_1 t= 20000.00000
[04:27:30] 
[04:27:30] Writing local files
[04:58:33] ***** Got a SIGTERM signal (2)

Folding@Home Client Shutdown.

I then tried moving the unit to a PowerMac G5. It also never got past "Writing local files" before it started generating errors.

[05:13:50] 
[05:13:50] *------------------------------*
[05:13:50] Folding@Home Gromacs Core
[05:13:50] Version 1.90 (March 8, 2006)
[05:13:50] 
[05:13:50] Preparing to commence simulation
[05:13:50] - Looking at optimizations...
[05:13:50] - Files status OK
[05:13:50] - Expanded 647924 -> 13934953 (decompressed 2150.7 percent)
[05:13:51] - Checksums don't match (work/wudata_05.bed)
[05:13:51] - Starting from initial work packet
[05:13:51] 
[05:13:51] Project: 2621 (Run 24, Clone 72, Gen 17)
[05:13:51] 
[05:13:51] Assembly optimizations on if available.
[05:13:51] Entering M.D.

  Gromacs is Copyright (c) 1991-2003, University of Groningen, The Netherlands
        This inclusion of Gromacs code in the Folding@Home Core is under
        a special license (see http://folding.stanford.edu/gromacs.html)
         specially granted to Stanford by the copyright holders. If you
          are interested in using Gromacs, visit www.gromacs.org where
                you can download a free version of Gromacs under
         the terms of the GNU General Public License (GPL) as published
       by the Free Software Foundation; either version 2 of the License,
                     or (at your option) any later version.

[05:13:59] Protein: p2621_p1475_tet1_03_1 t= 20000.00000
[05:13:59] 
[05:13:59] Writing local files
FahCore_78.exe(8277,0xf0081000) malloc: *** mmap(size=2079457280) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Fatal error: realloc for nlist->jjnr (2079457280 bytes, file ns.c, line 388, nlist->jjnr=0x0x13eec000): Cannot allocate memory
[05:16:27] Gromacs error.
[05:16:27] 
[05:16:27] Folding@home Core Shutdown: UNKNOWN_ERROR
[05:17:05] CoreStatus = 79 (121)
[05:17:05] Client-core communications error: ERROR 0x79
[05:17:05] Deleting current work unit & continuing...
^C[05:17:16] ***** Got an Activate signal (2)

Again, the computer became unresponsive, staying that way for 20-30 seconds after generating the errors. I tried running the unit again on the G5 while watching the Activity Monitor. The process memory allocation kept increasing as I watched it, reaching over 1GB before I manually terminated it. As a comparison, there is presently a p2606 running on this machine with a memory allocation of ~80MB.

I then tried running it again on the laptop while watching the memory allocation using the Task Manager. The reported memory allocation kept oscillating from ~100MB to ever larger numbers until I manually terminated the process after it reported >400 MB (vs a normal ~100MB on this machine).
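
To put the failed allocation from the G5 log in perspective, here is a minimal sketch that pulls the byte count out of the "Fatal error: realloc" line and converts it to MiB. The function names and the regular expression are my own for illustration; they are not part of the client or the core.

```python
import re

# Line copied verbatim from the G5 log in this thread.
LOG_LINE = (
    "Fatal error: realloc for nlist->jjnr (2079457280 bytes, file ns.c, "
    "line 388, nlist->jjnr=0x0x13eec000): Cannot allocate memory"
)


def failed_alloc_bytes(line):
    """Return the byte count of a failed realloc, or None if the line does not match.

    The pattern is an assumption based on the single log line above.
    """
    match = re.search(r"realloc for \S+ \((\d+) bytes", line)
    return int(match.group(1)) if match else None


def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


size = failed_alloc_bytes(LOG_LINE)
print(f"{size} bytes = {to_mib(size):.3f} MiB")
```

That single request is just under 2 GiB, compared with the ~80MB to ~100MB these machines normally use for a work unit.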

Unfortunately, since the PC does not generate an EUE in a reasonable amount of time (if ever), and the Mac generates an Error 0x79 before deleting the unit, I don't believe that this problem will get back to the server under normal conditions.

Dave

Re: Project: 2621 (Run 24, Clone 72, Gen 17) Memory allocation

Posted: Mon Nov 10, 2008 2:01 pm
by toTOW
This is a bad WU, thanks for the report.

The PC might generate an error 0x79 too, but it can take up to an hour depending on how much RAM you have in your system.

Re: Project: 2621 (Run 24, Clone 72, Gen 17) Memory allocation

Posted: Sun Nov 16, 2008 4:06 pm
by daveb
I just got this unit again yesterday, with exactly the same result.
I also noticed that this unit has a smaller-than-normal download (~650kB vs the usual ~3.6MB). I have seen a number of other recent reports of problems with various units, all of which showed smaller-than-normal downloads:
http://foldingforum.org/viewtopic.php?f=19&t=6909
http://foldingforum.org/viewtopic.php?f=47&t=6400
http://foldingforum.org/viewtopic.php?f=19&t=6946
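
If anyone wants to check their own logs for the same symptom, here is a rough sketch that reads the "Expanded" line and flags an unusually small download. It assumes the first number on that line is the compressed download size (it matches the ~650kB I saw). The typical size and the 50% cut-off are my own guesses for p2621, not official values.

```python
import re

TYPICAL_DOWNLOAD_BYTES = 3_600_000  # "~3.6MB" usual p2621 download, approximate
SUSPICIOUS_FRACTION = 0.5           # assumption: arbitrary cut-off for illustration


def expanded_sizes(line):
    """Return (compressed_bytes, expanded_bytes) from an 'Expanded' log line, or None."""
    match = re.search(r"Expanded (\d+) -> (\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def looks_undersized(compressed_bytes,
                     typical=TYPICAL_DOWNLOAD_BYTES,
                     fraction=SUSPICIOUS_FRACTION):
    """True if the download is well below the typical size for the project."""
    return compressed_bytes < typical * fraction


# Line copied from the laptop log earlier in this thread.
line = "[04:27:14] - Expanded 647924 -> 13934953 (decompressed 2150.7 percent)"
compressed, expanded = expanded_sizes(line)
print(compressed, expanded, looks_undersized(compressed))
```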

Dave