
Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Fri Feb 26, 2010 7:31 pm
by VijayPande
thanks for the report. I think we know what happened and have a temporary fix in place for now.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Feb 27, 2010 1:15 am
by weedacres
VijayPande wrote:thanks for the report. I think we know what happened and have a temporary fix in place for now.
What does that mean for the wuresults files we have?

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Feb 27, 2010 2:59 am
by VijayPande
weedacres wrote:
VijayPande wrote:thanks for the report. I think we know what happened and have a temporary fix in place for now.
What does that mean for the wuresults files we have?
At this point, I'm not completely sure. Joe is updating WS code and we'll see. If these WUs are still in your queue, my hope is that they will go back to the WS, once this new fix is in and the client sends them back.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Feb 27, 2010 3:45 am
by weedacres
VijayPande wrote:At this point, I'm not completely sure. Joe is updating WS code and we'll see. If these WUs are still in your queue, my hope is that they will go back to the WS, once this new fix is in and the client sends them back.
My guess here is that we're in the same boat as last week.
I think queue.dat shows the WU as successfully uploaded. Here are the qd results for one of them; index 00 is the one in question:

C:\GPU1>qd
qd released 23 October 2008 (fr 071)
qd executed Fri Feb 26 19:34:13 Pacific Standard Time 2010 (Sat Feb 27 03:34:13 UTC 2010)
Queue version 6.00
Current index: 7
 Index 8: finished 97.5 X min speed
  server: 171.67.108.21:8080; project: 10504
  Folding: run 201, clone 1, generation 0; benchmark 0; misc: 500, 200
  issue: Thu Feb 25 06:06:52 2010; begin: Thu Feb 25 06:06:58 2010
  end: Thu Feb 25 09:04:12 2010; due: Tue Mar 09 06:06:58 2010 (12 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  flops: 1064956945 (1064.956945 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (be): Thu Feb 25 06:06:38 2010; B8436B84
  CS: 171.67.108.26; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 3E34899A69C37D0C; mach ID: 33554432
  work/wudata_08.dat file size: 59546; WU type: Folding@Home
 Index 9: finished 97.6 X min speed
  server: 171.67.108.21:8080; project: 10503
  Folding: run 211, clone 1, generation 0; benchmark 0; misc: 500, 200
  issue: Thu Feb 25 09:04:14 2010; begin: Thu Feb 25 09:04:20 2010
  end: Thu Feb 25 12:01:18 2010; due: Tue Mar 09 09:04:20 2010 (12 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  flops: 1065001784 (1065.001784 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (be): Thu Feb 25 09:04:00 2010; B84341EA
  CS: 171.67.108.26; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 3E34899A69C37D0C; mach ID: 33554432
  work/wudata_09.dat file size: 59647; WU type: Folding@Home
 Index 0: finished 97.6 X min speed
  server: 171.67.108.21:8080; project: 10501
  Folding: run 274, clone 1, generation 0; benchmark 0; misc: 500, 200
  issue: Thu Feb 25 12:01:23 2010; begin: Thu Feb 25 12:01:28 2010
  end: Thu Feb 25 14:58:26 2010; due: Tue Mar 09 12:01:28 2010 (12 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah (V1.31)
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  flops: 1065037707 (1065.037707 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (be): Thu Feb 25 12:01:09 2010; B843B86F
  CS: 171.67.108.26; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 3E34899A69C37D0C; mach ID: 33554432
  work/wudata_00.dat file size: 59657; WU type: Folding@Home
 Index 1: finished 152 X min speed
  server: 171.67.108.21:8080; project: 5781
  Folding: run 19, clone 289, generation 4; benchmark 0; misc: 500, 200
  issue: Thu Feb 25 14:58:29 2010; begin: Thu Feb 25 14:58:34 2010
  end: Thu Feb 25 18:55:33 2010; due: Mon Mar 22 15:58:34 2010 (25 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  flops: 1065066445 (1065.066445 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (be): Thu Feb 25 14:58:15 2010; B84396ED
  CS: 171.67.108.26; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 3E34899A69C37D0C; mach ID: 33554432
  work/wudata_01.dat file size: 65516; WU type: Folding@Home
 Index 2: finished 43 X min speed
  server: 171.67.108.11:8080; project: 5767
  Folding: run 5, clone 128, generation 2072; benchmark 0; misc: 500, 200
  issue: Thu Feb 25 18:55:40 2010; begin: Thu Feb 25 18:55:45 2010
  end: Thu Feb 25 20:36:19 2010; due: Sun Feb 28 18:55:45 2010 (3 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_11.fah
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  tag: P5767R5C128G2072
  flops: 1065101711 (1065.101711 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (le): Thu Feb 25 18:55:26 2010; B843DF6A
  CS: 171.67.108.25; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 0C7DC3699A89343E; mach ID: 2
  work/wudata_02.dat file size: 47094; WU type: Folding@Home
 Index 3: finished 27.8 X min speed
  server: 171.64.65.20:8080; project: 5910
  Folding: run 9, clone 317, generation 28; benchmark 0; misc: 500, 200
  issue: Thu Feb 25 20:36:29 2010; begin: Thu Feb 25 20:36:36 2010
  end: Thu Feb 25 23:11:54 2010; due: Sun Feb 28 20:36:36 2010 (3 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_14.fah
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  tag: P5910R9C317G28
  flops: 1065073900 (1065.073900 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (le): Thu Feb 25 20:36:16 2010; B8430AAB
  CS: 171.65.103.100; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 0C7DC3699A89343E; mach ID: 2
  work/wudata_03.dat file size: 70676; WU type: Folding@Home
 Index 4: finished 27.8 X min speed
  server: 171.64.65.20:8080; project: 5910
  Folding: run 12, clone 75, generation 36; benchmark 0; misc: 500, 200
  issue: Thu Feb 25 23:11:56 2010; begin: Thu Feb 25 23:12:00 2010
  end: Fri Feb 26 01:47:29 2010; due: Sun Feb 28 23:12:00 2010 (3 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_14.fah
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  tag: P5910R12C75G36
  flops: 1065009138 (1065.009138 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (le): Thu Feb 25 23:11:42 2010; B8433645
  CS: 171.65.103.100; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 0C7DC3699A89343E; mach ID: 2
  work/wudata_04.dat file size: 70711; WU type: Folding@Home
 Index 5: finished 13.8 X min speed
  server: 171.64.65.20:8080; project: 5915
  Folding: run 7, clone 987, generation 14; benchmark 0; misc: 500, 200
  issue: Fri Feb 26 01:47:30 2010; begin: Fri Feb 26 01:47:35 2010
  end: Fri Feb 26 12:12:31 2010; due: Thu Mar 04 01:47:35 2010 (6 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_14.fah
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  tag: P5915R7C987G14
  flops: 1064957186 (1064.957186 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (le): Fri Feb 26 01:47:17 2010; B84353CE
  CS: 171.65.103.100; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 0C7DC3699A89343E; mach ID: 2
  work/wudata_05.dat file size: 69125; WU type: Folding@Home
 Index 6: finished 27.8 X min speed
  server: 171.64.65.20:8080; project: 5910
  Folding: run 9, clone 178, generation 43; benchmark 0; misc: 500, 200
  issue: Fri Feb 26 12:12:35 2010; begin: Fri Feb 26 12:12:39 2010
  end: Fri Feb 26 14:48:03 2010; due: Mon Mar 01 12:12:39 2010 (3 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_14.fah
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  tag: P5910R9C178G43
  flops: 1064793692 (1064.793692 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (le): Fri Feb 26 12:12:21 2010; B843E14E
  CS: 171.65.103.100; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 0C7DC3699A89343E; mach ID: 2
  work/wudata_06.dat file size: 70731; WU type: Folding@Home
 Index 7: folding now 13.9 X min speed; 46% complete
  server: 171.64.65.20:8080; project: 5915
  Folding: run 14, clone 974, generation 33; benchmark 0; misc: 500, 200
  issue: Fri Feb 26 14:48:06 2010; begin: Fri Feb 26 14:48:09 2010
  expect: Sat Feb 27 01:09:55 2010; due: Thu Mar 04 14:48:09 2010 (6 days)
  core URL: http://www.stanford.edu/~pande/Win32/x86/NVIDIA/G80/Core_14.fah (V1.26)
  CPU: 1,687 Pentium II/III; OS: 1,8 WinXP
  tag: P5915R14C974G33
  flops: 1064784894 (1064.784894 megaflops)
  memory: 3070 MB; gpu memory: 258 MB
  assignment info (le): Fri Feb 26 14:47:52 2010; B8420AF3
  CS: 171.65.103.100; P limit: 524286976
  user: weedacres_gpu; team: 52523; ID: 0C7DC3699A89343E; mach ID: 2
  work/wudata_07.dat file size: 69054; WU type: Folding@Home
Results successfully sent: Fri Feb 19 13:51:59 2010
Average download rate 85.167 KB/s (u=4); upload rate 41.451 KB/s (u=4)
Performance fraction 0.966125 (u=4)
If I recall correctly, it will show something like Awaiting Upload, or words to that effect, if it knows the upload needs to be done.
There have been several auto-send events with no attempt to upload the file.

I may be wrong, but it seems to me that all work unit uploads depend on the accuracy of queue.dat. Given all the problems we've had with this, wouldn't it make sense to upload all wuresults_xx files regardless of what's in queue.dat and let the Stanford end sort out whether each one has already been uploaded? Or at least develop tools that let us fix queue.dat. We've shown that qfix will not do the job on the NVIDIA GPU client.
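
As a rough illustration of the kind of tool suggested above, here is a minimal Python sketch that reads saved qd output and lists result files still sitting in the work folder for slots that qd reports as finished. It is only a sketch under assumptions: the file name pattern wuresults_XX.dat is inferred from the "wuresults_xx" wording in this post, and the function names are hypothetical. It does not touch queue.dat and does not upload anything.

```python
import re
from pathlib import Path

# Matches lines such as " Index 0: finished 97.6 X min speed"
INDEX_RE = re.compile(r"^\s*Index (\d+): (.+)$")


def parse_qd_status(qd_text):
    """Return {queue index: status text} taken from qd output."""
    status = {}
    for line in qd_text.splitlines():
        match = INDEX_RE.match(line)
        if match:
            status[int(match.group(1))] = match.group(2).strip()
    return status


def leftover_results(qd_text, work_dir):
    """List result files that still exist for slots qd shows as finished."""
    leftovers = []
    for index, text in sorted(parse_qd_status(qd_text).items()):
        # Assumed file name pattern (hypothetical, based on "wuresults_xx").
        path = Path(work_dir) / f"wuresults_{index:02d}.dat"
        if text.startswith("finished") and path.exists():
            leftovers.append(path.name)
    return leftovers
```

Run against the listing above with a wuresults file for slot 00 still on disk, this would report that one file, which is exactly the mismatch described in this post.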

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Feb 27, 2010 12:18 pm
by glussier
VijayPande wrote:
weedacres wrote:
VijayPande wrote:thanks for the report. I think we know what happened and have a temporary fix in place for now.
What does that mean for the wuresults files we have?
At this point, I'm not completely sure. Joe is updating WS code and we'll see. If these WUs are still in your queue, my hope is that they will go back to the WS, once this new fix is in and the client sends them back.
After I receive the message "[22:45:16] - Server reports problem with unit.", the wuresults file is no longer there. If, for some people, the results files are still there after that message, there might be more than one problem.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 12:30 pm
by SnW
Is this back again?
GPU WUs not uploading...

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 12:34 pm
by noorman
.

171.67.108.21 GPU vsp07b vvoelz full Reject

just sent a message about it to Vijay Pande

.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 2:19 pm
by Nathan_P
SnW wrote:Is this back again?
GPU WUs not uploading...
It looks like it. I have a couple of WUs that won't upload; if they don't get it sorted soon, the WUs will be overwritten, as my GPUs are racing through a batch of 353-pointers at the moment. I wouldn't mind, but one of them is a gen 0 WU, which will delay the start of that line of research :(
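
For a rough sense of how soon an unsent result's slot comes around again, the arithmetic can be sketched in Python. This assumes a 10-slot queue (indices 0 to 9, as in the qd listing earlier in this thread) that is filled in increasing order and wraps around; the function name is hypothetical, and whether the client really overwrites an unsent slot is as reported in this thread, not something the sketch verifies.

```python
QUEUE_SLOTS = 10  # assumption: indices 0-9, as seen in the qd listing


def assignments_until_reuse(current_index, target_index, slots=QUEUE_SLOTS):
    """Number of new work units fetched before target_index is reused."""
    steps = (target_index - current_index) % slots
    # A result of 0 means the target is the slot being folded right now,
    # which only comes around again after a full lap of the queue.
    return steps or slots
```

With the client folding at index 7, `assignments_until_reuse(7, 0)` gives 3: the next work units land in slots 8, 9 and then 0.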

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 3:23 pm
by noorman
.

I don't know the current workings of the server software, but normally the Collection Server should take over and gather the data that cannot be uploaded to the WS from which it originated ...

.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 3:27 pm
by SnW
Maybe I should be a bit more patient :oops:
All GPU WUs are now uploaded. Thanks guys for reading, and for the "fix", if there was one 8-)

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 3:32 pm
by bollix47
noorman wrote:.

I don't know the current workings of the server software, but normally the Collection Server should take over and gather the data that cannot be uploaded to the WS from which it originated ...

.
Unfortunately the CS for WS 108.21 is 108.26 and it has been in REJECT since Feb 23. :roll:
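
One way to see which CS goes with which WS for your own work units is to read the pairing out of qd output, where each queue entry lists a "server:" line followed by a "CS:" line. A minimal Python sketch, assuming the line layout shown in the qd listing earlier in this thread (the function name is hypothetical, and this only reflects the units in your own queue, not an official list):

```python
import re
from collections import defaultdict

# Matches "  server: 171.67.108.21:8080; project: 10504"
SERVER_RE = re.compile(r"^\s*server: ([\d.]+):\d+")
# Matches "  CS: 171.67.108.26; P limit: 524286976"
CS_RE = re.compile(r"^\s*CS: ([\d.]+);")


def ws_to_cs(qd_text):
    """Return {work server IP: sorted list of collection server IPs}."""
    mapping = defaultdict(set)
    current_ws = None
    for line in qd_text.splitlines():
        match = SERVER_RE.match(line)
        if match:
            current_ws = match.group(1)
            continue
        match = CS_RE.match(line)
        if match and current_ws is not None:
            mapping[current_ws].add(match.group(1))
    return {ws: sorted(cs) for ws, cs in mapping.items()}
```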

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 3:53 pm
by noorman
SnW wrote:Maybe I should be a bit more patient :oops:
All GPU WUs are now uploaded. Thanks guys for reading, and for the "fix", if there was one 8-)
.

No, no, that server was in Reject mode and has since been given a boot up the proverbial and is now back to the normal 'Accepting' status.
That's why the WU results have now been uploaded :D

Your message seems to confirm that this is the case!

.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 3:55 pm
by noorman
bollix47 wrote:
noorman wrote:.

I don't know the current workings of the server software, but normally the Collection Server should take over and gather the data that cannot be uploaded to the WS from which it originated ...

.
Unfortunately the CS for WS 108.21 is 108.26 and it has been in REJECT since Feb 23. :roll:
.

That explains it. I didn't know which CS was the one accepting uploads for those projects.
Is there any specific information on which CS accepts uploads for which WS, and if so, where can it be found, please?

.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 4:00 pm
by VijayPande
We reset vsp07b this morning. It may take about an hour to recover due to high load.

We turned off the CS about a week ago, since it seemed to be causing more problems with WUs than it was solving. Joe is looking into what's up.

Re: GPU server status 171.67.108.21, 171.64.65.71,171.67.108.26

Posted: Sat Mar 06, 2010 4:00 pm
by SnW
noorman wrote:
SnW wrote:Maybe I should be a bit more patient :oops:
All GPU WUs are now uploaded. Thanks guys for reading, and for the "fix", if there was one 8-)
.

No, no, that server was in Reject mode and has since been given a boot up the proverbial and is now back to the normal 'Accepting' status.
That's why the WU results have now been uploaded :D

Your message seems to confirm that this is the case!

.
Ah, OK, thanks for confirming that, mate :D
I don't want to sound like a whining idiot. OK, I do... but you get my drift :wink: