Project 6318: Collection server misconfigured?

Moderators: Site Moderators, FAHC Science Team

bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 6318: Collection server misconfigured?

Post by bruce »

Last credit   Last returned WU      Proj   Run    Clone   Gen
336           2010-01-31 16:07:20   6318   3096   86      1
905           2010-01-31 18:04:12   2495   70     24      0
336           2010-01-31 20:09:29   6318   625    83      1
56928.1       2010-02-01 09:08:00   2681   10     4       68
336           2010-02-01 14:35:02   6318   2817   97      1
1920          2010-02-01 14:40:40   2677   1      1       75

It looks like your 6318's are being credited, and your 2681 did get bonus credit. [times are PST -- Stanford's time zone]
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Re: Project 6318: Collection server misconfigured?

Post by k1wi »

Thanks Bruce...

I guess with so many clients working over such a long period it's almost impossible to correlate mine... Two of those listed were on my list of completed work units for this computer, except the Gen is wrong: 6318 (Run 1729, Clone 42, Gen 0) [January 30 02:46:04 UTC] and 6318 (Run 625, Clone 83, Gen 0) [February 1 03:11:05 UTC].

How does Gen work? I note that the second one above (Run 625, Clone 83, Gen 0) is similar to the one in the list you posted, which is Gen 1 instead. My maths might be wrong, but the one listed above was received an hour before my computer returned the Gen 1. (PST is 8 hours behind UTC IIRC and 20:09 is 7 hours behind 03:11?)

I've also found the first one you listed on my other classic client that I run here... Once again the Gen's a 0 instead of a 1. Of course, this is all complicated by the fact that I also run the classic systray and console clients at university, on a much faster computer, which I can't check until the morning NZ time. (I'll repost once I have.)

Hopefully I'm not being too much of a pain on this, I just find it odd that my other teammates haven't received their points!
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 6318: Collection server misconfigured?

Post by bruce »

Yes, Stanford time is GMT-8 in the Northern winter (and GMT-7 in the Northern summer). The times are when the WU is credited, not when it's uploaded. Generally speaking, data for all of the WUs which are uploaded during one hour are collected at the end of the hour and added to the database during the next hour. Sometimes there may be additional delays, such as this past weekend.
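As a rough check of the time arithmetic, here is a minimal Python sketch that converts the credit time shown for 6318 (Run 625, Clone 83) from PST to UTC and compares it with the UTC timestamp quoted from the client log. The fixed UTC-8 offset and the helper name `pst_to_utc` are assumptions made for illustration; the sketch ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed winter offset for Stanford time (PST = UTC-8), as stated above.
PST = timezone(timedelta(hours=-8), "PST")

def pst_to_utc(stamp: str) -> datetime:
    """Interpret a 'YYYY-MM-DD HH:MM:SS' stats timestamp as PST and return UTC."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=PST)
    return local.astimezone(timezone.utc)

# Credit time from the stats table for 6318 (Run 625, Clone 83).
credited = pst_to_utc("2010-01-31 20:09:29")

# Timestamp quoted from the client log, already in UTC.
uploaded = datetime(2010, 2, 1, 3, 11, 5, tzinfo=timezone.utc)

print(credited.isoformat())  # 2010-02-01T04:09:29+00:00
print(credited - uploaded)   # 0:58:24
```

With these inputs the credit lands a little under an hour after the logged upload time, which fits the "collected at the end of the hour, added during the next hour" description.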

I can't explain why your reports say a different Gen was being processed than what I reported.

For most projects, there is a sequence of time-segments for the same set of atoms. The atom positions and velocities at the end of Gen N are used as the starting point for Gen (N+1) with the same Run,Clone numbers.
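To make the Run/Clone/Gen relationship concrete, here is a small Python sketch of the chaining described above. It is only an illustration: the `WorkUnit` class, `simulate`, and `next_gen` are hypothetical names, and the "state" is a placeholder string rather than real atom positions and velocities.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkUnit:
    project: int
    run: int
    clone: int
    gen: int
    start_state: str  # placeholder for atom positions and velocities

def simulate(wu: WorkUnit) -> str:
    """Hypothetical stand-in for the science core: returns the end state."""
    return f"{wu.start_state}->gen{wu.gen}"

def next_gen(wu: WorkUnit, end_state: str) -> WorkUnit:
    """Build Gen N+1 from the end state of Gen N, keeping Run and Clone."""
    return WorkUnit(wu.project, wu.run, wu.clone, wu.gen + 1, end_state)

gen0 = WorkUnit(project=6318, run=625, clone=83, gen=0, start_state="initial")
gen1 = next_gen(gen0, simulate(gen0))
print(gen1)
```

The point of the sketch is that Gen 1 cannot be issued until Gen 0 of the same Run and Clone has been returned, since its starting state comes from that result.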
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Re: Project 6318: Collection server misconfigured?

Post by k1wi »

Hi once again,

I've looked through the logs of my other computer running classic clients, and my logs are still saying a different Gen - 0 instead of 1. In fact, going all the way back to December 1 there is no record of me having completed a Gen 1 of any of the above work units.

k1wi

Code:

[21:30:03] Finished Work Unit:
[21:30:03] - Reading up to 362448 from "work/wudata_03.arc": Read 362448
[21:30:03] - Reading up to 5785300 from "work/wudata_03.xtc": Read 5785300
[21:30:03] goefile size: 0
[21:30:03] Leaving Run
[21:30:06] - Writing 7735640 bytes of core data to disk...
[21:30:08] Done: 7735128 -> 6873169 (compressed to 88.8 percent)
[21:30:09]   ... Done.
[21:30:09] - Shutting down core
[21:30:09] 
[21:30:09] Folding@home Core Shutdown: FINISHED_UNIT
[21:30:13] CoreStatus = 64 (100)
[21:30:13] Sending work to server
[21:30:13] Project: 6318 (Run 2817, Clone 97, Gen 0)
[21:30:13] - Read packet limit of 540015616... Set to 524286976.


[21:30:13] + Attempting to send results [February 1 21:30:13 UTC]
[21:30:45] + Results successfully sent
[21:30:45] Thank you for your contribution to Folding@Home.
[21:30:45] + Number of Units Completed: 133
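For anyone comparing many log entries against the stats table, a short Python sketch like the following could pull the Project/Run/Clone/Gen values out of a log line. The regular expression is written against the line format shown in the log above; the function name `parse_wu` is an assumption, and other client versions may format the line differently.

```python
import re

# Pattern for the "Project: P (Run R, Clone C, Gen G)" line in the log above.
WU_LINE = re.compile(
    r"Project: (?P<project>\d+) "
    r"\(Run (?P<run>\d+), Clone (?P<clone>\d+), Gen (?P<gen>\d+)\)"
)

def parse_wu(line: str):
    """Return (project, run, clone, gen) as ints, or None if nothing matches."""
    m = WU_LINE.search(line)
    if m is None:
        return None
    return tuple(int(m.group(k)) for k in ("project", "run", "clone", "gen"))

log_line = "[21:30:13] Project: 6318 (Run 2817, Clone 97, Gen 0)"
print(parse_wu(log_line))  # (6318, 2817, 97, 0)
```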
Bob8421
Posts: 53
Joined: Tue Dec 22, 2009 5:16 pm

Re: Project 6318: Collection server misconfigured?

Post by Bob8421 »

There is something terribly wrong with the collection server, even if it really is supposed to take days for work units sent there to show up. And no one has really explained why this server is so far removed from normal processing that it takes days to access it!

I submitted four 6318 work units that went to the collection server from late January 28th to early January 29th and they have not yet shown up. Later I submitted seven more 6318 work units that went to the collection server from late February 1st to early February 2nd and they also have not yet shown up.

As far as I can tell, to Pande Group "collection server" is the same as "recycle bin" because things go there and are never seen again. This project seems important enough for me to have gotten almost nothing but 6318 work units for a couple weeks, but not important enough for the work that has been returned to the collection server to be processed.

Can anyone give me a good reason for not simply deleting any future 6318 work units before they waste my processing time? And, to make things worse, these 7MB files have to be uploaded THREE TIMES before they go to the collection server (AKA nowhere). Why can't the servers decide whether they want to accept a work unit or not BEFORE it is completely uploaded rather than after? That is a terrible waste of network capacity on both ends!
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 6318: Collection server misconfigured?

Post by bruce »

There is a known problem with p6318. I have no information about what can be done or when there might be more news.

What setting are you using for "Acceptable size of work assignment and work result packets" (small/normal/big)? Does it change anything if you set it to big?
AgrFan
Posts: 63
Joined: Sat Mar 15, 2008 8:07 pm

Re: Project 6318: Collection server misconfigured?

Post by AgrFan »

Points are not being credited for this project. I haven't seen any points for the past few days :(
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Re: Project 6318: Collection server misconfigured?

Post by k1wi »

Just to weigh in here bruce... I don't think it matters whether it is big or normal - they all seem to end up with the collection server. I may be hypothesising, but those who haven't selected big MAY be getting the error about it being too big (I can't remember the exact log message and can't find it despite searching high and low), but they still all end up in the same purgatory :(

If there is a problem with this project, I think someone from Pande Lab should let us know, or at least stop sending out more work units.

(I just noticed the project's no longer on the Psummary page..?)
bruce
Posts: 20824
Joined: Thu Nov 29, 2007 10:13 pm
Location: So. Cal.

Re: Project 6318: Collection server misconfigured?

Post by bruce »

The message you're probably searching for is Read packet limit of 540015616... Set to 524286976.
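For what it's worth, both numbers in that message sit exactly 1 KiB below a round number of MiB. The Python sketch below only checks that arithmetic and compares the limit with the compressed result size from the log earlier in this thread. What the client does with the limit internally is not stated here, and the variable names are assumptions.

```python
KIB = 1024
MIB = 1024 * KIB

reported = 540015616  # "Read packet limit of ..."
applied = 524286976   # "... Set to ..."

# Both values are a whole number of MiB minus 1 KiB.
print(reported == 515 * MIB - KIB)  # True
print(applied == 500 * MIB - KIB)   # True

# Compressed result size from the "Done: 7735128 -> 6873169" log line.
result_size = 6873169
print(result_size < applied)        # True
```

So a result file of roughly 7 MB is far below either figure, which suggests the message by itself does not mean the upload was rejected for size.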

Are you saying that when I say there's a known problem with the WU that you won't believe it unless you hear it from Dr. Pande?

I think that the servers have not been distributing 6318 for a while now unless you've got a certain combination of settings which do not lead to the problem. (Maybe they've pulled it entirely now.) I do know that some people have been getting credit so it has been working in some cases.

So far I don't know of anything that can be done once you've got the problem. I'm seeing a number of new reports yesterday and today, but I suspect that most of them are from WUs that were started, and perhaps even finished, before the problem was identified.
k1wi
Posts: 909
Joined: Tue Sep 22, 2009 10:48 pm

Re: Project 6318: Collection server misconfigured?

Post by k1wi »

I'm not saying that I won't believe it unless I hear it from Dr. Pande. It's just that until 2 hours ago there was no confirmation of a known problem (just that there might be a lag, etc.). Until then I'd gotten the distinct impression that there was acknowledgement that something funny was going on, but nothing like what I've been noticing as a user and a supporter of a cluster of brand new users over the past 4+ days.

I don't feel it is very fair on you that you have to be the go-between for the project and the users, but I don't think there has been much in the way of openness or communication from those who are directly involved with this project.

By the way, I received another 6318 WU half an hour ago, so they're still being sent out. The previous WU was a 6318 and had the read packet limit error, the console was set to big and the server doesn't have record of this unit.

Hopefully all the new users I brought into the project over the past week will receive their points and not feel discredited, it is quite unfortunate that the problem cropped up now as I brought them on board. Personally, I couldn't give a hoot if my points aren't awarded, so long as the science is kept (which I hope it is).

My main concern is this issue affects those who run the classic client because they're the newer of our users, or the least technically able and I would think the least likely to appear here in the forums to report a problem before just giving up. (If there was a problem with the bigadv for example, I'd be confident the forums would be ablaze)

I am appreciative of the work you're doing here and don't want to feel like I'm shooting the messenger, so I'll withdraw speaking on this issue.
rklapp
Posts: 6
Joined: Thu Feb 04, 2010 8:00 am

Re: Project 6318: Collection server misconfigured?

Post by rklapp »

I have the results ready to upload. What do I do? I'm working on project 2495 now so hopefully that will go through. Thanks.

Slot 01 Done
Project: 6318 (Run 2965, Clone 23, Gen 1), Core: 78
Work server: 171.64.65.60:8080
Collection server: 171.67.108.26
Download date: February 3 07:28:16
Finished date: February 4 04:20:20
Failed uploads: 3
Tobit
Posts: 342
Joined: Thu Apr 17, 2008 2:35 pm
Location: Manchester, NH USA

Re: Project 6318: Collection server misconfigured?

Post by Tobit »

bruce wrote:I think that the servers have not been distributing 6318 for a while now unless you've got a certain combination of settings which do not lead to the problem. (Maybe they've pulled it entirely now.)
Up until last night, the server was still sending out WUs as I am 35% complete on one right now. However, I have one in the queue waiting to send but the server is down now. According to the server status page, 171.64.65.60 is in standby - non accept mode presently.
toTOW
Site Moderator
Posts: 6359
Joined: Sun Dec 02, 2007 10:38 am
Location: Bordeaux, France

Re: Project 6318: Collection server misconfigured?

Post by toTOW »

I still see many reports of "[02:22:33] - Server does not have record of this unit. Will try again later." or "Signature not matching" from this server/project.

Does anyone have a clue about what's going on?

Folding@Home beta tester since 2002. Folding Forum moderator since July 2008.
Pette Broad
Posts: 128
Joined: Mon Dec 03, 2007 9:38 pm
Hardware configuration: CPU folding on only one machine a laptop

GPU Hardware..
3 x 460
1 X 260
4 X 250

+ 1 X 9800GT (3 days a week)
Location: Chester U.K

Re: Project 6318: Collection server misconfigured?

Post by Pette Broad »

Yeah, I have over 30 completed units waiting to go to this server... :(


Pete
rklapp
Posts: 6
Joined: Thu Feb 04, 2010 8:00 am

Re: Project 6318: Collection server misconfigured?

Post by rklapp »

It's a damn, fine waste of a good soldier...

Code:

[14:08:09] + Attempting to send results [February 4 14:08:09 UTC]
[14:08:09] - Reading file work/wuresults_01.dat from core
[14:08:09]   (Read 6873339 bytes from disk)
[14:08:09] Connecting to http://171.64.65.60:8080/
[14:08:10] - Couldn't send HTTP request to server
[14:08:10] + Could not connect to Work Server (results)
[14:08:10]     (171.64.65.60:8080)
[14:08:10] + Retrying using alternative port
[14:08:10] Connecting to http://171.64.65.60:80/
[14:08:11] - Couldn't send HTTP request to server
[14:08:11] + Could not connect to Work Server (results)
[14:08:11]     (171.64.65.60:80)
[14:08:11] - Error: Could not transmit unit 01 (completed February 4) to work server.
[14:08:11] - 5 failed uploads of this unit.
[14:08:11] - Read packet limit of 540015616... Set to 524286976.


[14:08:11] + Attempting to send results [February 4 14:08:11 UTC]
[14:08:11] - Reading file work/wuresults_01.dat from core
[14:08:11]   (Read 6873339 bytes from disk)
[14:08:11] Connecting to http://171.67.108.26:8080/
[14:08:51] Posted data.
[14:08:51] Initial: 0000; - Uploaded at ~167 kB/s
[14:08:51] - Averaged speed for that direction ~129 kB/s
[14:08:51] - Server does not have record of this unit. Will try again later.
[14:08:51]   Could not transmit unit 01 to Collection server; keeping in queue.
[14:08:51] + Sent 0 of 1 completed units to the server
[14:08:51] - Autosend completed
[14:08:51] + Working...
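The log above shows three attempts in order: the work server on port 8080, the work server on port 80, then the collection server on port 8080. Here is a minimal Python sketch of that fallback order. It is not the client's actual code: `send_with_fallback` and the `try_send` callable are hypothetical, and the real client wraps the collection-server attempt in a separate "Attempting to send results" pass.

```python
from typing import Callable, Iterable, Optional, Tuple

Endpoint = Tuple[str, int]

# Order of attempts as it appears in the log above.
ENDPOINTS = [
    ("171.64.65.60", 8080),   # work server, primary port
    ("171.64.65.60", 80),     # work server, alternative port
    ("171.67.108.26", 8080),  # collection server
]

def send_with_fallback(
    endpoints: Iterable[Endpoint],
    try_send: Callable[[str, int], bool],
) -> Optional[Endpoint]:
    """Try each endpoint in order; return the first that accepts, else None.

    `try_send` is a hypothetical callable standing in for the real upload.
    """
    for host, port in endpoints:
        if try_send(host, port):
            return host, port
    return None  # nothing accepted the unit; the caller keeps it queued

# Replay the outcome in the log: every attempt is refused.
attempts = []

def always_refuse(host: str, port: int) -> bool:
    attempts.append((host, port))
    return False

accepted = send_with_fallback(ENDPOINTS, always_refuse)
print(accepted)       # None
print(len(attempts))  # 3
```

When every endpoint refuses, the unit stays in the queue, which matches the "keeping in queue" and "Sent 0 of 1 completed units" lines.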