We are working on two servers right now:
171.64.65.83
128.59.74.4
We don't have an ETA, but I hope both will be back up later today.
Two servers down -- work in progress
Moderators: Site Moderators, FAHC Science Team
-
- Pande Group Member
- Posts: 2058
- Joined: Fri Nov 30, 2007 6:25 am
- Location: Stanford
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
Re: Two servers down -- work in progress
Unfortunately, it looks like there are RAID problems with 128.59.74.4. The RAID array will need to be rebuilt, which will likely take a few days. It's possible that jobs can be accepted before then with the correct configuration, but I can't say for sure. I will continue to post on this as more information comes in.
128.59.74.4 has been a troubled server, and it will be retired after the RAID array is back up and it accepts the jobs that are outstanding.
-
- Site Moderator
- Posts: 6365
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
- Contact:
Re: Two servers down -- work in progress
mrshirts wrote:Update on 128.59.74.4:
The good news: all of the data (2 TB) is safe. I was able to rebuild and mount the RAID array. The bad news is that the server won't boot normally. Since it's actually at Columbia (where I don't work anymore), I'm a bit at the mercy of the IT support staff there in terms of getting it up and running again. The current plan is therefore to copy off the data that is needed to continue the projects, and try to relay the IP to a different machine, putting it into accept-only mode. The timeline is probably going to be about a week, unfortunately.
Re: Two servers down -- work in progress
mrshirts wrote:Latest update: another machine should be in place on Monday at Columbia, so I should be able to forward WUs to a working server at that point. I'll post again on Monday with an update. Apologies for the inconvenience -- 128.59.74.4 will of course be retired, and things should work more smoothly here at U Va, since I'll be directly maintaining the machines.
Re: Two servers down -- work in progress
mrshirts wrote:The IT guy at Columbia has a backlog, so the computer to relay the WUs is not up yet. I'll keep updating as more information becomes available. Since the timeout is long (60 days), hopefully most people will not end up affected in the end.
-=MB=-
Re: Two servers down -- work in progress
mrshirts wrote:Good news! Although we were not able to bring the machine up at Columbia, we set up another machine and I configured it to forward the traffic to the new Virginia machine. I can see the WUs rolling in as I speak now. All the old data is preserved as well.
It will take a little bit longer to set it up so that the credits are properly put into the database automatically, because of this port forwarding, but we'll get it taken care of. At this point, the WUs are being accepted, the stats files are being created, and they will be processed -- possibly it will take until Monday. But the major problem has been fixed.
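For anyone curious what "forward the traffic" means in practice, here is a minimal sketch of a TCP relay in Python. It is only an illustration of the general idea, not the configuration actually used on the Columbia or Virginia machines, which is not described in this thread. The function names (pipe, start_relay) and any hosts or ports passed in are hypothetical.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:  # EOF: the sending side has closed
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the other end went away; just stop copying
    finally:
        writer.close()


async def start_relay(listen_host, listen_port, target_host, target_port):
    """Listen on one address and relay each connection to the target.

    All four arguments are placeholders chosen by the caller; nothing here
    is specific to Folding@home.
    """

    async def handle(client_reader, client_writer):
        try:
            target_reader, target_writer = await asyncio.open_connection(
                target_host, target_port
            )
        except OSError:
            client_writer.close()  # target unreachable: drop the client
            return
        # Shuttle bytes in both directions until either side closes.
        await asyncio.gather(
            pipe(client_reader, target_writer),
            pipe(target_reader, client_writer),
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

A caller would await start_relay(...) with its own listen and target addresses and then call serve_forever() on the returned server. A relay like this also shows why the credit bookkeeping could need extra work: the target sees every connection as coming from the relay's address rather than from the original client.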