140.163.0.0/16: blacklisting MSKCC via FW

Moderators: Site Moderators, FAHC Science Team

Manfred.Knick
Posts: 36
Joined: Wed Mar 25, 2020 10:21 am
Hardware configuration: Multiple XEON + GTX
Location: Germany

140.163.0.0/16: blacklisting MSKCC via FW

Post by Manfred.Knick »

After days in which several other people and I burned energy into nothing but wasted heat and noise,
I decided to blacklist MSKCC completely via my HW firewall.

I feel deeply sorry about this decision but can't help it,
especially given the continued lack of transparency and of any background information.
This is by no means in keeping with the genuine spirit of the Open Source community.

Until the problems around plfah*.mskcc.org have been proven solved,
I want to concentrate on reliable, productive projects.

Besides being an unreasonable demand on donors,
it is also environmentally unjustifiable
and, in my view, discredits the whole FAH project.

@Moderators: This is not against you - just the opposite!


REFERENCES from "Issues with a specific server" board (selection) :

"Can't upload to 140.163.4.231:80 "
- viewtopic.php?f=18&t=33978

"Another problem with 140.163.4.231"
- viewtopic.php?f=18&t=34195

"Can't upload to 140.163.4.231 again"
- viewtopic.php?f=18&t=34116

...

@Moderators: Suggestion: STICKY topic "Current Situation"
- viewtopic.php?f=18&t=34161
especially
- viewtopic.php?f=18&t=34161#p324467 <-----

Hint:
fah4.eastus.cloudapp.azure.com will be the next candidate for my FW blacklist.
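
For anyone who wants to check which addresses a rule for 140.163.0.0/16 would actually cover, here is a minimal Python sketch using the standard `ipaddress` module. The function name `is_blocked` is made up for illustration, the sample addresses are ones mentioned in this thread, and this is not a firewall configuration. A hostname such as fah4.eastus.cloudapp.azure.com would have to be resolved to an address first, which this sketch does not do.

```python
import ipaddress

# The range named in the thread title.
BLOCKED_NET = ipaddress.ip_network("140.163.0.0/16")


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside the blocked range (illustrative helper)."""
    return ipaddress.ip_address(addr) in BLOCKED_NET


if __name__ == "__main__":
    # Addresses mentioned in this thread; the last one is outside the /16.
    for addr in ("140.163.4.231", "140.163.4.241", "52.224.109.74"):
        print(addr, "blocked" if is_blocked(addr) else "not blocked")
```

Note that a /16 rule covers every address from 140.163.0.0 to 140.163.255.255, so it blocks far more than the two work servers discussed here.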
iceman1992
Posts: 523
Joined: Fri Mar 23, 2012 5:16 pm

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by iceman1992 »

So am I, but how do we know when they've been fixed and are working again?
Manfred.Knick

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by Manfred.Knick »

- Uptime > 5 days
- Latest mention in "specific server" > 5 days
- no Errors, no Warnings, ...
We will develop some kind of 'feeling'.
After a successful re-test, I myself would add some kind of "Solved" to the subject of this thread,
together with a short "Successful" post, in order to let others know.
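
The criteria above could be written down roughly like this. It is only a sketch: the 5-day thresholds come from the list above, while the class `ServerObservation`, its field names, and `ready_for_retest` are hypothetical, and the values would have to be collected by hand (for example from the server status page and the "specific server" board).

```python
from dataclasses import dataclass

MIN_QUIET_DAYS = 5  # threshold taken from the criteria in this post


@dataclass
class ServerObservation:
    """Hand-collected observations about one server (hypothetical structure)."""
    uptime_days: float
    days_since_last_report: float  # last mention on the "specific server" board
    errors: int
    warnings: int


def ready_for_retest(obs: ServerObservation) -> bool:
    """Return True only if every criterion from the list is met."""
    return (
        obs.uptime_days > MIN_QUIET_DAYS
        and obs.days_since_last_report > MIN_QUIET_DAYS
        and obs.errors == 0
        and obs.warnings == 0
    )
```

This only says when a re-test looks worthwhile; the 'feeling' part still needs a human.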

It's a real <... ehm ...> that the contributing ( and benefitting ) institutes
don't feel any obligation to actively inform their donors by themselves!
pachydermus
Posts: 17
Joined: Tue Mar 24, 2020 11:06 am

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by pachydermus »

Manfred.Knick wrote:It's a real <... ehm ...> that the contributing ( and benefitting ) institutes
don't feel any obligation to actively inform their donors by themselves!
Let's face it, the management of this project is a complete farce. At a time when things are overloaded you'd at the very least expect some proactive monitoring of the situation but instead we get servers that crap out for days on end because nobody can be bothered to do even a basic check of their systems.
v00d00
Posts: 390
Joined: Sun Dec 02, 2007 4:53 am
Hardware configuration: FX8320e (6 cores enabled) @ stock,
- 16GB DDR3,
- Zotac GTX 1050Ti @ Stock.
- Gigabyte GTX 970 @ Stock
Debian 9.

Running GPU since it came out, CPU since client version 3.
Folding since Folding began (~2000) and ran Genome@Home for a while too.
Ran Seti@Home prior to that.
Location: UK

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by v00d00 »

It's a small project run by volunteers. Quite frankly they should put out an advisory saying they don't need any more donors at present and shut down registrations. Even with all the extra servers being donated there are far too many people folding for this size of project.

On the bright side, CPU is still running fine with workunits. It just seems GPU has ceased to work.

Your best bet is to stop folding and maybe do BOINC or Rosetta. The problem will never be fixed while more and more people are coming to the project, so it will only get worse with time.
Nathan_P
Posts: 1164
Joined: Wed Apr 01, 2009 9:22 pm
Hardware configuration: Asus Z8NA D6C, 2 x5670@3.2 Ghz, 12gb Ram, GTX 980ti, AX650 PSU, win 10 (daily use)

Asus Z87 WS, Xeon E3-1230L v3, 8gb ram, KFA GTX 1080, EVGA 750ti , AX760 PSU, Mint 18.2 OS

Not currently folding
Asus Z9PE- D8 WS, 2 E5-2665@2.3 Ghz, 16Gb 1.35v Ram, Ubuntu (Fold only)
Asus Z9PA, 2 Ivy 12 core, 16gb Ram, H folding appliance (fold only)
Location: Jersey, Channel islands

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by Nathan_P »

If you are not happy then don't run the software. You have absolutely no idea how hard people are working in the background to get additional servers online, upgrade the server software to handle an increased load, generate the work units and somehow actually do something with the data that we have generated over the last few weeks. They have precious little time to be answering all the "this server is down" posts. They know about it, and they are working on it.
v00d00

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by v00d00 »

Also, it's Easter and I'm sure the people behind FAH would like to have a day of peace and quiet, to enjoy themselves away from here and spend time with their families. Given the current situation, I think they would like to enjoy time with their loved ones. Things will get back to normal after the Easter weekend. Just give it some time.
foldy
Posts: 2040
Joined: Sat Dec 01, 2012 3:43 pm
Hardware configuration: Folding@Home Client 7.6.13 (1 GPU slots)
Windows 7 64bit
Intel Core i5 2500k@4Ghz
Nvidia gtx 1080ti driver 441

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by foldy »

By the way server status is shown at
https://apps.foldingathome.org/serverstats

So if a user wants to tune their FAH client, it is fair enough to blacklist servers which are down. But that may change some days later.
Nathan_P

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by Nathan_P »

No, we should not be suggesting that anyone blacklist a server. That's been classed as cherry-picking in the past, and users have had their points wiped as a result.
iceman1992

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by iceman1992 »

Nathan_P wrote:No, we should not be suggesting that anyone blacklist a server. That's been classed as cherry-picking in the past, and users have had their points wiped as a result.
I would say this is different. Cherry-picking WUs was bad because the server had no idea that the WU was not being worked on and would wait until the WU timed out before sending another one, delaying progress. This is blocking at the beginning: the server should know that the WU was not sent out successfully. And this was a known-to-be-problematic server with many finished WUs not accepted, which means that if machines had worked on the WUs, many would have been dumped anyway. Wouldn't the compute power be better off working on good WUs?

For anyone interested: These servers now seem to be working fine, I just uploaded successfully 2 WUs from both 140.163.4.231 and 140.163.4.241 (they were new WUs downloaded today)
v00d00

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by v00d00 »

The question is, can you send back to them without them rejecting? Mine has had no issue receiving workunits from those servers, it just won't send them back.

For the sake of investigation I decided to block both servers to see what happened, and straight away the workunit uploaded and the client picked up new work, so maybe it was a temporary problem with too many people overloading it. That would suggest the issue is with those servers and not the entire network of servers. I have now unblocked them. But as a temporary way of mitigating the problem, I guess it could be used if you are having problems. At some point the server owner will probably troubleshoot the issue and fix it, but it's unlikely to happen today given that this is a holiday in some parts of the world.
iceman1992

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by iceman1992 »

v00d00 wrote:The question is, can you send back to them without them rejecting? Mine has had no issue receiving workunits from those servers, it just won't send them back.
As of today, yes, they are accepting finished units, with collection server 52.224.109.74.
v00d00 wrote:For the sake of investigation I decided to block both servers to see what happened, and straight away the workunit uploaded and the client picked up new work, so maybe it was a temporary problem with too many people overloading it. That would suggest the issue is with those servers and not the entire network of servers. I have now unblocked them. But as a temporary way of mitigating the problem, I guess it could be used if you are having problems. At some point the server owner will probably troubleshoot the issue and fix it, but it's unlikely to happen today given that this is a holiday in some parts of the world.
Yeah I have unblocked them too. It would be better to have the assignment servers check for issues with the work servers and then not assign to them. I would prefer to not get WUs than to get WUs, spend electrical energy processing them, and have the WUs rejected by the server.
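
To make the suggestion concrete, here is a rough sketch of what "check first, then assign" could look like. I have no knowledge of how the real assignment servers work, so everything here is hypothetical: the `WorkServer` class, the `accepting_results` flag, and both function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkServer:
    """Hypothetical view of a work server as seen by an assignment server."""
    address: str
    accepting_results: bool  # assumed health flag: can finished WUs be uploaded?


def assignable(servers: List[WorkServer]) -> List[WorkServer]:
    """Keep only servers that can currently take results back."""
    return [s for s in servers if s.accepting_results]


def assign(servers: List[WorkServer]) -> Optional[WorkServer]:
    """Pick the first healthy server, or None so the client gets no WU at all."""
    candidates = assignable(servers)
    return candidates[0] if candidates else None
```

Returning `None` when nothing is healthy matches the preference above: no WU is better than a WU whose result cannot be returned.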
KimboJ
Posts: 6
Joined: Mon Apr 13, 2020 12:26 am

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by KimboJ »

I agree to an extent. They shouldn't keep sending work out if they can't accept it back. It's unfair to donors given the electricity costs, noise, and hardware wear and tear. Plus there are the environmental effects of wasted WUs.
ToeBlister
Posts: 36
Joined: Thu Mar 26, 2020 3:23 pm

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by ToeBlister »

there is a symbiotic relationship between the donors and scientists.
the world (and donors) relies on scientists to conduct important medical studies and develop treatments for critical diseases.
the scientists rely on donors to help them crunch their massive simulations.

there must be mutual respect between the 2 parties.
donors must understand that important science is being done, servers are getting their butts kicked and communications become a lower priority.
scientists must also understand that, whenever possible, they should establish some form of comms.

it gets frustrating when WUs are assigned and crunched but servers cannot accept the results due to bandwidth or disk space limitations, as donors' money (electrical cost, hardware depreciation cost, etc) has been expended in vain, since the WUs will time out and be re-assigned.
Neil-B
Posts: 1996
Joined: Sun Mar 22, 2020 5:52 pm
Hardware configuration: 1: 2x Xeon E5-2697v3@2.60GHz, 512GB DDR4 LRDIMM, SSD Raid, Win10 Ent 20H2, Quadro K420 1GB, FAH 7.6.21
2: Xeon E3-1505Mv5@2.80GHz, 32GB DDR4, NVME, Win10 Pro 20H2, Quadro M1000M 2GB, FAH 7.6.21 (actually have two of these)
3: i7-960@3.20GHz, 12GB DDR3, SSD, Win10 Pro 20H2, GTX 750Ti 2GB, GTX 1080Ti 11GB, FAH 7.6.21
Location: UK

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Post by Neil-B »

I am guessing, obviously correct me if I am wrong, that the science teams will not have in any way deliberately set up their WSs to send out WUs knowing they won't be able to get them back.

I am also guessing, again correct me if I am wrong, that the science teams may well be at least as frustrated as (if not far more so than) the folders that the current issues are losing them (or at least slowing down their receipt of) valuable science.

Finally I am also guessing, again feel free to correct (or even lambast me if it helps with getting frustration out of the system) if I am wrong, that technical issues with rapidly expanding infrastructure sometimes happen and can be both a real pain to resolve and can sometimes be found out once "things have been set in motion" and so the pain can be felt for a significant period after it is first noticed.

No one is happy with the current issues, but I firmly believe that the (incredibly small) FAH team (both technical and science) are honestly trying to improve the situation - yes, it is frustrating, but I am sure no-one within FAH means any disrespect to the folders.

It has been a stressful month or more for all, it is a holiday weekend for most, whilst FAH is important we all need to remember that the FAH team may also be contending with lockdowns, isolation, sick family/colleagues so perhaps giving everyone a breather and not overstressing might be a good way forward?

Stay Well, Stay Healthy, and look after yourselves and those close to you.
2x Xeon E5-2697v3, 512GB DDR4 LRDIMM, SSD Raid, W10-Ent, Quadro K420
Xeon E3-1505Mv5, 32GB DDR4, NVME, W10-Pro, Quadro M1000M
i7-960, 12GB DDR3, SSD, W10-Pro, GTX1080Ti
i9-10850K, 64GB DDR4, NVME, W11-Pro, RTX3070