
140.163.0.0/16: blacklisting MSKCC via FW

Posted: Fri Apr 10, 2020 7:37 pm
by Manfred.Knick
After days of energy being burned into wasted heat and noise by multiple other people and me,
I decided to blacklist MSKCC completely via my HW firewall.

I feel deeply sorry about this decision but can't help it -
especially given the continued lack of transparency and of any background information.
This is by no means in keeping with the genuine spirit of the Open Source Community.

Until the problems around plfah*.mskcc.org have been proved to be solved,
I want to concentrate upon reliable productive projects.

Besides being an unreasonable demand on donors,
it is also environmentally not justifiable by any means
and, in my view, discredits the whole FAH project.

@Moderators: This is not against you - just the opposite!


REFERENCES from "Issues with a specific server" board (selection) :

"Can't upload to 140.163.4.231:80 "
- viewtopic.php?f=18&t=33978

"Another problem with 140.163.4.231"
- viewtopic.php?f=18&t=34195

"Can't upload to 140.163.4.231 again"
- viewtopic.php?f=18&t=34116

...

@Moderators: Suggestion: STICKY topic "Current Situation"
- viewtopic.php?f=18&t=34161
especially
- viewtopic.php?f=18&t=34161#p324467 <-----

Hint:
fah4.eastus.cloudapp.azure.com will be the next candidate for my FW blacklist.
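
For anyone unsure what a /16 rule like the one in the subject line actually covers, here is a minimal sketch using Python's standard ipaddress module. It only checks whether an address falls inside the range; it does not configure any firewall. The helper name is_blocked and the second test address (192.0.2.1, a documentation-only address) are illustrative assumptions, not part of any FAH tool.

```python
import ipaddress

# The range from the thread subject; a /16 spans 140.163.0.0 through 140.163.255.255.
BLOCKED_NET = ipaddress.ip_network("140.163.0.0/16")


def is_blocked(addr: str) -> bool:
    """Return True if addr lies inside the blocked range (hypothetical helper)."""
    return ipaddress.ip_address(addr) in BLOCKED_NET


print(is_blocked("140.163.4.231"))  # work server named in the referenced threads
print(is_blocked("192.0.2.1"))      # documentation-only address, outside the range
print(BLOCKED_NET.num_addresses)    # how many addresses a /16 rule covers
```

Note that such a rule hits every host in the range, not just the servers that are currently misbehaving.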

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sat Apr 11, 2020 8:29 am
by iceman1992
So am I, but how do we know when they've been fixed and are working again?

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sat Apr 11, 2020 12:06 pm
by Manfred.Knick
- Uptime > 5 days
- Latest mention in "specific server" > 5 days
- no Errors, no Warnings, ...
We will develop some kind of 'feeling'.
After gaining a positive Re-Test, I myself would add some kind of "Solved" to the Subject of this thread
together with a short "Successful" post, in order to let others know.
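
The three criteria above could be written down as a simple check, roughly as sketched here. This assumes the numbers are collected by hand (for example from the server stats page and the "specific server" board); the class ServerObservation, its field names, and looks_recovered are made up for illustration and do not correspond to any FAH API.

```python
from dataclasses import dataclass

# Threshold taken from the criteria in the post: "> 5 days".
MIN_QUIET_DAYS = 5


@dataclass
class ServerObservation:
    """Hand-collected figures for one server (hypothetical structure)."""
    uptime_days: float
    days_since_last_report: float  # last mention in the "specific server" board
    errors: int
    warnings: int


def looks_recovered(obs: ServerObservation) -> bool:
    """Apply the three criteria: uptime, quiet period, no errors or warnings."""
    return (
        obs.uptime_days > MIN_QUIET_DAYS
        and obs.days_since_last_report > MIN_QUIET_DAYS
        and obs.errors == 0
        and obs.warnings == 0
    )


# Example with invented numbers: long enough uptime, but still one warning.
print(looks_recovered(ServerObservation(6.0, 7.0, 0, 1)))
```

A positive result would only be the trigger for a re-test, not a substitute for it.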

It's a real <... ehm ...> that the contributing ( and benefitting ) institutes
don't feel any obligation to actively inform their donors by themselves!

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sun Apr 12, 2020 11:47 am
by pachydermus
Manfred.Knick wrote:It's a real <... ehm ...> that the contributing ( and benefitting ) institutes
don't feel any obligation to actively inform their donors by themselves!
Let's face it, the management of this project is a complete farce. At a time when things are overloaded you'd at the very least expect some proactive monitoring of the situation but instead we get servers that crap out for days on end because nobody can be bothered to do even a basic check of their systems.

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sun Apr 12, 2020 12:24 pm
by v00d00
It's a small project run by volunteers. Quite frankly, they should put out an advisory saying they don't need any more donors at present and shut down registrations. Even with all the extra servers being donated, there are far too many people folding for this size of project.

On the bright side, CPU is still running fine with workunits. It just seems GPU has ceased to work.

Your best bet is to stop folding and maybe do BOINC or Rosetta. The problem will never be fixed while more and more people are coming to the project, so it will only get worse with time.

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sun Apr 12, 2020 12:40 pm
by Nathan_P
If you are not happy then don't run the software. You have absolutely no idea how hard people are working in the background to get additional servers online, upgrade the server software to handle an increased load, generate the work units and somehow actually do something with the data that we have generated over the last few weeks. They have precious little time to be answering all the "this server is down" posts. They know about it, they are working on it.

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sun Apr 12, 2020 1:30 pm
by v00d00
Also, it's Easter and I'm sure the people behind FAH would like to have a day of peace and quiet, to enjoy themselves away from here and spend time with their families. Given the current situation, I think they would like to enjoy time with their loved ones. Things will get back to normal after the Easter weekend. Just give it some time.

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sun Apr 12, 2020 1:37 pm
by foldy
By the way server status is shown at
https://apps.foldingathome.org/serverstats

So if a user wants to tune their FAH client, it is fair enough to blacklist servers which are down. But that may change some days later.

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sun Apr 12, 2020 1:43 pm
by Nathan_P
No, we should not be suggesting that anyone blacklists a server. That's been classed as cherry picking in the past, and users have had their points wiped as a result.

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sun Apr 12, 2020 2:11 pm
by iceman1992
Nathan_P wrote:No, we should not be suggesting that anyone blacklists a server. That's been classed as cherry picking in the past, and users have had their points wiped as a result.
I would say this is different. Cherry-picking WUs was bad because the server has no idea that the WU is not being worked on and will wait until the WU times out before sending another one, delaying progress. This is blocking at the beginning - the server should know that it was not sent out correctly. And this was a known-to-be-problematic server with many finished WUs not accepted - which means if machines worked on the WUs, many would be dumped anyway. Wouldn't the compute power be better off working on good WUs?

For anyone interested: These servers now seem to be working fine, I just uploaded successfully 2 WUs from both 140.163.4.231 and 140.163.4.241 (they were new WUs downloaded today)

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sun Apr 12, 2020 4:08 pm
by v00d00
The question is, can you send back to them without them rejecting? Mine has had no issue receiving workunits from those servers, it just won't send them back.

For the sake of investigation I decided to block both servers to see what happened, and straightaway the workunit uploaded and picked up new work, so maybe a temporary problem with too many people overloading it. That would suggest the issue is those servers and not the entire network of servers. I have now unblocked them. But as a temporary way of mitigating the problem I guess it could be used if you are having problems. At some point the server owner will probably troubleshoot the issue and fix it. But it's unlikely to happen today given the fact this is a holiday in some parts of the world.

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Sun Apr 12, 2020 4:32 pm
by iceman1992
v00d00 wrote:The question is, can you send back to them without them rejecting? Mine has had no issue receiving workunits from those servers, it just won't send them back.
As of today, yes, they are accepting finished units, with collection server 52.224.109.74.
v00d00 wrote:For the sake of investigation I decided to block both servers to see what happened, and straightaway the workunit uploaded and picked up new work, so maybe a temporary problem with too many people overloading it. That would suggest the issue is those servers and not the entire network of servers. I have now unblocked them. But as a temporary way of mitigating the problem I guess it could be used if you are having problems. At some point the server owner will probably troubleshoot the issue and fix it. But it's unlikely to happen today given the fact this is a holiday in some parts of the world.
Yeah I have unblocked them too. It would be better to have the assignment servers check for issues with the work servers and then not assign to them. I would prefer to not get WUs than to get WUs, spend electrical energy processing them, and have the WUs rejected by the server.
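
To make the suggestion concrete, here is a rough sketch of what "check the work servers, then don't assign to the broken ones" could look like. This is purely an illustration of the idea and says nothing about how the real FAH assignment servers are implemented; WorkServer, accepting_results and pick_server are hypothetical names, and the health flags below are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkServer:
    """One work server as the assigner might see it (hypothetical model)."""
    address: str
    accepting_results: bool  # assumed to come from some periodic upload check


def pick_server(servers: List[WorkServer]) -> Optional[WorkServer]:
    """Return the first server that can take results back, or None.

    Returning None means "hand out no WU" rather than a WU that
    would be crunched and then rejected on upload.
    """
    healthy = [s for s in servers if s.accepting_results]
    return healthy[0] if healthy else None


# Invented status values for the two addresses discussed in this thread.
servers = [
    WorkServer("140.163.4.231", accepting_results=False),
    WorkServer("140.163.4.241", accepting_results=True),
]
choice = pick_server(servers)
print(choice.address if choice else "no work assigned")
```

The design choice is simply that an idle client wastes less than a client crunching a WU nobody can collect.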

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Mon Apr 13, 2020 12:49 am
by KimboJ
I agree to an extent. They shouldn't keep sending work out if they can't accept it back. It's unfair given the donors' electricity costs, noise, and hardware wear and tear. Plus the environmental effects of wasted WUs.

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Mon Apr 13, 2020 3:02 pm
by ToeBlister
there is a symbiotic relationship between the donors and scientists.
the world (and donors) relies on scientists to conduct important medical studies and develop treatments for critical diseases.
the scientists rely on donors to help them crunch their massive simulations.

there must be mutual respect between the 2 parties.
donors must understand that important science is being done, servers have gotten their butts kicked and communication becomes a lower priority.
scientists too must understand that, whenever possible, they should establish some form of comms.

it gets frustrating when WUs are assigned and crunched but servers cannot accept the results due to bandwidth or disk space limitations, as donors' money (electrical cost, hardware depreciation cost, etc) has been spent in vain: the WUs will time out and be re-assigned.

Re: 140.163.0.0/16: blacklisting MSKCC via FW

Posted: Mon Apr 13, 2020 3:35 pm
by Neil-B
I am guessing, obviously correct me if I am wrong, that the science teams will not have in any way deliberately set up their WSs to send out WUs knowing they won't be able to get them back.

I am also guessing, again correct me if I am wrong, that the science teams may well be at least as frustrated as (if not far more so than) the folders that the current issues are losing them (or at least slowing down their receipt of) valuable science.

Finally I am also guessing, again feel free to correct (or even lambast me if it helps with getting frustration out of the system) if I am wrong, that technical issues with rapidly expanding infrastructure sometimes happen and can be both a real pain to resolve and can sometimes be found out once "things have been set in motion" and so the pain can be felt for a significant period after it is first noticed.

No one is happy with the current issues, but I firmly believe that the (incredibly small) FAH team (both technical and science) are honestly trying to improve the situation - yes, it is frustrating, but I am sure no-one within FAH means any disrespect to the folders.

It has been a stressful month or more for all, and it is a holiday weekend for most. Whilst FAH is important, we all need to remember that the FAH team may also be contending with lockdowns, isolation, and sick family/colleagues, so perhaps giving everyone a breather and not overstressing might be a good way forward?

Stay Well, Stay Healthy, and look after yourselves and those close to you.