Fireside Chat Suggestions
Moderators: Site Moderators, FAHC Science Team
Long-time lurker, first-time poster here! I just watched the fireside dev chat, and I understand that a distributed file system was suggested to combat the lack of space for completed WUs. But what if the work server software were distributed as well, so that people all over the world could run their own work servers, which would collect work units and then submit them all at a set time to the central servers back at FAH HQ? Just trying to start the conversation here; I'd like to hear any cons with this idea.
-
- Posts: 2522
- Joined: Mon Feb 16, 2009 4:12 am
- Location: Greenwood MS USA
Re: Fireside Chat Suggestions
[I have no association with Folding@Home other than as a volunteer donor, these ideas are just my own based on applications I have written in the past]
Currently (as I understand it) the client has two assignment servers hard coded into it. This makes it difficult to 'hijack' clients to point to fake assignment servers, as there is no way to update the IP Addresses and no DNS lookup to spoof.
Then the Assignment servers have the Work servers' IP addresses hard coded into them, so again it is hard to hijack clients to give them bogus work. As you can imagine, this is human intensive and is one reason adding servers to cope with a 10 to 1 growth in the last month happens as slowly as it has. (That is still much faster than I expected, as companies donated servers that I thought grants would have to be written to fund.) It is however very secure, and you want it to be secure. Discovering you have been 'folding' for some kid in his basement is going to leave a very bad taste.
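To make the two-level trust chain described above concrete, here is a minimal sketch. It is not the actual F@H client or server code; the names `ASSIGNMENT_SERVERS`, `WORK_SERVERS` and `is_trusted_work_server` are hypothetical, and the addresses are taken from the reserved documentation ranges.

```python
# Hypothetical illustration only; not the real F@H client logic.
# All addresses below come from the reserved documentation ranges.

# Fixed in the client: no DNS lookup, no way to update at runtime.
ASSIGNMENT_SERVERS = ("192.0.2.10", "192.0.2.11")

# Fixed in each assignment server: the work servers it may hand out.
WORK_SERVERS = {
    "192.0.2.10": ("198.51.100.5", "198.51.100.6"),
    "192.0.2.11": ("198.51.100.6", "203.0.113.7"),
}


def is_trusted_work_server(assignment_server: str, work_server: str) -> bool:
    """Accept a work server only if a hard-coded assignment server lists it."""
    if assignment_server not in ASSIGNMENT_SERVERS:
        return False
    return work_server in WORK_SERVERS.get(assignment_server, ())


print(is_trusted_work_server("192.0.2.10", "198.51.100.5"))   # listed work server
print(is_trusted_work_server("192.0.2.10", "203.0.113.99"))   # unknown work server
print(is_trusted_work_server("203.0.113.50", "198.51.100.5")) # unknown assignment server
```

Every entry in both tables has to be edited by hand, which is exactly why the scheme is both hard to hijack and slow to grow.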
So what you need to resolve to make your idea secure is a way to add and drop servers on the fly that is not human intensive and yet maintains the security of the volunteers' donations. If a server is dropped, where are its points accumulated? If a server is added, how are Work Units added to it to be deployed? Making these processes both foolproof and dynamic is going to be the heart of your challenge.
Another challenge, and this is easier, but related, is building a firewall that both protects your network and allows F@H to communicate. In the past we have been blessed with a new server only to discover that it was not correctly recording points or bonuses to the Stats server. Months later batch processes awarded these points. (I do not know that it is happening now, but I see topics that make me fear at least some new servers are not doing stats perfectly. It may be that new users are just confused by how stats are awarded, but I would not bet on that.) If everyone in the world used the same router, then copying working configurations might be easy, but that is not the case.
Anyway, just some ideas that need to be considered to make the idea work securely.
Tsar of all the Rushers
I tried to remain childlike, all I achieved was childish.
A friend to those who want no friends
Re: Fireside Chat Suggestions
Those are some very valid points. Maybe the work servers could be distributed, but the FAH team would still maintain the stats server and the assignment servers, and those servers would have their IP addresses hardcoded into the distributed work server code. The distributed work servers could work much like the current work unit distribution: for example, a huge central server at FAH HQ would distribute chunks of work units to the different work servers worldwide, and from there the work servers could distribute them to clients in their locale. As for servers dropping out, maybe telemetry for the work unit could be sent directly to the hardcoded stats server at the same time the work unit is sent to the work server.
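The dual-reporting idea at the end of that paragraph could look something like the sketch below. This is only an illustration of the proposal, not anything F@H actually does; `StatsServer`, `WorkServer` and `submit` are hypothetical in-memory stand-ins, and the sketch assumes the credit record is sent regardless of whether the volunteer-run work server is reachable.

```python
from dataclasses import dataclass, field


@dataclass
class StatsServer:
    """Stand-in for the centrally run stats server (hypothetical)."""
    credits: dict = field(default_factory=dict)

    def record(self, donor: str, wu_id: str, points: int) -> None:
        # Keyed by work unit ID, so a retried report is not counted twice.
        self.credits.setdefault(donor, {})[wu_id] = points

    def total(self, donor: str) -> int:
        return sum(self.credits.get(donor, {}).values())


@dataclass
class WorkServer:
    """Stand-in for a volunteer-run work server that may drop out."""
    results: dict = field(default_factory=dict)
    online: bool = True

    def upload(self, wu_id: str, payload: bytes) -> bool:
        if not self.online:
            return False
        self.results[wu_id] = payload
        return True


def submit(donor: str, wu_id: str, payload: bytes, points: int,
           work: WorkServer, stats: StatsServer) -> bool:
    """Report credit centrally, then try to hand the result to the work server."""
    stats.record(donor, wu_id, points)
    return work.upload(wu_id, payload)


stats, work = StatsServer(), WorkServer()
submit("donor_a", "wu-1", b"result", 100, work, stats)
work.online = False  # the volunteer server disappears
delivered = submit("donor_a", "wu-2", b"result", 100, work, stats)
print(delivered, stats.total("donor_a"))
```

The sketch deliberately leaves open the harder question raised earlier in the thread: the stats server here takes the client's word for the points, which a real design would have to verify.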
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
- Contact:
Re: Fireside Chat Suggestions
Welcome to the F@H Forum Cryptoxic,
Work Servers are pretty powerful since they generate WUs (that's a resource-intensive process), collect completed WUs, verify completed WUs and then do the crediting calculations too. Keep in mind that researchers upload/download TBs of data from those Work Servers for analysis and further investigation. The slowest aspect for the Work Server is actually receiving data from the Donors. Most Donors' upload speeds are quite low, which is the biggest bottleneck. If all the Donors had fast upload (400 Mb/s to 500 Mb/s), it would be a very different picture.
For Donors to host a Work Server, the infrastructure at home isn't powerful enough.
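As a rough illustration of the upload bottleneck described above, the sketch below compares how long a single result ties up a server connection at different donor upload speeds. The 100 MB result size and the 10 Mb/s "slow" speed are assumptions chosen for the example, not F@H figures, and protocol overhead and latency are ignored.

```python
def upload_seconds(size_megabytes: float, upload_megabits_per_s: float) -> float:
    """Time to send a result, ignoring protocol overhead and latency."""
    if upload_megabits_per_s <= 0:
        raise ValueError("upload speed must be positive")
    # 1 megabyte = 8 megabits
    return size_megabytes * 8 / upload_megabits_per_s


RESULT_MB = 100.0  # hypothetical result size

for speed in (10.0, 500.0):  # assumed slow line vs. the fast case quoted above
    print(f"{speed:.0f} Mb/s: {upload_seconds(RESULT_MB, speed):.1f} s")
```

Under these assumptions the slow connection holds the server for 80 seconds where the fast one needs 1.6 seconds, a 50x difference per result.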
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: Fireside Chat Suggestions
Hi Panther, thanks for the welcome!
Yes, I totally understand that. Some people might have very little storage space or low speeds, but there are some with huge storage servers and high speeds as well, which is why giving people like us the option to host a work server would be awesome. For me personally, I have 1 Gbps up and down internet running in my house, but my servers all have limited space. I would happily run my server as a work server to serve my locale if it was distributed, though.
From what I understand (I watched Linus' video about hosting a work server), the server was remoted into by someone at FAH and the work server software was set up by them, which, no offense, I'm not really comfortable with. Also, there's a space requirement that is in the TiB range, iirc.
-
- Posts: 410
- Joined: Mon Nov 15, 2010 8:51 pm
- Hardware configuration: 8x GTX 1080
3x GTX 1080 Ti
3x GTX 1060
Various other bits and pieces
- Location: South Coast, UK
Re: Fireside Chat Suggestions
How about reducing the need for work servers altogether? They will still need to spin up all the projects / runs / clones, but how much is done by the server when the next gen is created?
Could that be delegated to the client? We've all seen cases where we upload a WU and 2 seconds later get assigned the next gen, so it seems like it should be possible.
This could eventually morph into a core_23 or whatever the 'streaming' core was going to be called: 'infinite' length WU threads that 'checkpoint' with the server at intervals or when the client is paused etc. That would massively reduce server-client communications and make any that do happen less time critical, so more processing actually gets done at the client end.
It seems ridiculous that you wait hours to receive a WU that takes 30 minutes to complete, then an hour to successfully upload, when instead you could just crunch through the gens sequentially. You could do 100 'WUs' within the timeout for 1 at the moment so I don't see what's to be lost with such an approach.
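The arithmetic behind that complaint can be sketched as follows. The 30-minute compute time and one-hour upload come from the post; the 2-hour wait is an assumed stand-in for "hours", and the model assumes one wait and one upload per batch of sequential gens, with the upload time not growing with the number of gens.

```python
def compute_fraction(wait_h: float, compute_h: float, upload_h: float,
                     gens: int = 1) -> float:
    """Share of wall-clock time spent computing when `gens` generations
    are crunched back to back per server round trip."""
    busy = compute_h * gens
    return busy / (wait_h + busy + upload_h)


WAIT_H = 2.0     # assumption: the post only says "hours"
COMPUTE_H = 0.5  # 30 minutes per WU, from the post
UPLOAD_H = 1.0   # one hour to upload, from the post

for gens in (1, 100):
    share = compute_fraction(WAIT_H, COMPUTE_H, UPLOAD_H, gens)
    print(f"{gens:>3} gen(s) per round trip: {share:.1%} of time spent computing")
```

With one gen per round trip the client computes for roughly 14% of the elapsed time; with 100 sequential gens that rises to roughly 94% under the same assumptions.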
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Location: Land Of The Long White Cloud
- Contact:
Re: Fireside Chat Suggestions
I do like the concept of streaming WUs... after all, internet technologies have progressed quite a lot since V7 was first released. I would wait to see what the new F@H infrastructure is and then wait for their plans... after all, if they get really awesome long-term partnerships, the sky's the limit.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
-
- Site Moderator
- Posts: 6349
- Joined: Sun Dec 02, 2007 10:38 am
- Location: Bordeaux, France
- Contact:
Re: Fireside Chat Suggestions
I still have nightmares when I think about streaming core experiments ...