Workserver Requirement.
Posted: Sun May 17, 2020 12:56 pm
by ChrisD5710
By watching Linus build a Work Server, I learned that a 1 Gigabit Internet connection and at least 60 Terabytes of fault-tolerant storage are required.
In another thread I asked if 60 TB was really needed as this amount of HDD space is very expensive. I thought a server with, say 30 TB would be good enough to help with the load.
One reply, that I can not locate anymore, said: This is out of Your League as a plain Folder. You need a lot of Money, at least at a corporate level, so just forget it.
OK. I forget it, but I started thinking: Even if I can not afford this, I would like to know more about the subject, but I am not able to find any info. Why is that? Is it because this is for the elite only, or do I just look in the wrong places?
About the 60 TB required, I did some thinking, and this makes me wonder. You set up the WS, and FAH uploads work for Your server to split up into WUs that are then sent to folders; the results coming back are stored in this 60 TB storage, compressed, I presume.
Now when the storage is full, data must be pulled back to FAH in order to continue. OK, a 1 Gbit/s Internet connection will transfer 90+ megabytes/sec. That gives roughly 3 hours to transfer 1 TB, or about a solid week to transfer the lot, during which time the server cannot serve folders...
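The arithmetic above can be checked with a short sketch. It assumes a sustained 90 MB/s (the figure quoted above, not a measured value) and decimal units (1 TB = 1,000,000 MB); the function name is only illustrative.

```python
# Back-of-the-envelope transfer time over a 1 Gbit/s link.
# Assumption: ~90 MB/s sustained throughput, decimal units.

def transfer_hours(terabytes: float, mb_per_sec: float = 90.0) -> float:
    """Hours needed to move `terabytes` at a sustained `mb_per_sec`."""
    megabytes = terabytes * 1_000_000  # 1 TB = 1,000,000 MB (decimal)
    return megabytes / mb_per_sec / 3600

one_tb_hours = transfer_hours(1)
full_days = transfer_hours(60) / 24

print(f"1 TB:  {one_tb_hours:.1f} hours")  # prints "1 TB:  3.1 hours"
print(f"60 TB: {full_days:.1f} days")      # prints "60 TB: 7.7 days"
```

At exactly 90 MB/s the full 60 TB works out to a little over a week; anything above 90 MB/s brings it down towards the six-to-seven day range.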
What have I missed here? Please enlighten me a bit.
ChrisD
Re: Workserver Requirement.
Posted: Sun May 17, 2020 1:17 pm
by uyaem
One thing is that the WU server will generate follow-up WUs based on returned data, so it is not simply a matter of "loading it up once, waiting for results, then sending that off".
Imagine running a machine that becomes a bottleneck. When people's uploads fail, and points dwindle, you've got a lot of grumpy users very quickly, and new users who don't delve into the details may not even come back.
It's certainly got nothing to do with "for the elites only".
Re: Workserver Requirement.
Posted: Sun May 17, 2020 1:24 pm
by TPL
I won't comment on his answer, but if you want to make anything fault tolerant you need redundancy: multiple backup systems, not just one. It would take a lot of bandwidth to use e.g. 2 mirrors.
Also you can't just sit and wait for an error somewhere; you need to replace the systems on a schedule well below their known reliable lifetime. And that means $$$$$.
Re: Workserver Requirement.
Posted: Sun May 17, 2020 1:29 pm
by Neil-B
I have seen information with regard to workflow and what happens to data fairly recently … probably during the devs fireside chat or on GitHub somewhere … might even be buried in the forums somewhere - I'll see if I can track down some links and update here.
Covered at a high-ish level in the chat
https://stanford.zoom.us/rec/play/7pV-d ... 6462356000 from 27:48 onwards
As to why much isn't specifically posted - again I don't think it is about "elitism" but just that when people have this type of resource/capability the discussions about precisely how they are setup and configured would probably not be via public forum but direct between the donor and the FAH Consortium/Devs.
I can actually do the storage part tbh … I just don't have the resilient level of connectivity so can't offer to help … and I sure as eggs couldn't provide the resilient levels of support/service that would be needed ... imho it isn't about elitism but about being able to deliver a corporate-grade capability with high throughput/low downtime/high resilience. Whilst we have seen some of these offers pass through the forums and get forwarded to the FAH Consortium, looking at the pool of servers and their naming conventions, most are provided by large multinationals/corporates/academic institutions.
Re: Workserver Requirement.
Posted: Sun May 17, 2020 2:16 pm
by foldy
On the other hand, 60 TB of HDD is not that expensive. You get a 16 TB HDD for $400, like the Seagate Exos 16TB Enterprise. So you need four of them to get 60 TB of HDD = $1600. Plus the storage rack and an SSD for caching = $600. Then you want to have some hot spare drives for the RAID in case one drive fails, so for 2 additional 16 TB HDDs you need another 2x $400 = $800. Total = $3000. Then you want a live backup of everything, so you need a backup system which is the same system again, so an additional $3000.
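For anyone who wants to play with the numbers, the estimate above can be tallied as a small sketch. The prices are the ones quoted in this post, not current market prices, and the variable names are only illustrative.

```python
# Tally of the rough hardware estimate above (prices as quoted in the post).
DRIVE_TB = 16
DRIVE_PRICE = 400          # USD per 16 TB enterprise HDD
data_drives = 4
hot_spares = 2
rack_and_ssd_cache = 600   # USD, storage rack plus caching SSD

raw_capacity_tb = data_drives * DRIVE_TB
primary_cost = (data_drives + hot_spares) * DRIVE_PRICE + rack_and_ssd_cache
with_backup = primary_cost * 2  # identical second system for live backup

print(f"Raw capacity: {raw_capacity_tb} TB")  # prints "Raw capacity: 64 TB"
print(f"Primary system: ${primary_cost}")      # prints "Primary system: $3000"
print(f"With backup: ${with_backup}")          # prints "With backup: $6000"
```

Note that 64 TB is the raw capacity of the four data drives; any RAID level that uses parity or mirroring leaves less than that usable.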
Re: Workserver Requirement.
Posted: Sun May 17, 2020 2:45 pm
by Neil-B
You might want to add 256/512GB ram to that equation … and tbh ssd or flash array storage fitout would probably be better than hdd with ssd cache … personally I'd be surprised if an enterprise grade solution server with resilient storage/resilient power/cooling/dual fibre/fast lrdimms/hot swap everything (cooling/storage/power) came in much under £25k (but could cost £100k+ without too much effort) … and it would help to be hosted in a 24/7/365 maintained site with backup power/comms/cooling.
But actually even this cost is relatively insignificant (in corporate terms) - much harder is actually the commitment to continue to deliver 24/7/365 a level of uptime/service as needed with a full enterprise level support team looking after it … I would expect the FAH Consortium will as far as possible want to have the core infrastructure built/supported/maintained to enterprise standards/levels/security postures - there is a lot at stake both for the science and for the folding community, and imo (and I have no association with the FAH Consortium) relying on even an expertly skilled hobbyist maintained bit of kit for core infrastructure would be significantly risky.
From what I can tell there is no shortage of ISPs/hosters who can supply/are supplying this - so I don't see the need to commit to deliver something that can be delivered better by others … As a home folder, my WS (if I had one) could go down because I am on a domestic fibre connection which (albeit very stable with few interruptions) can, as domestic supplies do, go down at any time with little/no recourse to get it solved quickly (and yes I could put in a generator, which I haven't currently got) … I don't hold spares of all my kit so the WS might be out of action until such could be sourced/ordered (and yes I could buy these and keep them in stock) … There is only one of me and if I get knocked down by a bus then all the science on the WS might be wasted (I could hire some people to deliver/manage the WS on my behalf which I don't currently do) … but when there is no shortage of offers from very large, very capable enterprises I cannot see what benefit I would offer FAH by doing all this.
Re: Workserver Requirement.
Posted: Mon May 18, 2020 2:06 pm
by ChrisD5710
Thanks for Your replies.
Seems You all have very high expectations of the work servers used.
I do not think it is that simple; at Linus's I see only ONE server. Sure, redundant power supply, ample HDDs with hot swap, but still only one server. 24/7/365 uptime is not possible. Even if everyone is doing their best, things still go wrong.
Just look at the server stats.
Even with these odds, it is amazing how much work can be done.
No one has told me how work results are pulled from the work servers. How often and how much data. Even if I can not afford a server, I would still like to know.
ChrisD
Re: Workserver Requirement.
Posted: Mon May 18, 2020 3:02 pm
by Neil-B
At 29:41 of the chat there is a very brief mention that the researchers do some analysis on the WS prior to offloading which reduces the volume of data and it would be interesting to know what level of data volume reduction they achieve prior to offloading the data (assuming they actually do this).
Re: Workserver Requirement.
Posted: Mon May 18, 2020 3:43 pm
by MeeLee
You don't need to be a corporation to be able to run a server.
If you are running a personal server, you probably can get by with running a Ryzen 9 3000 series CPU, some regular RAM (some boards also support ECC RAM, but there've been issues reported trying to run them on some).
Most motherboards have about 6 to 10 Sata drive connectors on them, plus have 1 to 3x M.2 drives as well; which should be plenty for storage needs.
Would probably cost you $750 total (+ case and drives). And use your personal internet.
A step up, and you can find some older 2nd gen Epyc rack servers.
Companies get rid of them, and Amazon and Ebay have an abundance of them available.
If you want to run a FAH server or any decent company server, you probably need a system that has a redundancy system built in.
Like 2 CPU motherboard, drives in RAID, ECC memory, dual PSUs.
A modern 3rd gen Epyc or new Intel system would set you back several thousand, or even tens of thousands, of dollars.
But even here, you can scout the inets for some good deals!
A lot of companies buy newer stuff, and older hardware can be bought online very cheap.
Like for instance, you might find several older dual-Xeon systems for under $1000.
They might still be good enough to serve a decent amount of WUs.
While you don't need Gigabit internet (100 Mbit/s or 250 Mbit/s can still do a lot), you do need some sort of business-grade DSL or cable line at your home.
No dish network that depends on the weather, and no home internet that starts throttling you after you hit 10GB per day.
Those things don't come cheap, but anywhere from 200 (or up) a month will get you a decent internet connection.
It's not impossible to pay, but it still costs a lot. And most people can't afford it (especially not during COVID).
You might need to be a company to do the main bulk of the internet needs for FAH, but if you're a co-server, backup server, or support server, you don't need to have the highest tier connection.
The last thing you'll probably need, is credibility.
I don't think FAH would just randomly allow you to start a server on their behalf.
A credible source and background are essential.
If tomorrow a representative of Cisco asked to do this, they would be given access much quicker than the average end user like you or me.
Re: Workserver Requirement.
Posted: Mon May 18, 2020 6:57 pm
by PantherX
ChrisD5710 wrote:...No one has told me how work results are pulled from the work servers. How often and how much data...
For servers that are in the lab, it can be pulled locally to a different system for further analysis. The datasets that are pulled from the servers once the Project is over tend to be several TBs of data. How long a Project runs can vary from weeks to years... it just depends on when the researcher has a sufficient amount of statistical data that is viable and useful. Currently, due to the high volume of Donors, data generation that would generally take weeks is being done in days, so that is shaving months of research time off for researchers.
Re: Workserver Requirement.
Posted: Tue May 19, 2020 5:27 pm
by ChrisD5710
@Neil-B
Been listening to the chat, and that was informative. Thanks for the link. Is there a way I can find more of this myself?
One thing I noticed is the problem that a lot of folders are on ADSL, which means that big results take forever to get uploaded back to the servers.
How about introducing a variable where the folder's upload speed is tested and stored? This way small WUs with less data could be handed to folders with limited upload capabilities, easing the burden of slow uploads on the WSs. This is already tested in BOINC, so this should not be impossible to implement.
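To make the idea concrete, here is a minimal sketch of what such a selection rule could look like. Everything in it is hypothetical: the `WorkUnit` class, the `pick_work_unit` function and the 10 minute upload budget are made up for illustration and are not how FAH or BOINC actually assign work.

```python
from dataclasses import dataclass

@dataclass
class WorkUnit:
    name: str
    result_mb: float  # expected size of the result to upload, in MB

def pick_work_unit(upload_mbit_s, candidates, max_upload_minutes=10.0):
    """Pick the largest WU whose result fits the upload time budget.

    Falls back to the smallest WU if none fits. Hypothetical rule only.
    """
    mb_per_sec = upload_mbit_s / 8                  # Mbit/s -> MB/s
    budget_mb = mb_per_sec * max_upload_minutes * 60
    fitting = [wu for wu in candidates if wu.result_mb <= budget_mb]
    if fitting:
        return max(fitting, key=lambda wu: wu.result_mb)
    return min(candidates, key=lambda wu: wu.result_mb)

queue = [WorkUnit("small", 20), WorkUnit("medium", 100), WorkUnit("large", 400)]

adsl_choice = pick_work_unit(1.0, queue)     # 1 Mbit/s upload, 75 MB budget
fibre_choice = pick_work_unit(100.0, queue)  # 100 Mbit/s upload

print(adsl_choice.name)   # prints "small"
print(fibre_choice.name)  # prints "large"
```

The stored upload speed would have to come from a measurement on the client side; how that is measured and reported is left open here.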
ChrisD
Re: Workserver Requirement.
Posted: Tue May 19, 2020 5:44 pm
by Neil-B
The FAH Github
https://github.com/FoldingAtHome is a good place to keep track of what is being requested as enhancements and to make suggestions for such (fah-Issues) .. It might be worth noting that the upload time wouldn't usually be too much of an issue, but on overloaded servers it bottlenecks throughput by jamming up the connections ... once the WSs and throughput finally balance out it may be less of an issue?
Re: Workserver Requirement.
Posted: Tue May 19, 2020 6:37 pm
by bruce
There's a little-used setting in FAH which was originally designed for slow modems that you can try. I don't promise it will change anything since I don't think the project managers pay attention to it, but you can judge for yourself.
Set MAX-PACKET-SIZE=SMALL.