netblazer wrote:...The only real difference here is that you have nothing worth stealing...
Maybe not worth stealing, since the eventual results are to be given away for free. But a malicious person or group may exist with other motivations, such as simply wanting to prevent research of this nature, for no particular reason at all. People really are like that, all over the world.
Also, as a fictional example: suppose a major drug company has spent billions of dollars and decades on research and is poised to release a breakthrough cure worth trillions, but its competitors could make a similar drug based on the results of this open research with minimal investment of their own. They might not be able to resist the temptation to hinder this research project. They could even outsource the dirty work to an unpopular country with a reputation for doing such things and claim plausible deniability. Corporations really are like that, all over the world.
So someone could benefit not by stealing, but by destroying what's there: breaking confidence, slowing contributions, and draining resources spent assessing and repairing the damage.
VijayPande wrote:netblazer wrote:
No offense but that's certainly one of the stupidest things I've ever heard. You need to have a single machine catch all requests and reroute as needed (that's what a router does).
I think you're missing the point -- the goal here is to remove single points of failure. Your proposed solution creates a new single point of failure. Moreover, with our current plan, we also gain some more flexibility in the future with how we handle the two AS's. One alternative quick fix is to assign the DNS name assign-gpu.stanford.edu to multiple IP's and let DNS round robin it for us. However, I've seen issues with this, especially if DNS gets flaky or screwed up.
I strongly side with netblazer on designing a fault-tolerant, redundant system. It can be done economically and securely.
It's a poor argument to single out a router as if it were the only possible single point of failure.
Are all the servers in the same building? On the same campus? Would a natural disaster like an earthquake, wildfire, or mudslide wipe out all of the servers and data?
How about network redundancy? If a construction worker bumps/breaks a conduit, will it take down the project? Or do network cables feed into the building/campus from multiple directions?
How about ISP redundancy? If someone at your ISP, or their upstream ISP, fiddles with a route's netmask, and suddenly 1 million people on the other side of that mask can't reach your server, do you have multiple ISPs that don't share the same upstream provider?
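As a rough illustration of the scale a single mistyped netmask can cover (the prefixes below are illustrative documentation ranges, not anyone's real routes):

```python
import ipaddress

# Illustrative only: how many addresses fall under a route when the
# prefix length (netmask) is fat-fingered.
intended_route = ipaddress.ip_network("192.0.2.0/24")   # intended: one small block
mistyped_route = ipaddress.ip_network("192.0.0.0/12")   # typo'd mask: a huge block

print(intended_route.num_addresses)   # 256
print(mistyped_route.num_addresses)   # 1048576, on the order of a million hosts
```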
How about DNS redundancy? Is your DNS company fault-tolerant? Does your DNS company maintain servers around the world? I know of one Dynamic company that provides industry-leading DNS and related services fairly cheaply. The name of the company is hidden in the previous sentence. I am not a stakeholder and have nothing to gain by name-dropping; it is just one company I am aware of that specifically strives to address this problem. Others exist too.
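On the round-robin idea from the quoted reply above, here is a minimal sketch of the client-side half, assuming the assignment server's name resolves to more than one address (the port and the commented-out usage are hypothetical; this is not the project's actual code):

```python
import socket

def resolve_all(hostname, port=80):
    """Return every IPv4 address in the DNS answer, in order, without duplicates."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    seen, addrs = set(), []
    for _, _, _, _, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in seen:
            seen.add(ip)
            addrs.append(ip)
    return addrs

def connect_first_reachable(hostname, port=80, timeout=5.0):
    """Try each address in turn instead of giving up when the first one is down."""
    last_error = None
    for ip in resolve_all(hostname, port):
        try:
            return socket.create_connection((ip, port), timeout=timeout)
        except OSError as err:
            last_error = err
    raise ConnectionError(f"no reachable server for {hostname}") from last_error

# Hypothetical usage; the hostname is the one mentioned in the quoted post.
# sock = connect_first_reachable("assign-gpu.stanford.edu", 8080)
```

A client written this way keeps working when DNS is healthy but one address is down, which softens the "flaky DNS" objection somewhat.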
How about electrical redundancy? Does each server have dual hot-swap power supplies? Is there a high-capacity battery capable of running all the servers for 1 day, with no failover delay? How about for 1 hour, or even 1 minute while a diesel generator comes online (those can take several seconds to start)? How about electrical grids? Are you supplied electricity by a single grid, or do multiple grids overlap your area?
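Back-of-the-envelope sizing only, with made-up load and battery figures, to show the kind of math involved:

```python
# Rough UPS runtime estimate. All figures are illustrative assumptions,
# not measurements from any real server room.
battery_capacity_wh = 5000      # usable watt-hours after derating
server_load_w = 3 * 400         # e.g. three servers drawing ~400 W each
inverter_efficiency = 0.90

runtime_hours = battery_capacity_wh * inverter_efficiency / server_load_w
print(f"Estimated runtime: {runtime_hours:.2f} hours")   # ~3.75 hours here

# Bridging a full day at this load would need roughly 32 kWh of usable battery,
# which is why batteries usually only cover the gap until a generator starts.
```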
How about data redundancy? Are you using RAID1 at a minimum? RAID5 or RAID6 might be better. For instance, RAID5 on RAID5 with hot spares and hot-swap drives is, I have read, one way to ensure graceful degradation with zero downtime: just replace drives and rebuild. How about keeping archived backups? Are they all stored on-site? All in the same off-site location?
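A quick sketch of the usual capacity versus fault-tolerance trade-off, using textbook formulas and arbitrary example numbers (a hypothetical shelf of six 4 TB drives):

```python
# Textbook RAID capacity math; real arrays lose a bit more to formatting overhead.
drives, drive_tb = 6, 4

raid5_usable = (drives - 1) * drive_tb    # one drive of parity: 20 TB, survives 1 failure
raid6_usable = (drives - 2) * drive_tb    # two drives of parity: 16 TB, survives 2 failures
raid10_usable = (drives // 2) * drive_tb  # three mirrored pairs: 12 TB, survives 1 per pair

print(raid5_usable, raid6_usable, raid10_usable)
```

Adding a hot spare shifts one drive from usable capacity to standby, but lets a rebuild start immediately instead of waiting for someone to swap hardware.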
A router is just one of several "single points of failure", like a network cable, a power cable, or a data storage device. But all of these technologies can be deployed in ways that increase redundancy, reduce single points of failure, degrade gracefully, and buy extra hours or days (hopefully not that long!) in which to respond and fully repair a problem. Some of the more esoteric solutions, especially those involving hard drives, can be costly, but it may be possible to solicit donated hardware. If the project were run by a registered non-profit charitable organization, donor companies could even write the hardware off on their taxes.
I am by no means a networking expert, yet I was exposed to these problems and their solutions (as primary tech support for residential and business internet service and data center operations) when I worked at a small ISP with fewer than 10 employees. So I know a bit about small staffs and budgets too. All of that was over a decade ago, and technology has advanced enormously since then. And yet, sound fundamental design never ages.
Anyways, even if no hardware infrastructure policies or changes are ever going to be made (and I sense this is the stubborn predisposition), there has to be a better way to address the problem than hard-coding lists of addresses into clients. Over a secure and authenticated channel, the client should be able to retrieve a current server list from your servers, logically decoupling the existence of new backup servers from each client's knowledge of them.
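Something along these lines is what I have in mind. This is only a sketch; the URLs, file format, and bootstrap list are all hypothetical, and a real client would add whatever signature verification or certificate pinning the project chooses:

```python
import json
import urllib.request

# Hypothetical bootstrap URLs compiled into the client. They are only used to
# find the current server list, not as the permanent list of work servers.
BOOTSTRAP_URLS = [
    "https://assign1.example.org/serverlist.json",
    "https://assign2.example.org/serverlist.json",
]

def fetch_server_list(timeout=10):
    """Try each bootstrap URL over HTTPS and return the first parseable list."""
    for url in BOOTSTRAP_URLS:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = json.load(resp)
            # A real client should also verify a detached signature on the
            # payload here, so a compromised mirror can't feed it bogus servers.
            if isinstance(data, dict) and isinstance(data.get("servers"), list):
                return data["servers"]
        except (OSError, ValueError):
            continue   # that mirror is down or garbled; try the next one
    return []          # fall back to whatever list the client already has cached
```

With something like this, bringing a new backup server online only means publishing an updated list, not shipping a new client.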
Pushing a new client out every time a new server comes online is idiotic. Twelve hours per week of downtime hurts credibility almost as much as corrupt data does, and is equally unacceptable. Tap into the great corporate and political wealth and power base that calls Stanford its alma mater, and get a few people to crack open their piggy banks, or lend some brain cells and eyeballs, to help come up with more robust yet equally secure solutions.
Anyways, I'll go back to silently crunching data now. Thanks and good luck with the project. Rank 3689 of 1718675 users, having completed 4130 work units. Respectable, considering I only have one AMD FX-8150 8-core CPU and two nVidia/EVGA GTX 480 cards (a.k.a. the space heaters), running for just under 2 years now (22 months, with windows opened in winter even at negative Fahrenheit temperatures, if only 1/8", just enough to keep the cards below 80C). If only the GPUs/folding cores had a process priority system, I'd be able to let the GPUs run while watching videos, playing games, or doing heavy surfing...