Licensing and source code availability considerations
Posted: Wed Feb 03, 2010 11:53 pm
I'm a bit of a gamer and I recently purchased a nice nVidia 260 with 216 GPU cores. After registering my product, I learned about this project from their web page and thought, "hey, what better way to heat my home?!" Unfortunately, I run Linux exclusively, and I quickly discovered that the Linux client is incapable of exploiting the GPU. As a consultant / software architect / developer, I immediately sought the sources to see what it would take to add this support and was surprised to discover that the client is closed source. This posting is a plea for the development & management team to consider an alternative to this closed-source approach.
The benefits of an open source application such as this, especially one with such a noble purpose, are too numerous to count. At the top of the list, I will mention superb peer review and the contributions of some of the brightest minds in the world (and I dare place myself somewhere in that vicinity, though certainly not near the top!). Were I the manager of such a project, I would have two primary concerns (in order of importance):
- Reliability (correctness) of data: the integrity of data provided by a client whose code may have been altered in a manner that introduces logical errors
- Server stability: a negative effect on the stability and/or availability of the server
I'll address the second issue first. It is my opinion that absolutely every server application of any importance (broadly, any application that will ever listen on a port) should be written with the deterrence of DoS (denial-of-service) attacks as its first priority. Some DoS attacks indeed occur by accident or because of programming errors in a client application. Any concerns about server stability with clients built from modified code can be addressed in the same ways you would make the server DoS resistant:
- Perform a thorough audit of the server code's execution paths. Make sure that every possible anomalous condition is checked for, and make no assumptions about the integrity of anything that comes from the network! This is especially true of copying data into pre-allocated buffers before checking the size of the data (but that one is a no-brainer).
- Add an auditing framework to track connections by IP address. This part is a bit more work. Construct a small in-memory database of general stats for each IP address, using the IP address (or IPv6 address, if you've advanced that far) as the primary key and recording the first connection time, the last connection time, and a count of each transaction result type, i.e., successful uploads, successful downloads, and each of the various error conditions. Some error conditions will occur in normal operation, like reset connections (usually the app crashing) or dropped (timed-out) connections (the computer crashing or the connection going down). Other connection anomalies will be red flags for very obvious DoS attempts or just bad code. Offending addresses can then be banned (temporarily or permanently) either at the application level or by handing them to iptables (if you're on Linux), which is more efficient. The Linux tool fail2ban is very nice for this type of thing and would only require that you write your security failures to a log file and perform some minimal configuration. You will also want to ban anybody with an excessive number of failed connections.
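To make the auditing idea concrete, here is a minimal sketch of such a per-IP stats table. All names (AuditTable, the result keys, the ban threshold) are illustrative assumptions, not part of any real client or server; a production version would persist to a log for fail2ban/iptables rather than ban in-process.

```python
# Hypothetical sketch of the in-memory, per-IP audit table described above.
import time
from collections import defaultdict

BAN_THRESHOLD = 20  # failures before a ban -- tunable assumption


class AuditTable:
    """Stats keyed by IP address: first/last seen plus result counters."""

    def __init__(self):
        self.stats = defaultdict(lambda: {
            "first_seen": None, "last_seen": None,
            "uploads": 0, "downloads": 0, "failures": 0,
        })
        self.banned = set()

    def record(self, ip, result):
        """Record one transaction result ('uploads', 'downloads', 'failures')."""
        entry = self.stats[ip]
        now = time.time()
        if entry["first_seen"] is None:
            entry["first_seen"] = now
        entry["last_seen"] = now
        entry[result] += 1
        # Ban addresses with an excessive number of failed transactions.
        if entry["failures"] >= BAN_THRESHOLD:
            self.banned.add(ip)

    def is_banned(self, ip):
        return ip in self.banned
```

A real deployment would also age out old entries and distinguish error types, but the primary-key-by-IP structure is the core of it.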
So I've only recently found out about this project, but it would appear that data integrity and correctness of calculations are the major barriers to opening up the source code so that the community can participate in development and debugging, submit patches, etc. Unfortunately, I do not have a fully open source solution to offer, but I do have a mostly open source one.
First off, understand that if any decent hacker wanted to submit invalid results, it wouldn't be terribly difficult unless you are encrypting the data, and even that would only add to the difficulty; it wouldn't make it impossible. If an application is running on a foreign machine, it can be debugged, its memory examined and altered, its assembly code reverse engineered, etc. Given a particular network session, such a malicious hacker could intercept the public key from the network stream and/or application memory, reverse engineer the code that performs the encryption, and replicate that process, feeding in invalid data (or simply altering the data prior to its encryption). Security through obscurity is only a deterrent; it can never be completely effective. Fortunately, these malicious types tend to prey on large corporations that have earned public ire, not innocuous university programs seeking to benefit human health.
So my point here is that, as best I can tell, the client application is already vulnerable to this type of garbage, even if exploiting it would not be easy. That said, I would be surprised (as well as saddened and disappointed) if anybody cared enough to go through all of the trouble required to cause this type of problem. What I propose is a solution that is as difficult, or nearly as difficult, to compromise.
- There should be a development/test server. Data uploaded to this server should not be used for any science! It should exist solely to support community development. Additionally, this server should hand out work that has already been completed in production, so that it can serve as a validation mechanism to catch changes that have broken the calculations. This is the only server you can communicate with if you have compiled your own code.
- All communications to the production server should be encrypted. You can only use this server if you have downloaded an official (pre-compiled) binary.
- An abstraction should be added for server communications, allowing a selection of dynamically linked modules. Separate development and production server communications modules should be provided. Another benefit of dynamic linking is that it allows you to use otherwise incompatible licenses in the same process space (e.g., GPL & a proprietary EULA).
- The production server communications module would have to remain closed source, not to make compromising it impossible (I hope I've already demonstrated that such is an unattainable goal), but to deter such compromises and limit the number of people capable of them (i.e., only the highly skilled & dedicated hackers -- of whom, I hope, there are none with the desire to attack a program with such a noble cause). There are a wide variety of ugly "security through obscurity" mechanisms that can further be used to make reverse engineering this module more difficult, which I won't get into here. Importantly, the server communications module should scrutinize the process space in which it's running, including:
- Carefully examine all dynamically linked modules
- Attempt to determine if the process is being debugged or not
- Perform various hash calculations on the primary executable image as well as every other DLL/shared object in the process space
- Link statically to a libc that uses heap randomization (really, this is only a small help though)
- Finally, a development communications module should be provided for talking to the development server.
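As a rough illustration of the "hash every DLL/shared object in the process space" step above, here is a Linux-only sketch that enumerates the current process's file-backed mappings via /proc/self/maps and hashes each one. The function names and the idea of an expected-hash table are my own assumptions; in the proposed design, that table would ship inside the closed communications module.

```python
# Hypothetical sketch: hash every file-backed module mapped into this process.
# Linux-only -- relies on the /proc/self/maps field layout.
import hashlib


def mapped_files():
    """Yield the unique file paths mapped into the current process."""
    seen = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            parts = line.split()
            # Field 6, when present and starting with "/", is the backing file;
            # pseudo-mappings like [heap] and [stack] are skipped.
            if len(parts) >= 6 and parts[5].startswith("/") and parts[5] not in seen:
                seen.add(parts[5])
                yield parts[5]


def hash_module(path):
    """SHA-256 of a module's on-disk image, or None if unreadable."""
    sha = hashlib.sha256()
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sha.update(chunk)
    except OSError:
        return None  # e.g. deleted or permission-restricted mapping
    return sha.hexdigest()


def verify(expected):
    """Return the mapped modules whose hash differs from the expected table."""
    mismatches = []
    for path in mapped_files():
        if path in expected and hash_module(path) != expected[path]:
            mismatches.append(path)
    return mismatches
```

Note that hashing the on-disk file does not catch in-memory patching, and a debugger can simply skip the check; as argued above, this is a deterrent, not a guarantee.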