Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Thu Jan 14, 2010 1:17 pm
by tear
Well, you need to draw a line somewhere. As langouste copies *all* contents of the client directory,
it would normally recurse indefinitely if the destination lay within the source (as it did in your
case). There are, however, safety checks in "cp" that prevent that from happening and stop
the copy process as soon as it detects the recursion.

Demo:

Code: Select all

mkdir test
cd test
mkdir tmp
cd ..
cp -R test test/tmp
ls -l test/tmp/test # you might want to use "mc" here for better view
So yeah, the general conclusion (as you said) is: the TMPDIR variable in langouste-helper.sh must _not_ point to the client directory or any of its subdirectories.
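A minimal sketch of how that rule could be checked up front. It is in Python rather than the helper's shell, and the function name `is_inside` and the example paths are made up for illustration; nothing here is taken from langouste-helper.sh itself.

```python
import os

def is_inside(path, parent):
    """Return True if `path` is `parent` itself or lies somewhere below it."""
    # realpath() makes both paths absolute and resolves symlinks, so a TMPDIR
    # that reaches the client directory through a symlink is caught as well.
    path = os.path.realpath(path)
    parent = os.path.realpath(parent)
    return os.path.commonpath([path, parent]) == parent

client_dir = "/home/user/fah"  # hypothetical client directory
for tmpdir in ("/home/user/fah/tmp", "/tmp/langouste"):
    verdict = "unsafe" if is_inside(tmpdir, client_dir) else "ok"
    print(tmpdir, "->", verdict)
```

Comparing whole path components (rather than string prefixes) keeps a sibling such as /home/user/fah-backup from being mistaken for a subdirectory of /home/user/fah.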


tear

Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Fri Jan 15, 2010 4:42 pm
by mattifolder
tear wrote: Hi Matti, welcome to the forum,

Re first item
I'm not sure it's possible. I encourage you to experiment a little bit and yes, helper script
is most likely the best spot.

The risk comes from the fact that original client (normally) runs at all times and appears
to keep FAHlog.txt file descriptor open. What it means is, yeah, sure, you can append
data to the file but as offset tied to original client's fd remains unchanged, subsequent
updates (made by original client) will overwrite any appended data.

I don't have evidence this *will* happen -- call it a word of caution.


Re second item
It has been suggested before -- now I recall another bit that requires verification (I'll check
it with the next fresh WU). I *think* the client reads the contents of client.cfg at startup and
keeps them in memory, altering them as necessary. What I think it also does is write
those contents out to client.cfg whenever you stop the client (thus, again, discarding any
manually made changes*).

*) unless of course, it gets killed by SIGKILL (-9) ... ugh


tear
Thanks for your answer. I also read the notes that followed. Meanwhile, I tested the suggested enhancements manually.
I've seen the problem with the file descriptor that the FAH client keeps open on FAHlog.txt: the data ends up "mixed". That's not a problem for FAH,
but it is for, e.g., fahmon. Changing client.cfg doesn't seem to be problematic.

I think that in my use case the better way is to add the post-processing to the folding client's shutdown script, after FAH and its cores have shut down.

Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Fri Jan 15, 2010 6:11 pm
by tear
Ah, yes. Good catch with monitoring tools.

In the meantime I tested the client.cfg logic and...
a successful autosend results in the client rewriting the file with
memory-kept contents (with incremented local=).

What it means is that langouste would need to
kick in every single time (even when it doesn't
have to == when client is already folding) to avoid
client-made modification of client.cfg... I don't really want
to do that (and I do remember I had a good reason
to activate langouste only when necessary).

However! Why not reuse your idea of lazy (log) update
and apply it to client.cfg as well? I don't have anything
specific on my mind but maybe it's worth exploring?


tear

Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Sat Jan 16, 2010 2:26 pm
by whynot
After a bit of investigating

Code: Select all

strace -efile,desc,fork -o ~/foo.strace -ff ./fah6 -verbosity 9 
(if a single '-f' is used, insane things happen; don't), I've found that:

Code: Select all

% grep -n '^open' foo.strace.22* | grep FAHlog.txt
foo.strace.22170:29:open("FAHlog.txt", O_RDWR|O_CREAT|O_APPEND, 0666) = 3
and that:

Code: Select all

% grep -n 'write(3' foo.strace.22*
foo.strace.22170:33:write(3, "\n\n--- Opening Log file [January 1"..., 47) = 47
foo.strace.22170:35:write(3, "\n# Linux Console Edition ########"..., 81) = 81
foo.strace.22170:37:write(3, "#################################"..., 80) = 80
foo.strace.22170:39:write(3, "\n"..., 1)                    = 1
foo.strace.22170:41:write(3, "                       Folding@Ho"..., 56) = 56
foo.strace.22170:43:write(3, "\n"..., 1)                    = 1
foo.strace.22170:45:write(3, "                          http://"..., 54) = 54
foo.strace.22170:47:write(3, "\n"..., 1)                    = 1
foo.strace.22170:49:write(3, "#################################"..., 80) = 80
foo.strace.22170:51:write(3, "#################################"..., 80) = 80
foo.strace.22170:53:write(3, "\n"..., 1)                    = 1
foo.strace.22170:56:write(3, "Launch directory: /home/whynot/fo"..., 38) = 38
foo.strace.22170:58:write(3, "Executable: ./fah6\n"..., 19) = 19
foo.strace.22170:60:write(3, "Arguments: "..., 11)          = 11
foo.strace.22170:62:write(3, "-verbosity "..., 11)          = 11
foo.strace.22170:64:write(3, "9 "..., 2)                    = 2
foo.strace.22170:66:write(3, "\n\n"..., 2)                  = 2
foo.strace.22170:77:write(3, "[13:07:46] - Ask before connectin"..., 39) = 39
foo.strace.22170:79:write(3, "[13:07:46] - User name: whynot_00"..., 33) = 33
foo.strace.22170:81:write(3, " (Team 2164)\n"..., 13)       = 13
foo.strace.22170:87:write(3, "[13:07:46] - User ID: 4C71B0411EC"..., 39) = 39
foo.strace.22170:89:write(3, "[13:07:46] - Machine ID: 1\n"..., 27) = 27
foo.strace.22170:91:write(3, "[13:07:46] \n"..., 12)        = 12
foo.strace.22170:110:write(3, "[13:07:46] Loaded queue successfu"..., 38) = 38
foo.strace.22171:2:write(3, "[13:07:46] \n"..., 12)        = 12
foo.strace.22171:4:write(3, "[13:07:46] + Processing work unit"..., 34) = 34
foo.strace.22171:6:write(3, "[13:07:46] Core required: FahCore"..., 41) = 41
foo.strace.22171:9:write(3, "[13:07:46] Core found.\n"..., 23) = 23
foo.strace.22171:26:write(3, "[13:07:46] Working on Unit 00 [Ja"..., 52) = 52
foo.strace.22171:28:write(3, "[13:07:46] + Working ...\n"..., 25) = 25
foo.strace.22171:30:write(3, "[13:07:46] - Calling './FahCore_7"..., 116) = 116
foo.strace.22172:2:write(3, "[13:07:46] - Autosending finished"..., 43) = 43
foo.strace.22172:4:write(3, "[13:07:46] Trying to send all fin"..., 50) = 50
foo.strace.22172:6:write(3, "[13:07:46] + No unsent completed "..., 50) = 50
foo.strace.22172:8:write(3, "[13:07:46] - Autosend completed\n"..., 32) = 32
foo.strace.22173:6:write(3, "[13:07:47] \n"..., 12)        = 12
foo.strace.22173:8:write(3, "[13:07:47] *---------------------"..., 44) = 44
foo.strace.22173:10:write(3, "[13:07:47] Folding@Home Gromacs C"..., 37) = 37
foo.strace.22173:12:write(3, "[13:07:47] Version 1.90 (March 8,"..., 40) = 40
foo.strace.22173:14:write(3, "[13:07:47] \n"..., 12)        = 12
foo.strace.22173:16:write(3, "[13:07:47] Preparing to commence "..., 44) = 44
foo.strace.22173:18:write(3, "[13:07:47] - Ensuring status. Ple"..., 43) = 43
foo.strace.22173:124:write(3, "[13:08:04] - Looking at optimizat"..., 37) = 37
foo.strace.22173:130:write(3, "...\n"..., 4)                 = 4
foo.strace.22173:132:write(3, "[13:08:04] - Working with standar"..., 60) = 60
foo.strace.22173:134:write(3, "[13:08:04] - Previous termination"..., 56) = 56
foo.strace.22173:136:write(3, "[13:08:04] - Files status OK\n"..., 29) = 29
foo.strace.22173:142:write(3, "[13:08:05] - Expanded 464114 -> 2"..., 69) = 69
foo.strace.22173:144:write(3, "[13:08:05] \n"..., 12)        = 12
foo.strace.22173:146:write(3, "[13:08:05] Project: 6315 (Run 582"..., 53) = 53
foo.strace.22173:148:write(3, "[13:08:05] \n"..., 12)        = 12
foo.strace.22173:150:write(3, "[13:08:05] Entering M.D.\n"..., 25) = 25
foo.strace.22173:273:write(3, "[13:08:25] (Starting from checkpo"..., 38) = 38
foo.strace.22173:279:write(3, "[13:08:25] Protein: p6315_sh3_wit"..., 45) = 45
foo.strace.22173:281:write(3, "[13:08:25] \n"..., 12)        = 12
foo.strace.22173:283:write(3, "[13:08:25] Writing local files\n"..., 31) = 31
foo.strace.22173:293:write(3, "[13:08:25] Completed 460000 out o"..., 55) = 55
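The log is opened with O_APPEND, so every write goes to the current end of the file, whatever else has been appended in the meantime. Thus I should say that a curious mind could just hardlink them.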
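The effect of the O_APPEND flag seen in the open() line above can be reproduced with a small Python sketch. The file is a throwaway temp file and the "client" and "helper" roles are stand-ins for illustration, not the real fah6 or langouste-helper.sh.

```python
import os
import tempfile

def interleave(extra_flags):
    """One long-lived writer plus an outside append; return the final file content."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Stays open the whole time, like the client's log descriptor.
        client = os.open(path, os.O_WRONLY | extra_flags)
        os.write(client, b"client-1\n")
        # Somebody else (e.g. a helper script) appends to the same file.
        with open(path, "ab") as other:
            other.write(b"helper\n")
        # The long-lived writer logs its next line.
        os.write(client, b"client-2\n")
        os.close(client)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)

print(interleave(os.O_APPEND))  # b'client-1\nhelper\nclient-2\n'
print(interleave(0))            # b'client-1\nclient-2\n'
```

With O_APPEND the outside line survives; without it the long-lived writer continues from its own stale offset and overwrites the appended data, which is the risk raised earlier in the thread.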

Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Wed Jun 30, 2010 7:45 pm
by p2501
I have a pretty general question here: since linux is bigadv starved atm I've switched my main rig from VM to windoze a3 bigadv. Would it be possible that this Windoze client could use langouste running on my (linux-)server? Or is it "just" able to listen to localhost connections?

Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Thu Jul 01, 2010 1:41 pm
by tear
Hey p2501,

Langouste listens on localhost for
a) security purposes (otherwise it's an open HTTP proxy)
b) preventing users from using it with non-local machines ;-) (nothing personal -- just read on please)

To answer your question -- langouste can easily be made to listen on all addresses; unfortunately,
that won't do you much good.

To do its work langouste (by design) needs to run on the local OS, as it needs to be able to determine
whether a process attempting to return data is a regular client (needs to be interrupted) or a forked
client (must not be interrupted).

On Linux this is done by examining several files in Linux's /proc filesystem. First, langouste identifies
the remote port number, then looks up the PID that "owns" that port number, then checks the full command
line of that PID. If the command line contains "-send" -- it's a forked client. If not -- it's a regular client.
That's a top-level description -- there are a few other things langouste does for robustness.
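The lookup described above can be sketched in Python. This is not langouste's code: the function names are made up, it handles IPv4 only (/proc/net/tcp), and it treats "-send" as a whole argument, which is an assumption about what "contains" means.

```python
import os

def find_socket_inode(proc_net_tcp, peer_port, listen_port):
    """Return the inode of the TCP socket whose local port is peer_port and
    whose remote port is listen_port, or None if there is no such entry."""
    for line in proc_net_tcp.splitlines()[1:]:             # skip the header row
        fields = line.split()
        if len(fields) < 10:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # ports are in hex
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if local_port == peer_port and remote_port == listen_port:
            return int(fields[9])                          # 10th column: inode
    return None

def find_pid_by_inode(inode, proc="/proc"):
    """Scan /proc/<pid>/fd for a descriptor that points at the given socket inode."""
    target = "socket:[%d]" % inode
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:                  # process exited, or not ours to inspect
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    return int(entry)
            except OSError:
                continue
    return None

def is_sending(cmdline):
    """cmdline is the raw content of /proc/<pid>/cmdline (NUL-separated arguments)."""
    return b"-send" in cmdline.split(b"\x00")
```

On a live system one would read /proc/net/tcp, pass the accepted connection's source port as `peer_port` and langouste's own port as `listen_port`, then feed the resulting PID's /proc/<pid>/cmdline to `is_sending`. Without root, only the current user's processes can be inspected this way.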

To make Langouste work on Windows one would need to write the code that would
a) provide equivalent of find_pid (it uses find_sock and find_pid_by_sock)
b) provide equivalent of pid_issending
c) optionally (I _think_ langouste could do without it with acceptable sacrifice) provide equivalent
  of pid_numcores
d) optionally (ditto) provide equivalent of fah_machineid
e) optionally (ditto) provide equivalent of fah_pid_at_machine
f) make langouste work with Winsock (can be mitigated by using cygwin though)

An alternative (to Windows porting) is finding a better way to identify the clients over remote connections...


HTH,
tear

Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Sun Jul 11, 2010 12:15 am
by p2501
Hello tear,

thank you for your comprehensive reply. So it's very closely tied to GNU... well, it doesn't matter anymore, at least to me. I tried a Windoze bigadv run and got higher frame time than on VM Linux, it seems it's not for me. And if it wasn't for the GPUs I wouldn't even be running Windoze, although I must say that 7 is not so bad. :lol:

ty,
p2501

Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Sun Jul 11, 2010 12:51 am
by tear
As a matter of fact I've started a Win32 port this morning (just to refresh my windows skillz); any alpha testers around? :mrgreen:

Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Sun Jul 11, 2010 8:57 am
by DonMarkoni
tear wrote: As a matter of fact I've started a Win32 port this morning (just to refresh my windows skillz); any alpha testers around? :mrgreen:
Yes! :D Testing is my middle name. :eugeek:

I have one question (before I even start): can it be tested using a regular WU, so I can test Langouste more frequently than every 69 hours?

p2501 wrote: ... well, it doesn't matter anymore, at least to me. I tried a Windoze bigadv run and got higher frame time than on VM Linux, it seems it's not for me. ...
Looks like I wasn't the only one who got worse times with native -bigadv under Win, compared to VMware/Linux.

Re: Langouste -- WU upload/download de-coupler [Linux only]

Posted: Tue Jul 13, 2010 5:23 pm
by tear
Well then, please see viewtopic.php?f=14&t=15250 :-)

Re: Langouste -- WU upload/download de-coupler

Posted: Mon Jul 19, 2010 3:01 pm
by tear

From the ones who brought you Dual-Core performance fix, affinity-setting mpiexec, the original Langouste and many more...

... with special guest appearances of Punchy, DonMarkoni and metal03326...



          LANGOUSTE (for Windows)



(see first post for details)

Re: Langouste -- WU upload/download de-coupler

Posted: Wed Aug 04, 2010 10:23 pm
by tear
Update: Langouste now does upload capping; see first post for details.

Re: Langouste -- WU upload/download de-coupler (+upload capping)

Posted: Mon Aug 09, 2010 9:10 pm
by weedacres
I tried Langouste3-0.15.5 on an XP32 machine running an SMP client (6.29) and get errors when it's trying to upload. The smp client downloads the new project and takes off like it should. Here's a copy of the langouste screen:

Code: Select all

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Thinkpad>cd \langouste\dist\win32

C:\langouste\dist\win32>dir
 Volume in drive C is Preload
 Volume Serial Number is 6846-58AB

 Directory of C:\langouste\dist\win32

08/09/2010  12:01 PM    <DIR>          .
08/09/2010  12:01 PM    <DIR>          ..
08/03/2010  12:03 PM             1,862 langouste-helper.bat
08/03/2010  12:03 PM            81,920 langouste3-0.15.5.exe
08/09/2010  12:01 PM    <DIR>          reflogs
               2 File(s)         83,782 bytes
               3 Dir(s)  65,761,419,264 bytes free

C:\langouste\dist\win32>langouste3-0.15.5 -l 8880
Mon Aug 09 12:04:25 2010 Langouste3 0.15.5 (compiled Tue Aug  3 11:53:54 MDT 201
0 by kszysiu@tentacle)
Mon Aug 09 12:04:25 2010 Langouste3 comes with ABSOLUTELY NO WARRANTY; for detai
ls
Mon Aug 09 12:04:25 2010 see `COPYING.txt' file located in source directory
Mon Aug 09 12:04:25 2010 Default Langouste helper temp directory: C:\DOCUME~1\Th
inkpad\LOCALS~1\Temp\langouste-Thinkpad\
Mon Aug 09 12:04:25 2010 Listening on 127.0.0.1:8880
Mon Aug 09 13:45:43 2010 Accepted connection from: 127.0.0.1:2481
Mon Aug 09 13:45:43 2010 PID for socket: 8960
Mon Aug 09 13:45:43 2010 PID 8960: issending: 0
Mon Aug 09 13:45:43 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:45:43 2010 ===> Helper pid: -1
Mon Aug 09 13:45:43 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:45:43 2010 ===> now: 1281386743, last helper launched at: 0
Mon Aug 09 13:45:43 2010 ===> Launching helper: 'C:\FAH-Win\langouste-helper.bat
' (exe name: 'C:\FAH-Win\fah6.exe')...
Mon Aug 09 13:45:43 2010 ===> Forked 0x00000760
Mon Aug 09 13:45:43 2010 (0) Local: received 118 bytes, sent 0 bytes
Mon Aug 09 13:45:43 2010 Accepted connection from: 127.0.0.1:2482
Mon Aug 09 13:45:43 2010 PID for socket: 8960
Mon Aug 09 13:45:43 2010 PID 8960: issending: 0
Mon Aug 09 13:45:43 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:45:43 2010 ===> Helper pid: -1
Mon Aug 09 13:45:43 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:45:43 2010 ===> now: 1281386743, last helper launched at: 12813867
43
Mon Aug 09 13:45:43 2010 ===> WARNING: can only launch one helper in 2 minutes (
per client)
Mon Aug 09 13:45:43 2010 (0) Local: received 114 bytes, sent 0 bytes
Mon Aug 09 13:45:44 2010 Accepted connection from: 127.0.0.1:2484
Mon Aug 09 13:45:44 2010 PID for socket: 8960
Mon Aug 09 13:45:44 2010 PID 8960: issending: 0
Mon Aug 09 13:45:44 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:45:44 2010 ===> Helper pid: -1
Mon Aug 09 13:45:44 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:45:44 2010 ===> now: 1281386744, last helper launched at: 12813867
43
Mon Aug 09 13:45:44 2010 ===> WARNING: can only launch one helper in 2 minutes (
per client)
Mon Aug 09 13:45:44 2010 (0) Local: received 118 bytes, sent 0 bytes
Mon Aug 09 13:45:44 2010 Accepted connection from: 127.0.0.1:2485
Mon Aug 09 13:45:44 2010 PID for socket: 8960
Mon Aug 09 13:45:44 2010 PID 8960: issending: 0
Mon Aug 09 13:45:44 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:45:44 2010 ===> Helper pid: -1
Mon Aug 09 13:45:44 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:45:44 2010 ===> now: 1281386744, last helper launched at: 12813867
43
Mon Aug 09 13:45:44 2010 ===> WARNING: can only launch one helper in 2 minutes (
per client)
Mon Aug 09 13:45:44 2010 (0) Local: received 114 bytes, sent 0 bytes
Mon Aug 09 13:45:45 2010 Accepted connection from: 127.0.0.1:2486
Mon Aug 09 13:45:45 2010 PID for socket: 8960
Mon Aug 09 13:45:45 2010 PID 8960: issending: 0
Mon Aug 09 13:45:45 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:45:45 2010 ===> Helper pid: -1
Mon Aug 09 13:45:45 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:45:45 2010 ===> now: 1281386745, last helper launched at: 12813867
43
Mon Aug 09 13:45:45 2010 ===> WARNING: can only launch one helper in 2 minutes (
per client)
Mon Aug 09 13:45:45 2010 (0) Local: received 120 bytes, sent 0 bytes
Mon Aug 09 13:45:45 2010 Accepted connection from: 127.0.0.1:2487
Mon Aug 09 13:45:45 2010 PID for socket: 8960
Mon Aug 09 13:45:45 2010 PID 8960: issending: 0
Mon Aug 09 13:45:45 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:45:45 2010 ===> Helper pid: -1
Mon Aug 09 13:45:45 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:45:45 2010 ===> now: 1281386745, last helper launched at: 12813867
43
Mon Aug 09 13:45:45 2010 ===> WARNING: can only launch one helper in 2 minutes (
per client)
Mon Aug 09 13:45:45 2010 (0) Local: received 116 bytes, sent 0 bytes
Mon Aug 09 13:45:46 2010 Accepted connection from: 127.0.0.1:2488
Mon Aug 09 13:45:46 2010 PID for socket: 8960
Mon Aug 09 13:45:46 2010 PID 8960: issending: 0
Mon Aug 09 13:45:46 2010 ===> PID 8960 is contacting main assignment server
Mon Aug 09 13:45:46 2010 (0) resolving 'assign.stanford.edu:8080'
Mon Aug 09 13:45:51 2010 (0) Connecting to: 171.67.108.200:8080
Mon Aug 09 13:45:54 2010 (0) Connected.
Mon Aug 09 13:45:55 2010 (0) Local connection closed (bsize: 0).
Mon Aug 09 13:45:55 2010 (0) Local: received 559 bytes, sent 396 bytes
Mon Aug 09 13:45:55 2010 (0) Remote: received 396 bytes, sent 559 bytes
Mon Aug 09 13:45:55 2010 Accepted connection from: 127.0.0.1:2493
Mon Aug 09 13:45:55 2010 PID for socket: 8960
Mon Aug 09 13:45:55 2010 PID 8960: issending: 0
Mon Aug 09 13:45:55 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 512
Mon Aug 09 13:45:55 2010 (0) resolving '171.64.65.56:8080'
Mon Aug 09 13:45:55 2010 (0) Connecting to: 171.64.65.56:8080
Mon Aug 09 13:45:55 2010 (0) Connected.
Mon Aug 09 13:45:55 2010 (0) Remote connection closed (rbsize: 0).
Mon Aug 09 13:45:55 2010 (0) Local: received 625 bytes, sent 40 bytes
Mon Aug 09 13:45:55 2010 (0) Remote: received 40 bytes, sent 625 bytes
37 File(s) copied
launching asynchronous part
Mon Aug 09 13:46:12 2010 Accepted connection from: 127.0.0.1:2495
Mon Aug 09 13:46:12 2010 PID for socket: 8960
Mon Aug 09 13:46:12 2010 PID 8960: issending: 0
Mon Aug 09 13:46:12 2010 ===> PID 8960 is contacting main assignment server
Mon Aug 09 13:46:12 2010 (0) resolving 'assign.stanford.edu:8080'
Mon Aug 09 13:46:12 2010 (0) Connecting to: 171.67.108.200:8080
Mon Aug 09 13:46:12 2010 (0) Connected.
Mon Aug 09 13:46:13 2010 (0) Local connection closed (bsize: 0).
Mon Aug 09 13:46:13 2010 (0) Local: received 559 bytes, sent 396 bytes
Mon Aug 09 13:46:13 2010 (0) Remote: received 396 bytes, sent 559 bytes
Mon Aug 09 13:46:13 2010 Accepted connection from: 127.0.0.1:2497
Mon Aug 09 13:46:13 2010 PID for socket: 8960
Mon Aug 09 13:46:13 2010 PID 8960: issending: 0
Mon Aug 09 13:46:13 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 512
Mon Aug 09 13:46:13 2010 (0) resolving '171.64.65.56:8080'
Mon Aug 09 13:46:13 2010 (0) Connecting to: 171.64.65.56:8080
Mon Aug 09 13:46:13 2010 (0) Connected.
Mon Aug 09 13:46:13 2010 (0) Remote connection closed (rbsize: 0).
Mon Aug 09 13:46:13 2010 (0) Local: received 625 bytes, sent 40 bytes
Mon Aug 09 13:46:13 2010 (0) Remote: received 40 bytes, sent 625 bytes
Mon Aug 09 13:46:27 2010 Accepted connection from: 127.0.0.1:2499
Mon Aug 09 13:46:27 2010 PID for socket: 8960
Mon Aug 09 13:46:27 2010 PID 8960: issending: 0
Mon Aug 09 13:46:27 2010 ===> PID 8960 is contacting main assignment server
Mon Aug 09 13:46:27 2010 (0) resolving 'assign.stanford.edu:8080'
Mon Aug 09 13:46:27 2010 (0) Connecting to: 171.67.108.200:8080
Mon Aug 09 13:46:27 2010 (0) Connected.
Mon Aug 09 13:46:28 2010 (0) Local connection closed (bsize: 0).
Mon Aug 09 13:46:28 2010 (0) Local: received 559 bytes, sent 396 bytes
Mon Aug 09 13:46:28 2010 (0) Remote: received 396 bytes, sent 559 bytes
Mon Aug 09 13:46:28 2010 Accepted connection from: 127.0.0.1:2501
Mon Aug 09 13:46:28 2010 PID for socket: 8960
Mon Aug 09 13:46:28 2010 PID 8960: issending: 0
Mon Aug 09 13:46:28 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 512
Mon Aug 09 13:46:28 2010 (0) resolving '171.64.65.56:8080'
Mon Aug 09 13:46:28 2010 (0) Connecting to: 171.64.65.56:8080
Mon Aug 09 13:46:28 2010 (0) Connected.
Mon Aug 09 13:46:33 2010 (0) Local connection closed (bsize: 0).
Mon Aug 09 13:46:33 2010 (0) Local: received 625 bytes, sent 765032 bytes
Mon Aug 09 13:46:33 2010 (0) Remote: received 765032 bytes, sent 625 bytes
Mon Aug 09 13:46:33 2010 Accepted connection from: 127.0.0.1:2503
Mon Aug 09 13:46:33 2010 PID for socket: 8960
Mon Aug 09 13:46:33 2010 PID 8960: issending: 0
Mon Aug 09 13:46:33 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:46:33 2010 ===> Helper pid: -1
Mon Aug 09 13:46:33 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:46:33 2010 ===> now: 1281386793, last helper launched at: 12813867
43
Mon Aug 09 13:46:33 2010 ===> WARNING: can only launch one helper in 2 minutes (
per client)
Mon Aug 09 13:46:33 2010 (0) Local: received 118 bytes, sent 0 bytes
Mon Aug 09 13:46:33 2010 Accepted connection from: 127.0.0.1:2504
Mon Aug 09 13:46:33 2010 PID for socket: 8960
Mon Aug 09 13:46:33 2010 PID 8960: issending: 0
Mon Aug 09 13:46:33 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:46:33 2010 ===> Helper pid: -1
Mon Aug 09 13:46:33 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:46:33 2010 ===> now: 1281386793, last helper launched at: 12813867
43
Mon Aug 09 13:46:33 2010 ===> WARNING: can only launch one helper in 2 minutes (
per client)
Mon Aug 09 13:46:33 2010 (0) Local: received 114 bytes, sent 0 bytes
Mon Aug 09 13:46:34 2010 Accepted connection from: 127.0.0.1:2505
Mon Aug 09 13:46:34 2010 PID for socket: 8960
Mon Aug 09 13:46:34 2010 PID 8960: issending: 0
Mon Aug 09 13:46:34 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:46:34 2010 ===> Helper pid: -1
Mon Aug 09 13:46:34 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:46:34 2010 ===> now: 1281386794, last helper launched at: 12813867
43
Mon Aug 09 13:46:34 2010 ===> WARNING: can only launch one helper in 2 minutes (
per client)
Mon Aug 09 13:46:34 2010 (0) Local: received 120 bytes, sent 0 bytes
Mon Aug 09 13:46:34 2010 Accepted connection from: 127.0.0.1:2506
Mon Aug 09 13:46:34 2010 PID for socket: 8960
Mon Aug 09 13:46:34 2010 PID 8960: issending: 0
Mon Aug 09 13:46:34 2010 ===> PID 8960 is (most likely) contacting WU server, co
ntent length: 43626944
Mon Aug 09 13:46:34 2010 ===> Helper pid: -1
Mon Aug 09 13:46:34 2010 ===> PID 8960: numcores: 0
Mon Aug 09 13:46:34 2010 ===> now: 1281386794, last helper launched at: 12813867
43
Mon Aug 09 13:46:34 2010 ===> WARNING: can only launch one helper in 2 minutes (
per client)
Mon Aug 09 13:46:34 2010 (0) Local: received 116 bytes, sent 0 bytes
Mon Aug 09 13:47:01 2010 Accepted connection from: 127.0.0.1:2519
Mon Aug 09 13:47:01 2010 PID for socket: 8524
Mon Aug 09 13:47:01 2010 PID 8524: issending: 1
Mon Aug 09 13:47:01 2010 (0) resolving '171.64.65.56:8080'
Mon Aug 09 13:47:01 2010 (0) Connecting to: 171.64.65.56:8080
Mon Aug 09 13:47:01 2010 (0) Connected.
Mon Aug 09 13:47:01 2010 (0) Remote connection closed (rbsize: 0).
Mon Aug 09 13:47:01 2010 (0) Local: received 26722 bytes, sent 40 bytes
Mon Aug 09 13:47:01 2010 (0) Remote: received 40 bytes, sent 10338 bytes
Mon Aug 09 13:47:01 2010 (0) Ratelimit: sent 10338 byte(s) in 0.437 seconds, 236
56 Bps (23.10 kBps)
Mon Aug 09 13:47:01 2010 Accepted connection from: 127.0.0.1:2521
Mon Aug 09 13:47:01 2010 PID for socket: 8524
Mon Aug 09 13:47:01 2010 PID 8524: issending: 1
Mon Aug 09 13:47:01 2010 (0) resolving '171.64.65.56:80'
Mon Aug 09 13:47:01 2010 (0) Connecting to: 171.64.65.56:80
Mon Aug 09 13:47:01 2010 (0) Connected.
Mon Aug 09 13:47:02 2010 (0) Remote connection closed (rbsize: 0).
Mon Aug 09 13:47:02 2010 (0) Local: received 25258 bytes, sent 40 bytes
Mon Aug 09 13:47:02 2010 (0) Remote: received 40 bytes, sent 8874 bytes
Mon Aug 09 13:47:02 2010 (0) Ratelimit: sent 8874 byte(s) in 0.421 seconds, 2107
8 Bps (20.58 kBps)
Mon Aug 09 13:47:02 2010 Accepted connection from: 127.0.0.1:2523
Mon Aug 09 13:47:02 2010 PID for socket: 8524
Mon Aug 09 13:47:02 2010 PID 8524: issending: 1
Mon Aug 09 13:47:02 2010 (0) resolving '171.67.108.25:8080'
Mon Aug 09 13:47:02 2010 (0) Connecting to: 171.67.108.25:8080
Mon Aug 09 13:47:02 2010 (0) Connected.
Mon Aug 09 13:47:52 2010 (0) ERROR: remote recv() failed: Unknown error (10053)
Mon Aug 09 13:47:52 2010 (0) Local: received 2115040 bytes, sent 67 bytes
Mon Aug 09 13:47:52 2010 (0) Remote: received 67 bytes, sent 2106736 bytes
Mon Aug 09 13:47:52 2010 (0) Ratelimit: sent 2106736 byte(s) in 50.000 seconds,
42134 Bps (41.14 kBps)
Mon Aug 09 13:47:52 2010 Accepted connection from: 127.0.0.1:2526
Mon Aug 09 13:47:52 2010 PID for socket: 8524
Mon Aug 09 13:47:52 2010 PID 8524: issending: 1
Mon Aug 09 13:47:52 2010 (0) resolving '171.67.108.25:80'
Mon Aug 09 13:47:52 2010 (0) Connecting to: 171.67.108.25:80
Mon Aug 09 13:47:52 2010 (0) Connected.
Mon Aug 09 13:48:40 2010 (0) ERROR: remote recv() failed: Unknown error (10053)
Mon Aug 09 13:48:40 2010 (0) Local: received 2129768 bytes, sent 97 bytes
Mon Aug 09 13:48:40 2010 (0) Remote: received 97 bytes, sent 2113384 bytes
Mon Aug 09 13:48:40 2010 (0) Ratelimit: sent 2113384 byte(s) in 48.187 seconds,
43857 Bps (42.82 kBps)

and the associated smp log file:

Code: Select all

# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\FAH-Win
Executable: fah6
Arguments: -betateam -smp -deino -verbosity 9 

[19:06:36] - Ask before connecting: No
[19:06:36] - Proxy: 127.0.0.1:8880
[19:06:36] - User name: Weedacres (Team 52523)
[19:06:36] - User ID: 4A48E9F80FB2A57C
[19:06:36] - Machine ID: 1
[19:06:36] 
[19:06:37] Loaded queue successfully.
[19:06:37] 
[19:06:37] - Autosending finished units... [August 9 19:06:37 UTC]
[19:06:37] + Processing work unit
[19:06:37] Trying to send all finished work units
[19:06:37] Core required: FahCore_a3.exe
[19:06:37] + No unsent completed units remaining.
[19:06:37] - Autosend completed
[19:06:37] Core found.
[19:06:37] Working on queue slot 04 [August 9 19:06:37 UTC]
[19:06:37] + Working ...
[19:06:37] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 04 -np 2 -checkpoint 15 -verbose -lifeline 8960 -version 629'

[19:06:37] 
[19:06:37] *------------------------------*
[19:06:37] Folding@Home Gromacs SMP Core
[19:06:37] Version 2.22 (Mar 12, 2010)
[19:06:37] 
[19:06:37] Preparing to commence simulation
[19:06:37] - Ensuring status. Please wait.
[19:06:47] - Looking at optimizations...
[19:06:47] - Working with standard loops on this execution.
[19:06:47] - Previous termination of core was improper.
[19:06:47] - Files status OK
[19:06:47] - Expanded 764089 -> 1404481 (decompressed 183.8 percent)
[19:06:47] Called DecompressByteArray: compressed_data_size=764089 data_size=1404481, decompressed_data_size=1404481 diff=0
[19:06:47] - Digital signature verified
[19:06:47] 
[19:06:47] Project: 6701 (Run 1, Clone 24, Gen 25)
[19:06:47] 
[19:06:47] Entering M.D.
[19:06:53] Using Gromacs checkpoints
[19:06:53] Resuming from checkpoint
[19:06:54] Verified work/wudata_04.log
[19:06:54] Verified work/wudata_04.trr
[19:06:55] Verified work/wudata_04.xtc
[19:06:55] Verified work/wudata_04.edr
[19:06:55] Completed 1933940 out of 2000000 steps  (96%)
[19:15:56] Completed 1940000 out of 2000000 steps  (97%)
[19:45:37] Completed 1960000 out of 2000000 steps  (98%)
[20:15:32] Completed 1980000 out of 2000000 steps  (99%)
[20:45:20] Completed 2000000 out of 2000000 steps  (100%)
[20:45:21] DynamicWrapper: Finished Work Unit: sleep=10000
[20:45:31] 
[20:45:31] Finished Work Unit:
[20:45:31] - Reading up to 687408 from "work/wudata_04.trr": Read 687408
[20:45:31] trr file hash check passed.
[20:45:31] - Reading up to 42648000 from "work/wudata_04.xtc": Read 42648000
[20:45:31] xtc file hash check passed.
[20:45:31] edr file hash check passed.
[20:45:31] logfile size: 288688
[20:45:31] Leaving Run
[20:45:33] - Writing 43626432 bytes of core data to disk...
[20:45:34]   ... Done.
[20:45:41] - Shutting down core
[20:45:41] 
[20:45:41] Folding@home Core Shutdown: FINISHED_UNIT
[20:45:43] CoreStatus = 64 (100)
[20:45:43] Unit 4 finished with 64 percent of time to deadline remaining.
[20:45:43] Updated performance fraction: 0.656175
[20:45:43] Sending work to server
[20:45:43] Project: 6701 (Run 1, Clone 24, Gen 25)


[20:45:43] + Attempting to send results [August 9 20:45:43 UTC]
[20:45:43] - Reading file work/wuresults_04.dat from core
[20:45:43]   (Read 43626432 bytes from disk)
[20:45:43] Connecting to http://171.64.65.56:8080/
[20:45:43] - Couldn't send HTTP request to server
[20:45:43] + Could not connect to Work Server (results)
[20:45:43]     (171.64.65.56:8080)
[20:45:43] + Retrying using alternative port
[20:45:43] Connecting to http://171.64.65.56:80/
[20:45:43] - Couldn't send HTTP request to server
[20:45:43] + Could not connect to Work Server (results)
[20:45:43]     (171.64.65.56:80)
[20:45:43] - Error: Could not transmit unit 04 (completed August 9) to work server.
[20:45:43] - 1 failed uploads of this unit.
[20:45:43]   Keeping unit 04 in queue.
[20:45:43] Trying to send all finished work units
[20:45:43] Project: 6701 (Run 1, Clone 24, Gen 25)


[20:45:43] + Attempting to send results [August 9 20:45:43 UTC]
[20:45:43] - Reading file work/wuresults_04.dat from core
[20:45:43]   (Read 43626432 bytes from disk)
[20:45:43] Connecting to http://171.64.65.56:8080/
[20:45:44] - Couldn't send HTTP request to server
[20:45:44] + Could not connect to Work Server (results)
[20:45:44]     (171.64.65.56:8080)
[20:45:44] + Retrying using alternative port
[20:45:44] Connecting to http://171.64.65.56:80/
[20:45:44] - Couldn't send HTTP request to server
[20:45:44] + Could not connect to Work Server (results)
[20:45:44]     (171.64.65.56:80)
[20:45:44] - Error: Could not transmit unit 04 (completed August 9) to work server.
[20:45:44] - 2 failed uploads of this unit.


[20:45:44] + Attempting to send results [August 9 20:45:44 UTC]
[20:45:44] - Reading file work/wuresults_04.dat from core
[20:45:45]   (Read 43626432 bytes from disk)
[20:45:45] Connecting to http://171.67.108.25:8080/
[20:45:45] - Couldn't send HTTP request to server
[20:45:45] + Could not connect to Work Server (results)
[20:45:45]     (171.67.108.25:8080)
[20:45:45] + Retrying using alternative port
[20:45:45] Connecting to http://171.67.108.25:80/
[20:45:46] - Couldn't send HTTP request to server
[20:45:46] + Could not connect to Work Server (results)
[20:45:46]     (171.67.108.25:80)
[20:45:46]   Could not transmit unit 04 to Collection server; keeping in queue.
[20:45:46] + Sent 0 of 1 completed units to the server
[20:45:46] - Preparing to get new work unit...
[20:45:46] Cleaning up work directory
[20:45:46] + Attempting to get work packet
[20:45:46] Passkey found
[20:45:46] - Will indicate memory of 3054 MB
[20:45:46] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 7, Stepping: 6
[20:45:46] - Connecting to assignment server
[20:45:46] Connecting to http://assign.stanford.edu:8080/
[20:45:55] Posted data.
[20:45:55] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[20:45:55] + News From Folding@Home: Welcome to Folding@Home
[20:45:55] Loaded queue successfully.
[20:45:55] Connecting to http://171.64.65.56:8080/
[20:45:55] - Couldn't send HTTP request to server
[20:45:55]   (Got status 503)
[20:45:55] + Could not connect to Work Server
[20:45:55] - Attempt #1  to get work failed, and no other work to do.
Waiting before retry.
[20:46:12] + Attempting to get work packet
[20:46:12] Passkey found
[20:46:12] - Will indicate memory of 3054 MB
[20:46:12] - Connecting to assignment server
[20:46:12] Connecting to http://assign.stanford.edu:8080/
[20:46:13] Posted data.
[20:46:13] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[20:46:13] + News From Folding@Home: Welcome to Folding@Home
[20:46:13] Loaded queue successfully.
[20:46:13] Connecting to http://171.64.65.56:8080/
[20:46:13] - Couldn't send HTTP request to server
[20:46:13]   (Got status 503)
[20:46:13] + Could not connect to Work Server
[20:46:13] - Attempt #2  to get work failed, and no other work to do.
Waiting before retry.
[20:46:27] + Attempting to get work packet
[20:46:27] Passkey found
[20:46:27] - Will indicate memory of 3054 MB
[20:46:27] - Connecting to assignment server
[20:46:27] Connecting to http://assign.stanford.edu:8080/
[20:46:28] Posted data.
[20:46:28] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[20:46:28] + News From Folding@Home: Welcome to Folding@Home
[20:46:28] Loaded queue successfully.
[20:46:28] Connecting to http://171.64.65.56:8080/
[20:46:29] Posted data.
[20:46:29] Initial: 0000; - Receiving payload (expected size: 764380)
[20:46:33] - Downloaded at ~186 kB/s
[20:46:33] - Averaged speed for that direction ~171 kB/s
[20:46:33] + Received work.
[20:46:33] Trying to send all finished work units
[20:46:33] Project: 6701 (Run 1, Clone 24, Gen 25)


[20:46:33] + Attempting to send results [August 9 20:46:33 UTC]
[20:46:33] - Reading file work/wuresults_04.dat from core
[20:46:33]   (Read 43626432 bytes from disk)
[20:46:33] Connecting to http://171.64.65.56:8080/
[20:46:33] - Couldn't send HTTP request to server
[20:46:33] + Could not connect to Work Server (results)
[20:46:33]     (171.64.65.56:8080)
[20:46:33] + Retrying using alternative port
[20:46:33] Connecting to http://171.64.65.56:80/
[20:46:33] - Couldn't send HTTP request to server
[20:46:33] + Could not connect to Work Server (results)
[20:46:33]     (171.64.65.56:80)
[20:46:33] - Error: Could not transmit unit 04 (completed August 9) to work server.
[20:46:33] - 3 failed uploads of this unit.


[20:46:34] + Attempting to send results [August 9 20:46:34 UTC]
[20:46:34] - Reading file work/wuresults_04.dat from core
[20:46:34]   (Read 43626432 bytes from disk)
[20:46:34] Connecting to http://171.67.108.25:8080/
[20:46:34] - Couldn't send HTTP request to server
[20:46:34] + Could not connect to Work Server (results)
[20:46:34]     (171.67.108.25:8080)
[20:46:34] + Retrying using alternative port
[20:46:34] Connecting to http://171.67.108.25:80/
[20:46:35] - Couldn't send HTTP request to server
[20:46:35] + Could not connect to Work Server (results)
[20:46:35]     (171.67.108.25:80)
[20:46:35]   Could not transmit unit 04 to Collection server; keeping in queue.
[20:46:35] + Sent 0 of 1 completed units to the server
[20:46:35] + Closed connections
[20:46:35] 
[20:46:35] + Processing work unit
[20:46:35] Core required: FahCore_a3.exe
[20:46:35] Core found.
[20:46:35] Working on queue slot 05 [August 9 20:46:35 UTC]
[20:46:35] + Working ...
[20:46:35] - Calling '.\FahCore_a3.exe -dir work/ -nice 19 -suffix 05 -np 2 -checkpoint 15 -verbose -lifeline 8960 -version 629'

[20:46:35] 
[20:46:35] *------------------------------*
[20:46:35] Folding@Home Gromacs SMP Core
[20:46:35] Version 2.22 (Mar 12, 2010)
[20:46:35] 
[20:46:35] Preparing to commence simulation
[20:46:35] - Looking at optimizations...
[20:46:35] - Created dyn
[20:46:35] - Files status OK
[20:46:36] - Expanded 763868 -> 1404481 (decompressed 183.8 percent)
[20:46:36] Called DecompressByteArray: compressed_data_size=763868 data_size=1404481, decompressed_data_size=1404481 diff=0
[20:46:36] - Digital signature verified
[20:46:36] 
[20:46:36] Project: 6702 (Run 8, Clone 20, Gen 24)
[20:46:36] 
[20:46:36] Assembly optimizations on if available.
[20:46:36] Entering M.D.
[20:46:42] Completed 0 out of 2000000 steps  (0%)
I've tried this on two different systems and get the same result with SMP clients. I've also tried it with two GPU2 clients, and there it works fine.
The only thing I've done is set the client's proxy to 127.0.0.1 on port 8880; other than that, I followed the instructions in the announcement without any changes.

Any ideas on what I'm doing wrong?

Thanks

Re: Langouste -- WU upload/download de-coupler (+upload capping)

Posted: Mon Aug 09, 2010 9:22 pm
by tear
Hey Weedacres,

Thanks for the report, appreciate it :-)

Code: Select all

Mon Aug 09 13:48:40 2010 (0) ERROR: remote recv() failed: Unknown error (10053)
Those errors are somewhat disturbing but before we venture there let me tell you this --

Upload failures are _expected_ in the "main" client. Langouste works by simulating upload
failures to the "main" client and creating a copy of the client directory in a temp directory (in
your case it's C:\DOCUME~1\Thinkpad\LOCALS~1\Temp\langouste-Thinkpad\); the results are
then returned from that copy.

That said, can you please take a peek there and check for presence of langouste logs?
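
For anyone who wants to picture the copy-and-send idea, here is a rough Python sketch. It is not langouste's actual code: the function name, its parameters and the "clientdir" layout are assumptions made for illustration, and the "-local" / "-send NN" flags are simply the ones visible in client logs in this thread. It only prepares the copy and the command line; it doesn't launch anything.

```python
import shutil
import tempfile
from pathlib import Path


def stage_client_copy(client_dir, tmp_root, unit, executable):
    """Copy client_dir into a fresh directory under tmp_root.

    Returns (copy_path, command), where command is the argument list that
    would send finished unit `unit` when run from copy_path.
    All names here are hypothetical; this is an illustration only.
    """
    client_dir = Path(client_dir).resolve()
    tmp_root = Path(tmp_root).resolve()

    # The temp root must not be the client directory or live inside it,
    # otherwise the copy would try to include itself.
    if tmp_root == client_dir or client_dir in tmp_root.parents:
        raise ValueError("tmp_root must lie outside the client directory")

    # Each staged upload gets its own working directory.
    work = Path(tempfile.mkdtemp(prefix="langouste-", dir=tmp_root))
    copy_path = work / "clientdir"
    shutil.copytree(client_dir, copy_path)

    # Queue slots are written as two digits in the logs (e.g. "04").
    command = [str(executable), "-local", "-send", f"{unit:02d}"]
    return copy_path, command
```

Actually sending would mean running the returned command with the copy as working directory (e.g. via subprocess.run with cwd=copy_path); that is left out on purpose so the sketch has no side effects beyond the copy.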


Thanks,
tear

Re: Langouste -- WU upload/download de-coupler (+upload capping)

Posted: Mon Aug 09, 2010 10:43 pm
by weedacres
Thanks for the quick response!

Here's the log file.

Code: Select all

Mon 08/09/2010 
01:45 PM
Checking WU 00
Checking WU 01
Checking WU 02
Checking WU 03
Checking WU 04
Launching: C:\FAH-Win\fah6.exe -local -verbosity 9 -send 04

Note: Please read the license agreement (fah6.exe -license). Further 
use of this software requires that you have read and accepted this agreement.

Using local directory for work files
2 cores detected
If you see this twice, MPI is working
If you see this twice, MPI is working


--- Opening Log file [August 9 20:47:00 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\DOCUME~1\Thinkpad\LOCALS~1\Temp\langouste-Thinkpad\29135\clientdir
Executable: C:\FAH-Win\fah6.exe
Arguments: -local -send 04 -betateam -smp -deino -verbosity 9 

[20:47:00] - Ask before connecting: No
[20:47:00] - Proxy: 127.0.0.1:8880
[20:47:00] - User name: Weedacres (Team 52523)
[20:47:00] - User ID: 4A48E9F80FB2A57C
[20:47:00] - Machine ID: 1
[20:47:00] 
[20:47:01] Loaded queue successfully.
[20:47:01] Attempting to return result(s) to server...
[20:47:01] Project: 6701 (Run 1, Clone 24, Gen 25)


[20:47:01] + Attempting to send results [August 9 20:47:01 UTC]
[20:47:01] - Reading file work/wuresults_04.dat from core
[20:47:01]   (Read 43626432 bytes from disk)
[20:47:01] Connecting to http://171.64.65.56:8080/
[20:47:01] - Couldn't send HTTP request to server
[20:47:01] + Could not connect to Work Server (results)
[20:47:01]     (171.64.65.56:8080)
[20:47:01] + Retrying using alternative port
[20:47:01] Connecting to http://171.64.65.56:80/
[20:47:02] - Couldn't send HTTP request to server
[20:47:02] + Could not connect to Work Server (results)
[20:47:02]     (171.64.65.56:80)
[20:47:02] - Error: Could not transmit unit 04 (completed August 9) to work server.
[20:47:02] - 3 failed uploads of this unit.


[20:47:02] + Attempting to send results [August 9 20:47:02 UTC]
[20:47:02] - Reading file work/wuresults_04.dat from core
[20:47:02]   (Read 43626432 bytes from disk)
[20:47:02] Connecting to http://171.67.108.25:8080/
[20:47:52] - Couldn't send HTTP request to server
[20:47:52] + Could not connect to Work Server (results)
[20:47:52]     (171.67.108.25:8080)
[20:47:52] + Retrying using alternative port
[20:47:52] Connecting to http://171.67.108.25:80/
[20:48:40] Posted data.
[20:48:40] Initial: 0000; + Could not connect to Work Server (results)
[20:48:40]     (171.67.108.25:80)
[20:48:40]   Could not transmit unit 04 to Collection server; keeping in queue.
[20:48:40] - Failed to send unit 04 to server
[20:48:40] ***** Got a SIGTERM signal (2)
[20:48:40] Killing all core threads
[20:48:40] Killing 2 cores
[20:48:40] Killing core 0
[20:48:40] Killing core 1

Folding@Home Client Shutdown.
Checking if already returned...

---------- FAHLOG.TXT
Unit 04 NOT sent!
Checking WU 05
Checking WU 06
Checking WU 07
Checking WU 08
Checking WU 09