I think I have a bad one, WU 18244 Core 0x26
-
- Posts: 163
- Joined: Tue Dec 04, 2007 5:56 am
- Hardware configuration: Gigabyte Z790 UD AC , i7-14700K (x2) Win11
ASUS Z97_K , i5-4460 (x2) Win10 & UBUNTU 24.04
GPU's ASUS RTX1660 x2, RTX3050 x2
EVGA1060
- Location: California Wine country
I think I have a bad one, WU 18244 Core 0x26
This has failed 3 times in the last 2 days for other folders. It started out at over 1 hour per frame.
That has gradually come down. Run time per frame is now 13 min 28 sec.
This is on a GTX 1660 Super. The same card in my Win11 box is putting out over 1 million PPD.
Might be worth keeping an eye on.
-
- Site Admin
- Posts: 8118
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
- Location: W. MA
Re: I think I have a bad one, WU 18244 Core 0x26
What are the Run, Clone, and Gen numbers for the WU? No way for us to tell and keep an eye on it without those.
-
- Posts: 163
- Joined: Tue Dec 04, 2007 5:56 am
- Hardware configuration: Gigabyte Z790 UD AC , i7-14700K (x2) Win11
ASUS Z97_K , i5-4460 (x2) Win10 & UBUNTU 24.04
GPU's ASUS RTX1660 x2, RTX3050 x2
EVGA1060
- Location: California Wine country
Re: I think I have a bad one, WU 18244 Core 0x26
RCG 441,0,59
Run times are now 5 min 55 sec.
I've never had a WU whose frame times went from over an hour to 5:55. Normally they are always the same, down to the second??
Re: I think I have a bad one, WU 18244 Core 0x26
Was it the frame estimate that went from over an hour to 5:55 or did you time the actual duration of the frames? Because the initial frame estimate is sometimes very inaccurate even if it is often correct down to the second. What matters is how long each frame is actually taking.
All the failures from the other three people for P18244 R441 C0 G59 are instant failures which indicates a core compatibility problem, and those are not unheard of for core26.
Can you post the log for that core so we can see how long each frame takes? Because I suspect that it's just an ETA estimation issue along with the WU getting unlucky and being sent to three people in a row with incompatible libraries for core26, which makes it seem like it's bad but could just be a false positive.
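A minimal Python sketch of how the actual per-frame times could be worked out from a log, as opposed to trusting the displayed estimate. The timestamps and the HH:MM:SS layout below are made up for illustration and are not taken from any real log; the function name `frame_durations` is hypothetical too.

```python
from datetime import datetime, timedelta

# Hypothetical (timestamp, percent complete) pairs copied by hand from a log.
# The HH:MM:SS layout is an assumption; adjust the format string to your log.
SAMPLE = [
    ("01:00:00", 10),
    ("01:05:55", 11),
    ("01:11:50", 12),
    ("01:17:46", 13),
]

def frame_durations(entries, fmt="%H:%M:%S"):
    """Return the seconds elapsed between consecutive progress lines."""
    times = [datetime.strptime(stamp, fmt) for stamp, _ in entries]
    durations = []
    for earlier, later in zip(times, times[1:]):
        delta = later - earlier
        if delta < timedelta(0):
            # The log rolled past midnight between the two lines.
            delta += timedelta(days=1)
        durations.append(int(delta.total_seconds()))
    return durations

print(frame_durations(SAMPLE))
```

If the numbers that come out are steady from frame to frame, the WU itself is probably fine and only the estimate is off.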
-
- Posts: 163
- Joined: Tue Dec 04, 2007 5:56 am
- Hardware configuration: Gigabyte Z790 UD AC , i7-14700K (x2) Win11
ASUS Z97_K , i5-4460 (x2) Win10 & UBUNTU 24.04
GPU's ASUS RTX1660 x2, RTX3050 x2
EVGA1060
- Location: California Wine country
Re: I think I have a bad one, WU 18244 Core 0x26
The wonky numbers are from "View Work Unit Details", which opens when you click on the little i in the client window. The actual log shows a stable 4:06. It's the view details page that's messed up.
Sorry for the confusion, I should have dug a little deeper before pushing the panic button.
Re: I think I have a bad one, WU 18244 Core 0x26
A lot of people get confused by the way the client interface works. Some people have even dumped perfectly good work units because they seemed glitchy when the client suffered a known bug that caused PPD to inflate to impossible numbers. The interface and the estimation algorithm definitely need improvement.
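To illustrate how a displayed estimate can keep dropping while the logged frame time stays put, here is a small Python sketch. It is purely illustrative: the client's real estimation algorithm is not described in this thread, and the one-off startup stall, the numbers, and both function names are assumptions.

```python
from statistics import median

def cumulative_estimate(elapsed_seconds, frames_done):
    """Naive per-frame estimate: total elapsed time divided by frames finished."""
    return elapsed_seconds / frames_done

def recent_estimate(frame_seconds, window=5):
    """Median of the last `window` measured frame durations."""
    return median(frame_seconds[-window:])

# Hypothetical run: a one-off 3600 s startup stall, then steady 246 s (4:06) frames.
STARTUP = 3600
FRAME = 246

for frames_done in (1, 5, 20, 50):
    elapsed = STARTUP + FRAME * frames_done
    measured = [FRAME] * frames_done
    print(
        frames_done,
        round(cumulative_estimate(elapsed, frames_done)),
        recent_estimate(measured),
    )
```

Under these assumptions the naive estimate starts above an hour and creeps down toward the true frame time, while an estimate based only on recent frames is correct from the first frame.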
-
- Posts: 163
- Joined: Tue Dec 04, 2007 5:56 am
- Hardware configuration: Gigabyte Z790 UD AC , i7-14700K (x2) Win11
ASUS Z97_K , i5-4460 (x2) Win10 & UBUNTU 24.04
GPU's ASUS RTX1660 x2, RTX3050 x2
EVGA1060
- Location: California Wine country
Re: I think I have a bad one, WU 18244 Core 0x26
Follow-up: I dug back through the log, and the problem child (W1) finished up overnight.
06:13:45:11 W1 finished 100%
06:13:49:11 W1 Credited.
It must have been big; it seemed to take longer than average to upload.
The client seems to be back to normal, whatever that is. Frame times are right at 3 min.
Another crisis survived, case closed.
-
- Site Admin
- Posts: 8118
- Joined: Tue Apr 21, 2009 4:41 pm
- Hardware configuration: Mac Studio M1 Max 32 GB smp6
Mac Hack i7-7700K 48 GB smp4
- Location: W. MA
Re: I think I have a bad one, WU 18244 Core 0x26
Project 18244 is a medium-sized project in terms of atom count, but the researcher may be loading up additional information on the return. More frequent checkpoints and sanity checks would result in a larger upload.
Re: I think I have a bad one, WU 18244 Core 0x26
More frequent checkpoints don't increase the total size (only the last one is sent), but more frequent XTC and especially TRR trajectory snapshots would. Those are files that encode the types, positions, and velocities of each particle. XTC is like TRR but contains reduced-resolution trajectories to reduce file size. TRR trajectories are what would account for the biggest size increase, and 18244 doesn't use them.
It does use more frequent XTC trajectory snapshots than most projects (every 0.4% so 250 total) but that wouldn't make it extremely large by itself, and it's not sending any extra files (the only big files are the serialized final checkpoint and the XTC file).
Usually only non-solvent atoms are written to the XTC file but because they're testing a new type of water (OCL3-pol), it might be that the XTC file contains the solvent as well, which would inflate size significantly especially because they're sending 250 XTC trajectories (usually it's more like 20 to 50 that get sent). The core.xml file would answer why it's bigger.
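A rough Python sketch of the arithmetic above. The snapshot count follows from the 0.4% interval; the size comparison assumes that XTC size scales linearly with snapshots times atoms written, which is only an approximation because XTC is compressed. The atom counts and both function names are hypothetical and are not taken from project 18244.

```python
def snapshot_count(interval_percent):
    """Number of snapshots if one is written every `interval_percent` of the WU."""
    return round(100 / interval_percent)

def relative_size(frames, atoms_written, base_frames, base_atoms):
    """Size ratio versus a baseline, assuming size scales with frames x atoms."""
    return (frames * atoms_written) / (base_frames * base_atoms)

frames = snapshot_count(0.4)  # 250, matching the figure in the post

# Effect of the snapshot count alone, versus the usual 20 to 50 snapshots.
print(frames / 50, frames / 20)  # 5.0 12.5

# Hypothetical atom counts, for illustration only: 5,000 non-solvent atoms
# versus 50,000 atoms once the solvent is written as well.
print(relative_size(frames, 50_000, 50, 5_000))  # 50.0
```

So the snapshot count by itself would account for roughly a 5x to 12.5x larger XTC file than usual, and writing the solvent would multiply that again by however many more atoms end up in each snapshot.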
-
- Posts: 84
- Joined: Wed Mar 18, 2020 2:55 pm
- Hardware configuration: HP Z600 (5) HP Z800 (3) HP Z440 (3)
ASUS Turbo GTX 1060, 1070, 1080, RTX 2060 (3)
Dell GTX 1080
- Location: Sydney Australia
Re: I think I have a bad one, WU 18244 Core 0x26
There are other projects from this family, e.g. 18251, that (for unknown reasons) run very slowly (~1/3 of usual PPD) on the TU106 version of the RTX 2060, and also on the GTX 1660 and 1660 Super. The only solution seems to be to avoid them. See viewtopic.php?t=42221 for more.