wu 14911 exceeding low utilization.
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 78
- Joined: Sun Apr 26, 2020 1:29 pm
Anyone else seeing issues with 14911?
I have a 2060s sitting at est 150932 PPD & dropping.
GPU is at 60 watts, 63% util.
Other 2060s is at 2.3M PPD on wu 17319.
This is a new setup, but it looked good... until this wu.
Not seeing any log errors.
In the viewer, it's a membrane structure, but it has chunks flying in & out, which doesn't look normal.
-
- Posts: 78
- Joined: Sun Apr 26, 2020 1:29 pm
Re: wu 14911 exceeding low utilization.
I finished the other two slots, then deleted all the GPU slots, returning that WU.
Replaced the 2nd 2060s with a 2060KO.
Box is folding at 5.9M PPD.
Guess I'm superstitious, I can't put those 2060s together... shuffled cards.
Just hit my all-time high, 17.2M PPD.
17M is top100 territory, I'm liking that.
-
- Posts: 520
- Joined: Fri Apr 03, 2020 2:22 pm
- Hardware configuration: ASRock X370M PRO4
Ryzen 2400G APU
16 GB DDR4-3200
MSI GTX 1660 Super Gaming X
Re: wu 14911 exceeding low utilization.
Strange that it was running that low. It's not a really small atom count WU, so hard to say what would cause it.
I've only run one instance of that WU, and on a little 2400G APU it returned over half of what the 2060 returned. Strange.
Fold them if you get them!
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400 - Location: Land Of The Long White Cloud
- Contact:
Re: wu 14911 exceeding low utilization.
If you had the viewer open while it was folding, it would slow down the GPU folding, since GPU resources are being taken away to render the simulation. However, if you can provide the PRCG, we can ask the researcher for additional details to see what's happening.
BTW, just for general information: when you deleted that GPU slot, the WU wasn't "returned" but rather dumped, which means that the server has to wait for it to time out before it can be reassigned, and that slows down the science. I personally won't dump any WUs even if they are "slow", since running them lets me report them to the researcher, who can look for issues with the simulation and either stop the trajectory or fix it.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
-
- Posts: 78
- Joined: Sun Apr 26, 2020 1:29 pm
Re: wu 14911 exceeding low utilization.
PantherX wrote: BTW, just for general information, when you deleted that GPU Slot, the WU isn't "returned" but rather it is dumped which means that the Server will have to wait for it to time-out before being reassigned which would slow down science. I personally won't dump any WUs even if they are "slow" since it allows me to report it to the researcher to discover any issues with the simulation and either stop the trajectory or fix it up
Considering the MANY WU that I've lost due to power sags & outages, it would be a good & useful client 'feature' to track & report lost WU, whatever the reason. HFM, tracks them. 20 yrs on, the client should track & report events that commonly could interfere with the completion of a WU.
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
Re: wu 14911 exceeding low utilization.
Yep, the primary developer is aware of this, but I'm not sure if/when it will get implemented.
When it comes to WU losses due to power outages, I do believe that FahCore should be able to handle them gracefully as long as there's a single valid checkpoint. Thus, it would prevent WU loss and you wouldn't have "wasted" your system resources due to an unexpected event that you can't control*
*One could use a fancy UPS, but that's additional cost and resources that might not suit a "home" environment.
Re: wu 14911 exceeding low utilization.
cine.chris wrote: Considering the MANY WU that I've lost due to power sags & outages, it would be a good & useful client 'feature' to track & report lost WU, whatever the reason. HFM, tracks them. 20 yrs on, the client should track & report events that commonly could interfere with the completion of a WU.
The FAHClient does have a feature that attempts to report lost WUs but there are quite a number of different reasons a WU can be lost. Each of those reasons would probably need a different detection method to be coded. That makes the solution to the problem very complex.
Posting FAH's log:
How to provide enough info to get helpful support.
-
- Posts: 78
- Joined: Sun Apr 26, 2020 1:29 pm
Re: wu 14911 exceeding low utilization.
The procedure described above appeared to have sent it. It's very quick. Should the situation occur again, I'll check the logs.
As for the lost-power situation, I'll check the logs. There, all I get to see is a freshly loaded WU. The Windows clients seem to do better than the Linux ones, but that's only an impression. When it occurs again, I'll take some notes. They might be on different circuits too.
-
- Posts: 78
- Joined: Sun Apr 26, 2020 1:29 pm
Re: wu 14911 exceeding low utilization.
A returned WU:
===============
15:23:04:FS01:Shutting core down
15:23:04:WARNING:WU00:Slot ID 0 no longer exists and there are no other matching slots, dumping
15:23:04:WU00:Sending unit results: id:00 state:SEND error:DUMPED project:13444 run:1247 clone:54 gen:1 core:0x22 unit:0x000000360000000100003484000004df
15:23:04:WU01:FS01:0xa8:WARNING:Console control signal 1 on PID 7020
15:23:04:WU01:FS01:0xa8:Exiting, please wait. . .
15:23:04:WU00:Connecting to 18.188.125.154:8080
15:23:04:WU00:Server responded WORK_ACK (400)
15:23:04:WU00:Cleaning up
========================
I'm consolidating platforms. A GTX1050 in my monitor system, which was a planned upgrade target, had started a WU that would hold up moving ~11M PPD of other gear that was all set to finish.
As shuffling gear could lose work & WUs, I always finish slots that could be affected. This transition was across three systems.
An 8Hr WU was a problem. Halted, as described above, it was returned & ACK rc'vd.
And yes, as I recalled, it showed a SEND status in the GUI.
No WU were harmed in this transition.
cine.chris
Last edited by cine.chris on Wed Jan 27, 2021 4:21 pm, edited 1 time in total.
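The log excerpt in the post above can be checked mechanically. The sketch below is a minimal example, not part of FAHClient or HFM: it scans log lines for a "Sending unit results" entry, records the error field and the PRCG, and notes whether a WORK_ACK followed for the same WU tag. The regular expressions are assumptions based only on the lines quoted in this thread, and the names `summarize_uploads`, `SEND_RE`, and `ACK_RE` are hypothetical; other client versions may format these lines differently.

```python
import re

# Assumed line shapes, taken only from the log excerpt quoted in this thread.
SEND_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):(?P<wu>WU\d+):Sending unit results: "
    r"id:(?P<id>\d+) state:(?P<state>\w+) error:(?P<error>\w+) "
    r"project:(?P<project>\d+) run:(?P<run>\d+) "
    r"clone:(?P<clone>\d+) gen:(?P<gen>\d+)"
)
ACK_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}:(?P<wu>WU\d+):Server responded WORK_ACK"
)


def summarize_uploads(lines):
    """Return one dict per 'Sending unit results' line, with an 'acked' flag."""
    uploads = []
    pending = {}  # WU tag -> most recent upload still waiting for an ACK
    for line in lines:
        line = line.strip()
        sent = SEND_RE.match(line)
        if sent:
            record = {
                "wu": sent["wu"],
                "error": sent["error"],
                "prcg": tuple(
                    int(sent[key])
                    for key in ("project", "run", "clone", "gen")
                ),
                "acked": False,
            }
            uploads.append(record)
            pending[record["wu"]] = record
            continue
        ack = ACK_RE.match(line)
        if ack and ack["wu"] in pending:
            # Mark the matching upload as acknowledged by the server.
            pending.pop(ack["wu"])["acked"] = True
    return uploads


# The log lines posted above, copied verbatim.
LOG = """\
15:23:04:FS01:Shutting core down
15:23:04:WARNING:WU00:Slot ID 0 no longer exists and there are no other matching slots, dumping
15:23:04:WU00:Sending unit results: id:00 state:SEND error:DUMPED project:13444 run:1247 clone:54 gen:1 core:0x22 unit:0x000000360000000100003484000004df
15:23:04:WU01:FS01:0xa8:WARNING:Console control signal 1 on PID 7020
15:23:04:WU01:FS01:0xa8:Exiting, please wait. . .
15:23:04:WU00:Connecting to 18.188.125.154:8080
15:23:04:WU00:Server responded WORK_ACK (400)
15:23:04:WU00:Cleaning up
""".splitlines()

for record in summarize_uploads(LOG):
    print(record)
```

Keeping the error field separate from the ACK flag matters here: the ACK only shows that the server received the report, while error:DUMPED shows that the WU was reported as dumped rather than completed.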
Re: wu 14911 exceeding low utilization.
cine.chris wrote: Halted, as described above, it was returned & ACK rc'vd.
And yes, as I recalled, it showed a SEND status in the GUI.
No WU were harmed in this transition.
I don't understand.
If the ACK was received, the server should have a record of the completed WU and FAHClient should have done the cleanup processing on the WU, in which case there would be no WU to try to move to a different slot.
-
- Posts: 78
- Joined: Sun Apr 26, 2020 1:29 pm
Re: wu 14911 exceeding low utilization.
bruce wrote:
cine.chris wrote: Halted, as described above, it was returned & ACK rc'vd.
And yes, as I recalled, it showed a SEND status in the GUI.
No WU were harmed in this transition.
I don't understand.
If the ACK was received, the server should have a record of the completed WU and FAHClient should have done the cleanup processing on the WU, in which case there would be no WU to try to move to a different slot.
Sorry if I'm not stating the situation clearly.
The WU wasn't completed. But, even interrupted, the WU was returned to the work server.
That was the question brought up earlier in the post... is it returned?
Pausing and removing the slot for the GTX1050, which held the unfinished WU, did return the WU, with a SEND status & an ACK in the log.
Re: wu 14911 exceeding low utilization.
Yes, it was returned. When a WU is sent before it is complete, such as when the client no longer sees the slot, it is reported as dumped.
https://apps.foldingathome.org/wu#proje ... e=54&gen=1