Default using all CPUs seems flawed
Moderators: Site Moderators, FAHC Science Team
-
- Posts: 41
- Joined: Mon Oct 24, 2022 4:32 am
Default using all CPUs seems flawed
How v8 handles resource allocation seems rather flawed. Leaving it at the defaults, I had a 3-core CPU job reach 66%, then the client picked up a GPU job (probably Alzheimer's) which took over all resources, leaving the CPU-only job potentially never finishing.
I realise I can reduce the CPU to 1 and add a CPU only virtual peer, but then I lose the ability to use all CPU cores for a GPU job that CAN make use of them.
I'm not really sure what the solution to this could be though. Presumably the long-term goal is to have all GPU jobs utilise all CPU cores rather than differentiating between the two. But can this even be done effectively, given that a GPU's performance is so many times greater?
Is there an explanation somewhere for why this is better than the old system?
-
- Site Moderator
- Posts: 1117
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: Default using all CPUs seems flawed
I believe there are currently no hybrid CPU + GPU cores.
The resource system was designed to use them in the future.
Yes, if you create separate resource groups for each GPU, you will have to reduce the available CPUs in the default group.
It may be best to just have the default group with all resources.
If you wish to change allocation among groups, it can be helpful to set everything to finishing and reconfigure after groups are paused.
To do only GPU folding, you might be able to set CPUs to 1 in each GPU resource group.
Set CPUs to zero in the default group.
I have not tried this because I'm just running macOS right now.
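As a rough illustration of the CPU arithmetic described above (not the client's own logic or API), here is a minimal Python sketch. The function name and its parameters are hypothetical, and it simply assumes each GPU resource group reserves a fixed number of CPUs, with the remainder left for the default group.

```python
def default_group_cpus(total_cpus, gpu_groups, cpus_per_gpu_group=1):
    """Return how many CPUs remain for the default group.

    Hypothetical helper: assumes every GPU resource group reserves
    the same number of CPUs (1 by default, as suggested above).
    """
    reserved = gpu_groups * cpus_per_gpu_group
    if reserved > total_cpus:
        raise ValueError("GPU groups reserve more CPUs than are available")
    return total_cpus - reserved


# Example: 8 CPU threads, two GPU groups with 1 CPU each.
print(default_group_cpus(8, 2))
```

A result of zero corresponds to the GPU-only setup mentioned above, where the default group gets no CPUs.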
-
- Posts: 41
- Joined: Mon Oct 24, 2022 4:32 am
Re: Default using all CPUs seems flawed
I understand that. What seemed to go wrong is that, for some reason, it picked a CPU job, then some time later a GPU one, and it paused the CPU job indefinitely, claiming there were insufficient resources.
Thinking back, maybe this was because, like past versions, it disables the GPU by default, and when I enabled it the GPU job took priority, leaving the CPU job no way to finish? Perhaps there needs to be some failsafe for this first-configuration scenario so the CPU job finishes before it pulls down a GPU job? Or ask if you want to enable the GPU before even looking for jobs at all?
-
- Site Moderator
- Posts: 1117
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: Default using all CPUs seems flawed
The 8.1.11 client is supposed to allow the number of CPUs for a WU to be changed without claiming insufficient resources.
It's possible the assignment had a minimum CPU count that was no longer met.
This sounds like a bug you might want to report.
https://github.com/FoldingAtHome/fah-cl ... tet/issues
-
- Site Moderator
- Posts: 1117
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: Default using all CPUs seems flawed
Meanwhile, you might need to set everything to finishing for the situation to clear up. If so, that would be a separate bug.
It would be good to know if another GPU WU is started without running the stalled CPU WU.
-
- Posts: 41
- Joined: Mon Oct 24, 2022 4:32 am
Re: Default using all CPUs seems flawed
Being effectively in the same slot, I couldn't figure out how to pause the GPU unit to let the CPU unit finish, so I ended up clearing it. It never occurred to me that Finishing might do it. Worth noting for the future, as I hate dropping WUs.
-
- Site Moderator
- Posts: 1117
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: Default using all CPUs seems flawed
I think a GPU WU assignment taking more than 1 CPU is a bug. It could be an assignment server bug or misconfiguration.
Please report it as a client issue.
-
- Posts: 41
- Joined: Mon Oct 24, 2022 4:32 am
Re: Default using all CPUs seems flawed
Until it gets fixed, you could set up Resource Groups to get separate 'Slot'-like behavior for your CPU and GPU. See GitHub Issue #52 for more info about enabling it.
-
- Posts: 41
- Joined: Mon Oct 24, 2022 4:32 am
Re: Default using all CPUs seems flawed
I tried that and yes, it fixes that problem, but then the / peers are not reported in the initial JSON from the websocket connection, which breaks my ability to monitor my folding boxes, as I have no idea how to request the data for that virtual peer.
Obviously, long-term I need to learn how to use the websockets properly, but of course there will be no documentation until after the beta. Right now, with a dirty hack (constantly closing/opening the websocket), I was able to include v8 alongside my v7 monitoring. Without that, I can't see if something is wrong.
-
- Site Moderator
- Posts: 1117
- Joined: Sat Dec 08, 2007 1:33 am
- Location: San Francisco, CA
- Contact:
Re: Default using all CPUs seems flawed
You access the resource group peer with a separate websocket. Append the group name to the ws: URL. Example:
ws://127.0.0.1:7396/api/websocket/myrg
ws://127.0.0.1:7396/api/websocket/myrg
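For a monitoring script, the URL pattern above could be built with a small helper. This is only a sketch: the function name `group_ws_url` is hypothetical, the host and port defaults are taken from the example above, and it assumes the default group is reached at the base URL with no group name appended. Actually connecting would still need a websocket client library of your choice.

```python
from urllib.parse import quote


def group_ws_url(group="", host="127.0.0.1", port=7396):
    """Build the websocket URL for a resource group.

    Hypothetical helper: an empty group name is assumed to mean
    the default group, i.e. the base URL with nothing appended.
    """
    base = f"ws://{host}:{port}/api/websocket"
    name = group.strip("/")
    # Percent-encode the name so spaces or odd characters stay valid in a URL.
    return f"{base}/{quote(name)}" if name else base


print(group_ws_url("myrg"))
```

One connection per group URL would then replace the single connection used for a client that has only the default group.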
-
- Posts: 41
- Joined: Mon Oct 24, 2022 4:32 am
Re: Default using all CPUs seems flawed
Thanks. It seems the problem will be fixed in the next release, so I will probably just leave it alone, as it seems somewhat random. As long as the CPU WU doesn't reach the timeout, it's just a slight delay (relative to the CPU WU timeouts) in getting the WU finished.