Half done tasks destroyed due to card not being there temporarily

It seems that a lot of GPU problems revolve around specific versions of drivers. Though AMD has their own support structure, you can often learn from information reported by others who fold.


Lamberto Vitali
Posts: 80
Joined: Fri Feb 25, 2022 12:21 am

Half done tasks destroyed due to card not being there temporarily

Post by Lamberto Vitali »

I'm sure a lot of us have all sorts of hardware and, as I just did, remove the other cards from a machine before testing a new card. On booting, FAH destroys all the work in progress instead of waiting to see whether the cards come back on the next boot.
aetch
Posts: 436
Joined: Thu Jun 25, 2020 3:04 pm
Location: Between chair and keyboard

Re: Half done tasks destroyed due to card not being there temporarily

Post by aetch »

It sounds like the client worked as designed.

You removed the cards, which removes the GPU slots.
In turn the client dumps the work units because it no longer has a suitable slot to process the work units on.
This is done within seconds of the client starting.
Folding Rigs - None (25-Jun-2022)

Lamberto Vitali
Posts: 80
Joined: Fri Feb 25, 2022 12:21 am

Re: Half done tasks destroyed due to card not being there temporarily

Post by Lamberto Vitali »

Then that is a very poor design, as the cards were back on the next boot. Why would they assume someone is going to permanently have no GPUs?
aetch
Posts: 436
Joined: Thu Jun 25, 2020 3:04 pm
Location: Between chair and keyboard

Re: Half done tasks destroyed due to card not being there temporarily

Post by aetch »

One or more of several things may have happened including:-
*). you may have copied the config to another computer.
*). the GPU(s) may have failed and will never be available again.
*). you have decided that "no GPU" is your config going forward.
*). you may have decided to change your GPU(s) to different makes and/or species which are not suitable for the work units you were assigned.

It's wrong for FAH to assume you're going to return to a previous configuration. It can only work with what it has, at that moment.

Personally, for a period from Aug/Sep 2020 to about Apr/May 2021 I ran 2 machines. One was dedicated to GPU folding, the other was dedicated to CPU folding. The machine I dedicated to CPU folding had no GPU of any description, not even integrated. I could not plug in a monitor of any description; I could only access it through remote desktop.

If you want to experiment with different GPUs I would recommend the following:-
*). add "pause-on-start true" to your config; this ensures the client does not start any slots until you manually start them.
*). finish all current work units; this ensures there are no work units pending on the system while you experiment.
*). you may also want to remove your username and passkey to ensure any work units dumped during testing are not added to your personal tally.
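
For anyone who prefers to script the first step, here is a minimal sketch rather than an official tool. It assumes the v7 client keeps its options in config.xml as elements of the form <option-name v='value'/> under a <config> root, so "pause-on-start true" becomes <pause-on-start v='true'/>. The function name set_pause_on_start and the sample config are made up for illustration. Stop the client and back up your real config before trying anything like this.

```python
import xml.etree.ElementTree as ET


def set_pause_on_start(config_xml: str, enabled: bool = True) -> str:
    """Return a copy of the config text with pause-on-start set.

    Assumption: options are stored as <name v='value'/> children of the
    <config> root. XML comments in the input are not preserved.
    """
    root = ET.fromstring(config_xml)
    option = root.find("pause-on-start")
    if option is None:
        # Option not present yet, so add it under the root element.
        option = ET.SubElement(root, "pause-on-start")
    option.set("v", "true" if enabled else "false")
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample config, not copied from a real installation.
sample = "<config><user v='Anonymous'/><slot id='0' type='CPU'/></config>"
print(set_pause_on_start(sample))
```

The function works on the text of the file rather than the file itself, so you can inspect the result before writing anything back.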

Lamberto Vitali
Posts: 80
Joined: Fri Feb 25, 2022 12:21 am

Re: Half done tasks destroyed due to card not being there temporarily

Post by Lamberto Vitali »

Species :-)
It would make more sense to stick with the status quo until at least another couple of reboots or a day has passed. It's most likely the GPU will return. Were those tasks even sent back so someone else can do them? Or just deleted, so the scientists have to wait for them to time out?

BOINC just says "missing GPU" against the task and lets the user do what is best.

As for your setups, your CPU-only machine never had a GPU for folding. It would be unlikely you'd take all the GPUs out of your first machine and put them in the other, and even if you did, you could manually cancel those slots.

Wow, I'm not going to go to all that trouble and wait half a day for things to complete.
aetch
Posts: 436
Joined: Thu Jun 25, 2020 3:04 pm
Location: Between chair and keyboard

Re: Half done tasks destroyed due to card not being there temporarily

Post by aetch »

Lamberto Vitali wrote: Fri May 20, 2022 9:17 am As for your setups, your CPU-only machine never had a GPU for folding. It would be unlikely you'd take all the GPUs out of your first machine and put them in the other, even if you did so, you could manually cancel those slots.
I did and I looked at doing it again over winter 2021.
There's a thread on the topic if you want to read a little more.
viewtopic.php?t=37165

Lamberto Vitali
Posts: 80
Joined: Fri Feb 25, 2022 12:21 am

Re: Half done tasks destroyed due to card not being there temporarily

Post by Lamberto Vitali »

But you'd delete the relevant slots yourself. A computer making assumptions about your intentions is insanity. You might have done so, realised the other wouldn't take them all, then put one back in the first machine.
aetch
Posts: 436
Joined: Thu Jun 25, 2020 3:04 pm
Location: Between chair and keyboard

Re: Half done tasks destroyed due to card not being there temporarily

Post by aetch »

Many of the decisions about how the FAHClient works were made long before I started contributing.
I'm merely using the client to the best of my understanding. Some of that understanding was gained through doing things, some stupid and others calculated, e.g. swapping out GPUs, changing the CPU slot size, splitting the CPU into two slots, different cooling solutions, monitoring tools, etc. One of the things I did learn was that if you have multiple GPUs in a system and remove one then the client will try to queue its work unit to one of the remaining GPUs.

When I decided to run one of my machines as CPU only I did indeed deliberately delete the GPU slot; I was making a long term decision. I knew the second machine would be just fine with two GPUs, as it had previously run with two GPUs and was capable of taking more. My biggest concern was actually whether the CPU only system would boot at all, considering it had no video output.

Lamberto Vitali
Posts: 80
Joined: Fri Feb 25, 2022 12:21 am

Re: Half done tasks destroyed due to card not being there temporarily

Post by Lamberto Vitali »

aetch wrote: Fri May 20, 2022 11:13 am Many of the decisions about how the FAHClient works were made long before I started contributing.
I'm merely using the client to the best of my understanding. Some of that understanding was gained through doing things, some stupid and other calculated e.g. swapping out GPUs, changing the CPU slot size, splitting the CPU into two slots, different cooling solutions, monitoring tools, etc. One of the things I did learn was that if you have multiple GPUs in a system and remove one then the client will try to queue its work unit to one of the remaining GPUs.
I've given up trying to be nice to the client. If it does something daft, I correct it. If work is lost, that's their fault. All I can do is make sure every chip is flat out.
aetch wrote: Fri May 20, 2022 11:13 am My biggest concern was actually whether the CPU only system would boot at all, considering it had no video output.
Do machines mind that? I've never had a problem doing so. Except when I need to access the BIOS!

ARGH! My 4 GPU machine keeps locking up. You can get to hate GPUs.