Why deleting a work unit is bad

Moderators: Site Moderators, FAHC Science Team

jrweiss
Posts: 704
Joined: Tue Dec 04, 2007 6:56 am
Hardware configuration: Ryzen 7 5700G, 22.40.46 VGA driver; 32GB G-Skill Trident DDR4-3200; Samsung 860EVO 1TB Boot SSD; VelociRaptor 1TB; MSI GTX 1050ti, 551.23 studio driver; BeQuiet FM 550 PSU; Lian Li PC-9F; Win11Pro-64, F@H 8.3.5.

[Suspended] Ryzen 7 3700X, MSI X570MPG, 32GB G-Skill Trident Z DDR4-3600; Corsair MP600 M.2 PCIe Gen4 Boot, Samsung 840EVO-250 SSDs; VelociRaptor 1TB, Raptor 150; MSI GTX 1050ti, 526.98 driver; Kingwin Stryker 500 PSU; Lian Li PC-K7B. Win10Pro-64, F@H 8.3.5.
Location: @Home

Re: Why deleting a work unit is bad

Post by jrweiss »

I lose an undue number of GPU WUs "just because." My machine will run for days without a problem, then occasionally the GPU client will EUE multiple times due to an "unstable machine." When I restart the client and get a new WU, it continues on for several more days.

The biggest problems are when the machine is unattended, since I travel a lot. It will be a welcome change when the new GPU client better handles errors and stops blaming "unstable machine" for all EUEs...
Ryzen 7 5700G, 22.40.46 VGA driver; MSI GTX 1050ti, 551.23 studio driver
Ryzen 7 3700X; MSI GTX 1050ti, 551.23 studio driver [Suspended]
uncle fuzzy
Posts: 460
Joined: Sun Dec 02, 2007 10:15 pm
Location: Michigan

Re: Why deleting a work unit is bad

Post by uncle fuzzy »

theteofscuba wrote: ... i've had to abandon some wus in the past every now and then when configuring and reconfiguring f@h clients.
7im wrote: Most times the client can simply be updated, or copied and moved, and not touch the WU. Under what conditions does reconfiguring a fah client require you to abandon a WU?
Qinsp wrote: If you are running XP, and attempt to change the configuration, you can lose the job. Win7 seems to be stable. XP is broken. Any change can be (and often is) lethal because the CONFIG code itself is bad. It's not the job, it's that you cannot restart the client without removing everything first. The only way I've been able to restart is by wiping out the whole install and reinstalling.

I have never had any problems stopping a client and running configuration mid-WU with XP. Editing the config file itself is always dangerous, but properly running a console with -config or -configonly works perfectly: CPU, GPU, or SMP. I don't use the systray clients, but I don't recall hearing any issues with configuration.
Proud to crash my machines as a Beta Tester!

Leonardo
Posts: 260
Joined: Tue Dec 04, 2007 5:09 am
Hardware configuration: GPU slots on home-built, purpose-built PCs.
Location: Eagle River, Alaska

Re: Why deleting a work unit is bad

Post by Leonardo »

Back when I was running WinXP platforms and GPU2 clients (Win7 and GPU3 now), making changes to the config file on the fly would sometimes destroy a work unit. I eventually learned to shut down the client, make the configuration change, and then restart the client. I did not lose work units that way.
sswilson
Posts: 90
Joined: Mon Dec 17, 2007 12:34 am
Hardware configuration: ASUS Crosshair IV Formula / AMD 1090T / 4X2 Gig GSkill Pi PC3-12800 / Corsair TX750W PSU / Sparkle GTX275 Plus / CoolerMaster Cosmos S / MCP655 WC Pump / MCR320 Rad / 6X Yate Loons / PA120.1 / 2X Scythe Ultra Kaze / Enzotech Luna WB / Dell Ultrasharp 2209WA

Gigabyte P35-DQ6 / Q6600 / 2X 1G 1066 Firestix / "Baked" XFX GTX 280 (RIP again :( ) / MSI GTS 450 Cyclone OC /PC P&C 750W Silencer / MCR220-QP-Res / DD DDCPX-Pro / Apogee GT / Highspeed PC Tech Station / Samsung 931BF / BenQ Q9T4
Location: Moncton, New Brunswick, Canada

Re: Why deleting a work unit is bad

Post by sswilson »

bruce wrote:Nobody disputes that an occasional WU will be lost, but let's be realistic. What percentage of the WUs that you've been assigned were lost because of a hardware swap where you forgot to run -oneunit?

An allowable loss rate of 20% of the SMP WUs is HUGE compared to the number of WUs that are actually lost -- unless someone is intentionally cherrypicking.
You won't get an argument from me on the 20%, I doubt my lost WUs amount to even 1% let alone 20. My comments WRT a -release switch were directed more towards doing better science by letting the system know as early as possible that it needs to reassign a WU.

While we're on the subject though.... any changes to the 20% grace for lost units should be done on a sliding scale. IMO, 20% is a reasonable figure to use when a folder is first starting out. If we were to use something like 2%, any small glitch could quickly put that folder into a hole that would be very difficult to dig out of. The last thing we want to do is something that will discourage new folders. (Even the current 8 complete units has some new folders gnashing their teeth while they wait a few days for the bonus points to show up... ;) ). A scale based on total WUs submitted (i.e. 20% for <25 WUs, 10% for <50 WUs, 5% for <75 WUs, etc....) would have the effect of discouraging long time folders from cherry picking, while not putting new folders at an unfair disadvantage.
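
A minimal sketch of the sliding scale described above, in Python. The function names, the tier table layout, and especially the 2% rate past 75 WUs are assumptions for illustration only: the list above ends with "etc.", so the later tiers are not specified, and nothing here reflects how the F@H servers actually decide bonus eligibility.

```python
from bisect import bisect_right

# Tiers taken from the post: (upper bound on total WUs submitted, allowed loss rate).
TIERS = [(25, 0.20), (50, 0.10), (75, 0.05)]

# Hypothetical: the proposal trails off with "etc.", so the rate past the
# last tier is unspecified. 2% is only a placeholder for illustration.
ASSUMED_FLOOR_RATE = 0.02


def allowed_loss_rate(total_wus, floor_rate=ASSUMED_FLOOR_RATE):
    """Return the allowed fraction of lost WUs for a folder with total_wus submitted."""
    if total_wus < 0:
        raise ValueError("total_wus must be non-negative")
    bounds = [bound for bound, _ in TIERS]
    # bisect_right finds the first tier whose bound is strictly greater than total_wus.
    index = bisect_right(bounds, total_wus)
    if index == len(TIERS):
        return floor_rate
    return TIERS[index][1]


def within_allowance(total_wus, lost_wus):
    """True if lost_wus does not exceed the allowance for total_wus."""
    return lost_wus <= allowed_loss_rate(total_wus) * total_wus
```

With these assumed tiers, a new folder with 10 WUs submitted is allowed 20%, while a long-time folder with hundreds of WUs falls through to the placeholder floor.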

edit: It should also be noted that the recent state of WUs being made available has made cherry picking mostly moot..... doesn't do any good to attempt cherry picking if the only WUs available for days on end are 6701/6702s....
Qinsp
Posts: 216
Joined: Sun Oct 17, 2010 2:34 pm

Re: Why deleting a work unit is bad

Post by Qinsp »

bruce wrote:Nobody disputes that an occasional WU will be lost, but let's be realistic. What percentage of the WUs that you've been assigned were lost because of a hardware swap where you forgot to run -oneunit?

An allowable loss rate of 20% of the SMP WUs is HUGE compared to the number of WUs that are actually lost -- unless someone is intentionally cherrypicking.

The allowable loss rate is not 20% if you get a true-blue known bad WU. It crashes 5 times, so the real loss rate is 4% if my math is right. Only the "user problem" rate is 20%.
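
One way to read that arithmetic, as a small Python sketch. It assumes (this is an interpretation, not something confirmed by the project) that a known-bad WU is counted as five crashes, so an allowance of 20% measured in crashes corresponds to 4% measured in distinct WUs. The function name is hypothetical.

```python
def distinct_wu_loss_rate(allowance_percent, crashes_per_bad_wu):
    """Convert an allowance counted in crashes into one counted in distinct WUs."""
    if crashes_per_bad_wu <= 0:
        raise ValueError("crashes_per_bad_wu must be positive")
    return allowance_percent / crashes_per_bad_wu


# 20% allowance, with each bad WU assumed to account for 5 crashes.
print(distinct_wu_loss_rate(20, 5))  # 4.0
```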
Quality Inspection - Corona, CA, USA
Dimensional Inspection Laboratory
Pat McSwain, President
7im
Posts: 10179
Joined: Thu Nov 29, 2007 4:30 pm
Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
Location: Arizona

Re: Why deleting a work unit is bad

Post by 7im »

theteofscuba wrote: ... i've had to abandon some wus in the past every now and then when configuring and reconfiguring f@h clients.
7im wrote: Most times the client can simply be updated, or copied and moved, and not touch the WU. Under what conditions does reconfiguring a fah client require you to abandon a WU?
Qinsp wrote: If you are running XP, and attempt to change the configuration, you can lose the job. Win7 seems to be stable. XP is broken. Any change can be (and often is) lethal because the CONFIG code itself is bad. It's not the job, it's that you cannot restart the client without removing everything first. The only way I've been able to restart is by wiping out the whole install and reinstalling.

When the CONFIG crashes, it won't let you make anything but a blank config file, and that won't finish the job.

Yes, if there were detailed instructions on the work-arounds, you could save the job. But most of the "fixes" posted on the forum do not work. It has nothing to do with renaming the executable; it crashes either way. It apparently does not affect 100% of XP machines, but on this one I'm typing from, it will crash 100% of the time like clockwork. Config == RIP WU.

Only once have I had to completely remove a client to get it running again, and I've been folding since 2003. Deleting a config file or deleting the WU is about as drastic as I have to go. I can't remember the last time a v6 client crashed on me either. And while SMP used to lose WUs regularly 4 years ago, it has not done so recently.

I don't know what you are doing that is so very dangerous to your computer that so many WUs or Configs get corrupted, but I can't remember the last time one of my config files died. Your client issues are the exception to the rule, not a commonplace occurrence. Please take more care.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Eugenitor
Posts: 5
Joined: Mon Oct 04, 2010 5:31 pm

Re: Why deleting a work unit is bad

Post by Eugenitor »

If the client keeps giving you unstable-machine errors, there's a distinct chance that you might actually have an unstable machine, perhaps a faulty GPU or RAM.

For people stopping in the middle of WUs to change configuration, there's a "Pause when done" option; you can change things between WUs instead.
jrweiss
Posts: 704
Joined: Tue Dec 04, 2007 6:56 am
Location: @Home

Re: Why deleting a work unit is bad

Post by jrweiss »

If the machine is unstable, why does it Fold SMP+GPU for 2 weeks before it gets to its "unstable" phase?
Ryzen 7 5700G, 22.40.46 VGA driver; MSI GTX 1050ti, 551.23 studio driver
Ryzen 7 3700X; MSI GTX 1050ti, 551.23 studio driver [Suspended]
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am
Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M

Re: Why deleting a work unit is bad

Post by Grandpa_01 »

Because it is not 100% stable :wink:
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
jrweiss
Posts: 704
Joined: Tue Dec 04, 2007 6:56 am
Location: @Home

Re: Why deleting a work unit is bad

Post by jrweiss »

Hmmm... If it's stable except for a few F@H WUs, maybe it's the WUs that are unstable...
Ryzen 7 5700G, 22.40.46 VGA driver; MSI GTX 1050ti, 551.23 studio driver
Ryzen 7 3700X; MSI GTX 1050ti, 551.23 studio driver [Suspended]
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am

Re: Why deleting a work unit is bad

Post by Grandpa_01 »

jrweiss wrote:Hmmm... If it's stable except for a few F@H WUs, maybe it's the WUs that are unstable...
Drop the OC back to default settings and run it for a while. If it does not lose any WUs at default, then you have your answer. My bet would be not stable. :wink:
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
Qinsp
Posts: 216
Joined: Sun Oct 17, 2010 2:34 pm

Re: Why deleting a work unit is bad

Post by Qinsp »

7im wrote:...

I don't know what you are doing that is so very dangerous to your computer that so many WUs or Configs get corrupted, but I can't remember the last time one of my config files died. Your client issues are the exception to the rule, not a commonplace occurrence. Please take more care.
Well, you were right. I should have listened. I feel so guilty. :(

I should have seen the signs, since I've been working on computers since the CP/M days, but I suppose I was in denial. It was a good computer when it was young, but as it got older, odd things started to happen. Things started to disappear off my desk. I would often find letter openers, scissors, or even razor blades on my chair. I would look at the History in my web-browser and it had listings for Satanic websites. My Music Folder was filled with heavy metal rock and vulgar rap songs.

I blamed it on the employees. If only I had listened to you. :(

This morning there was a dead hooker in the alley behind our shop. Next to the body was a CMOS battery...

The computer wouldn't boot. I opened the case, and the CMOS battery was missing. :(

I don't know what to do. I raised that computer from a chip, and I would feel like I betrayed it if I send it to Warranty. I put the battery back in and it still works great. But it still won't let me config FAH without a fight.
Quality Inspection - Corona, CA, USA
Dimensional Inspection Laboratory
Pat McSwain, President
mdk777
Posts: 480
Joined: Fri Dec 21, 2007 4:12 am

Re: Why deleting a work unit is bad

Post by mdk777 »

Qinsp wrote: It was a good computer when it was young, but as it got older, odd things started to happen.
I don't know what to do. I raised that computer from a chip
Were you careful to have it imprint on its peers?
It may be confused about its identity, causing it to challenge you for leadership of the herd.

Classic mistake that needs to be avoided with wolves and dairy bulls.
http://www.grandin.com/behaviour/princi ... dents.html

You may have to cull it. Once they decide to go after you, it is the only safe choice. :wink:
Transparency and Accountability, the necessary foundation of any great endeavor!
jrweiss
Posts: 704
Joined: Tue Dec 04, 2007 6:56 am
Location: @Home

Re: Why deleting a work unit is bad

Post by jrweiss »

Grandpa_01 wrote:Drop the OC back to default settings and run it for a while. If it does not lose any WUs at default, then you have your answer. My bet would be not stable. :wink:
I don't OC.
Ryzen 7 5700G, 22.40.46 VGA driver; MSI GTX 1050ti, 551.23 studio driver
Ryzen 7 3700X; MSI GTX 1050ti, 551.23 studio driver [Suspended]
Grandpa_01
Posts: 1122
Joined: Wed Mar 04, 2009 7:36 am

Re: Why deleting a work unit is bad

Post by Grandpa_01 »

Have you tested the memory?
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding