Why deleting a work unit is bad
Moderators: Site Moderators, FAHC Science Team
I know it's bad to delete a work unit in progress. However, I have yet to see evidence explaining why it's bad. All I know is that work units are created sequentially, so deleting one delays the ones that follow.
Are there any links to PG comments on this?
-
- Site Moderator
- Posts: 6986
- Joined: Wed Dec 23, 2009 9:33 am
- Hardware configuration: V7.6.21 -> Multi-purpose 24/7
Windows 10 64-bit
CPU:2/3/4/6 -> Intel i7-6700K
GPU:1 -> Nvidia GTX 1080 Ti
§
Retired:
2x Nvidia GTX 1070
Nvidia GTX 675M
Nvidia GTX 660 Ti
Nvidia GTX 650 SC
Nvidia GTX 260 896 MB SOC
Nvidia 9600GT 1 GB OC
Nvidia 9500M GS
Nvidia 8800GTS 320 MB
Intel Core i7-860
Intel Core i7-3840QM
Intel i3-3240
Intel Core 2 Duo E8200
Intel Core 2 Duo E6550
Intel Core 2 Duo T8300
Intel Pentium E5500
Intel Pentium E5400
- Location: Land Of The Long White Cloud
- Contact:
Re: Why deleting a work unit is bad
The problem arises when you dump the WU: the Servers have no way of knowing whether you have dumped it. They wait for the Preferred Deadline to pass and then reassign it. For SMP2 WUs, the Preferred Deadline is short (about 3 to 4 days), and the Bonus system encourages faster WU return and penalizes WU dumping (it requires a return rate of at least 80%). GPU WUs have variable deadlines, and the Classic WUs have the longest. So dumping WUs has the most severe impact on the Classic Client's research and the least on SMP2 WUs, with the GPU2 WUs in between (of course, this is oversimplified; the actual picture depends on the Preferred Deadline of each Project).
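To make the timing concrete, here is a minimal Python sketch of the wait described above. It assumes the server does nothing until the Preferred Deadline passes and then reassigns immediately. The function names and the 30-day deadline are hypothetical and chosen only for illustration; this is not the actual server logic.

```python
from datetime import datetime, timedelta

def reassignment_time(assigned_at, preferred_deadline):
    """Earliest moment the server would hand the WU to someone else
    (assumption: reassignment happens exactly at the Preferred Deadline)."""
    return assigned_at + preferred_deadline

def idle_time_after_dump(assigned_at, dumped_at, preferred_deadline):
    """How long a dumped WU sits unproductive before it can be reassigned."""
    idle = reassignment_time(assigned_at, preferred_deadline) - dumped_at
    # A WU dumped after the deadline has already been reassigned: no extra wait.
    return max(idle, timedelta(0))

assigned = datetime(2011, 1, 1, 12, 0)
dumped = assigned + timedelta(hours=6)  # dumped six hours after assignment

# Illustrative deadlines only: 3 days matches the short SMP2 figure above,
# 30 days is a made-up stand-in for a project with a long deadline.
example_deadlines = {
    "short deadline (3 days)": timedelta(days=3),
    "long deadline (30 days, hypothetical)": timedelta(days=30),
}

for label, deadline in example_deadlines.items():
    print(label, "->", idle_time_after_dump(assigned, dumped, deadline))
```

Under these assumptions, the longer the Preferred Deadline, the longer a dumped WU blocks the work that depends on it.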
Currently, there are plans to introduce the Bonus Points to the Classic Client which will discourage WU dumping. However, I am not aware of the details.
A quick search reveals this:
{PG Member} DanEnsign -> viewtopic.php?p=10334#p10334
{Site Admin} bruce -> viewtopic.php?p=134170#p134170
Do note that I am against intentional dumping of any WUs. However, accidents can happen and it is "okay" as long as you learn from your mistakes and avoid repeating them.
ETA:
Now ↞ Very Soon ↔ Soon ↔ Soon-ish ↔ Not Soon ↠ End Of Time
Welcome To The F@H Support Forum Ӂ Troubleshooting Bad WUs Ӂ Troubleshooting Server Connectivity Issues
Re: Why deleting a work unit is bad
I disagree that the least severe impact is on SMP2. The shortest time impact is with SMP2, but then again SMP2 is much more time-critical. If the deadlines for each project are related to their time criticality, then the overall impact on each project is equally severe.
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: Why deleting a work unit is bad
You need a comment from PG saying that dumping WUs is bad? One would think the obvious logic of reality would suffice. Dumping a WU slows things down while we wait for that WU to expire and get reassigned. It can result in a delay of several months in the larger CPU WUs.
But when logic escapes those few, point them at the 80% completion rate requirement of the QRB program. PG has added points incentives to NOT dump WUs, and that's the strongest message of all.
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Re: Why deleting a work unit is bad
7im wrote:You need a comment from PG saying that dumping WUs is bad? One would think the obvious logic of reality would suffice. Dumping a WU slows things down while we wait for that WU to expire and get reassigned. It can result in a delay of several months in the larger CPU WUs.
But when logic escapes those few, point them at the 80% completion rate requirement of the QRB program. PG has added points incentives to NOT dump WUs, and that's the strongest message of all.
For me, the logic will suffice, but for some people, if the PG makes no official statement about it, it means it's not bad.
Anyway, there will be one official statement soon.
-
- Pande Group Member
- Posts: 2058
- Joined: Fri Nov 30, 2007 6:25 am
- Location: Stanford
Re: Why deleting a work unit is bad
Xilikon wrote:
7im wrote:You need a comment from PG saying that dumping WUs is bad? One would think the obvious logic of reality would suffice. Dumping a WU slows things down while we wait for that WU to expire and get reassigned. It can result in a delay of several months in the larger CPU WUs.
But when logic escapes those few, point them at the 80% completion rate requirement of the QRB program. PG has added points incentives to NOT dump WUs, and that's the strongest message of all.
For me, the logic will suffice, but for some people, if the PG makes no official statement about it, it means it's not bad.
Anyway, there will be one official statement soon.
If one needs some official statement here, I am happy to make it. As mentioned above, dumping a WU means that it will sit, unproductive, until it times out. This slows down the science, since we often need to wait for most RUN/CLONE combinations to reach a certain point. It is because dumping WUs is so bad that we have instated the QRB.
Prof. Vijay Pande, PhD
Departments of Chemistry, Structural Biology, and Computer Science
Chair, Biophysics
Director, Folding@home Distributed Computing Project
Stanford University
-
- Posts: 96
- Joined: Wed Dec 05, 2007 7:15 am
- Hardware configuration: PS3, Phenom II X4, QX9775, HD 8570
- Contact:
Re: Why deleting a work unit is bad
if they had some extra cash to spend on development i'd suggest a way to cancel a work unit and notify the server (maybe even upload a partially completed wu) .. even though it isn't something that is needed often. i've had to abandon some wus in the past every now and then when configuring and reconfiguring f@h clients. but then again, it is not needed unless work units become somewhat scarce. and if the users are nice enough to tell it to cancel.
emphasis on "if they had some extra cash"
Carnivorous Labs
http://garden-experiment.blogspot.com/
Re: Why deleting a work unit is bad
This feature has been requested before. Maybe it will be in v7, but we'll have to wait and see.
I'm not sure it would be as valuable as you think, though. If the WU is already about to expire, the server would treat the notification as an early notification of that expiration so the benefit would be small since the WU has already been delayed by some amount.
Also, if it's easy to dump WUs, some folks will use it for the wrong reasons. Along with such a feature, I'd recommend that the PG look hard at appropriate penalties for overuse of it. Judging only by my own performance, I feel the 80% bonus requirement is entirely too lax. I don't keep a systematic count, but I'll bet that less than 1% of the WUs assigned to me expire or would need to be dumped.
Of course there is a small percentage of my WUs which do have EUE errors. Of those, some are reported to the server and some are deleted by the client. I believe that the client should be smart enough to report all EUEs to the server. What I do not know is which count against the 80% bonus requirement and which do not and how that might change if there are more reports.
Posting FAH's log:
How to provide enough info to get helpful support.
Re: Why deleting a work unit is bad
I agree that 80% sounds lax on its face; however, I can also say I was thankful it WAS that lax just recently.
I tried to run smp (n - 1) on a dual xeon box recently. Might have been smp -15 on a dual nehalem system if I recall. The end result was approximately 150 instant EUEs over the course of a few hours while that box was unattended. I don't know what my exact WU completion percentage is, but I was glad it is high enough to withstand 150 simultaneous EUEs while staying above 80%.
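For a rough sense of the arithmetic, here is a small Python sketch. It assumes the simplest possible model, in which every EUE counts as a WU that was not returned and the rate is completed / (completed + failed). As noted earlier in the thread, which EUEs actually count against the 80% requirement is not known, so the function names and the model itself are illustrative assumptions only.

```python
def meets_requirement(completed, failed, threshold_pct=80):
    """True if completed / (completed + failed) is at least threshold_pct percent.
    Integer arithmetic avoids floating-point rounding at the boundary."""
    return completed * 100 >= threshold_pct * (completed + failed)

def min_completed_needed(failed, threshold_pct=80):
    """Smallest number of completed WUs that keeps the rate at or above
    threshold_pct percent, given `failed` WUs that were not returned."""
    numerator = threshold_pct * failed
    denominator = 100 - threshold_pct
    return -(-numerator // denominator)  # ceiling division on integers

failed_wus = 150  # the burst of EUEs described above
needed = min_completed_needed(failed_wus)
print(f"{failed_wus} failures need at least {needed} completed WUs to stay at 80%")
```

In this simplified model, each failed WU has to be offset by four completed ones to hold an 80% rate.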
-
- Posts: 10179
- Joined: Thu Nov 29, 2007 4:30 pm
- Hardware configuration: Intel i7-4770K @ 4.5 GHz, 16 GB DDR3-2133 Corsair Vengence (black/red), EVGA GTX 760 @ 1200 MHz, on an Asus Maximus VI Hero MB (black/red), in a blacked out Antec P280 Tower, with a Xigmatek Night Hawk (black) HSF, Seasonic 760w Platinum (black case, sleeves, wires), 4 SilenX 120mm Case fans with silicon fan gaskets and silicon mounts (all black), a 512GB Samsung SSD (black), and a 2TB Black Western Digital HD (silver/black).
- Location: Arizona
- Contact:
Re: Why deleting a work unit is bad
theteofscuba wrote:...
i've had to abandon some wus in the past every now and then when configuring and reconfiguring f@h clients.
Most times the client can simply be updated, or copied and moved, and not touch the WU. Under what conditions does reconfiguring a fah client require you to abandon a WU?
How to provide enough information to get helpful support
Tell me and I forget. Teach me and I remember. Involve me and I learn.
-
- Posts: 1122
- Joined: Wed Mar 04, 2009 7:36 am
- Hardware configuration: 3 - Supermicro H8QGi-F AMD MC 6174=144 cores 2.5Ghz, 96GB G.Skill DDR3 1333Mhz Ubuntu 10.10
2 - Asus P6X58D-E i7 980X 4.4Ghz 6GB DDR3 2000 A-Data 64GB SSD Ubuntu 10.10
1 - Asus Rampage Gene III 17 970 4.3Ghz DDR3 2000 2-500GB Segate 7200.11 0-Raid Ubuntu 10.10
1 - Asus G73JH Laptop i7 740QM 1.86Ghz ATI 5870M
Re: Why deleting a work unit is bad
I also agree that the 80% is too lax, but there was one time when 80% was not enough: a bad series of WUs that would EUE instantly and report back to the server took me below the 80% mark. Even with that happening, I still think it should be raised to 90% or 95%. If anybody should have a problem with the 80% completion rate, it should be me. All 4 of my rigs run at 4.2+ GHz, and I am continually playing with the OC on them, which most of us know can cause stability problems and lost WUs. With that said, I would venture to say my completion rate remains 99%+, so I would say that the majority of people who drop below 98% either have a bad OC, have bad hardware, or are cherry picking. I have always believed that 80% was way too generous and that Stanford should take another look at it.
The smp and bigadv WUs are actually very stable, although some say the 6701s are problematic, which I do not believe. I will say they are slow worthless pieces of crap and it irritates me to fold them (Rant), and they do require a very stable OC. Still, I cannot say I have lost a single one that was not my fault, and there is really no legitimate reason to delete one or any other WU, except that if a WU fails more than once at the same spot, it might be a bad WU. If a person believes they have a bad WU, it should be reported to this forum.
2 - SM H8QGi-F AMD 6xxx=112 cores @ 3.2 & 3.9Ghz
5 - SM X9QRI-f+ Intel 4650 = 320 cores @ 3.15Ghz
2 - I7 980X 4.4Ghz 2-GTX680
1 - 2700k 4.4Ghz GTX680
Total = 464 cores folding
Re: Why deleting a work unit is bad
7im wrote:
theteofscuba wrote:...
i've had to abandon some wus in the past every now and then when configuring and reconfiguring f@h clients.
Most times the client can simply be updated, or copied and moved, and not touch the WU. Under what conditions does reconfiguring a fah client require you to abandon a WU?
If you are running XP and attempt to change the configuration, you can lose the job. Win7 seems to be stable. XP is broken. Any change can be (and often is) lethal because the CONFIG code itself is bad. It's not the job; it's that you cannot restart the client without removing everything first. The only way I've been able to restart is by wiping out the whole install and reinstalling.
When the CONFIG crashes, it won't let you make anything but a blank config file, and that won't finish the job.
Yes, if there were detailed instructions on the work-arounds, you could save the job. But most of the "fixes" posted on the forum do not work. It has nothing to do with renaming the executable; it crashes either way. It apparently does not affect 100% of XP machines, but on the one I'm typing from, it will crash 100% of the time like clockwork. Config == RIP WU.
Quality Inspection - Corona, CA, USA
Dimensional Inspection Laboratory
Pat McSwain, President
-
- Posts: 90
- Joined: Mon Dec 17, 2007 12:34 am
- Hardware configuration: ASUS Crosshair IV Formula / AMD 1090T / 4X2 Gig GSkill Pi PC3-12800 / Corsair TX750W PSU / Sparkle GTX275 Plus / CoolerMaster Cosmos S / MCP655 WC Pump / MCR320 Rad / 6X Yate Loons / PA120.1 / 2X Scythe Ultra Kaze / Enzotech Luna WB / Dell Ultrasharp 2209WA
Gigabyte P35-DQ6 / Q6600 / 2X 1G 1066 Firestix / "Baked" XFX GTX 280 (RIP again :( ) / MSI GTS 450 Cyclone OC /PC P&C 750W Silencer / MCR220-QP-Res / DD DDCPX-Pro / Apogee GT / Highspeed PC Tech Station / Samsung 931BF / BenQ Q9T4
- Location: Moncton, New Brunswick, Canada
Re: Why deleting a work unit is bad
7im wrote:
theteofscuba wrote:...
i've had to abandon some wus in the past every now and then when configuring and reconfiguring f@h clients.
Most times the client can simply be updated, or copied and moved, and not touch the WU. Under what conditions does reconfiguring a fah client require you to abandon a WU?
It doesn't happen often, but the usual causes are hardware swaps (sometimes we forget to run a -oneunit, and need the hardware right away... ), OS reloads, and/or extended reworking of hardware setups (WC setup / cleaning can sometimes run longer than a day...).
I run 3 dedicated rigs (GPU + SMP2) 24/7, and do my best to complete all WUs accepted by my machines, but probably have 10 or so WUs over the space of a year which end up in the garbage bin for whatever reason. Having a (monitored to prevent abuse) -release switch or something similar would benefit the work timeline in those cases.
Re: Why deleting a work unit is bad
Nobody disputes that an occasional WU will be lost, but let's be realistic. What percentage of the WUs that you've been assigned were lost because of a hardware swap where you forgot to run -oneunit?
An allowable loss rate of 20% of the SMP WUs is HUGE compared to the number of WUs that are actually lost -- unless someone is intentionally cherrypicking.
Posting FAH's log:
How to provide enough info to get helpful support.