Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Falconet Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,349 |
I've noticed that some of the latest Tasks aren't checkpointing properly, so if you interrupt them they will revert to the last successful checkpoint. The watchdog isn't at 10 hours. It's 10 hours AFTER whatever the CPU run time setting is. So, if you are running with the default setting, which is 8 CPU hours, then the watchdog will only kick in at 18 hours. What Grant meant is that, since the watchdog should kick in at 18 hours, if the task is still running at 20 hours you might want to abort it. |
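The watchdog rule above comes down to simple addition. This is a minimal sketch, assuming (as the posts state) a flat 10-hour grace period on top of the Target CPU run time. The function names and the 2-hour "abort margin" are hypothetical, chosen only to mirror Grant's 20-hour rule of thumb, and are not part of BOINC or Rosetta@home.

```python
# Assumption from the thread: the watchdog allows a flat 10 hours beyond the
# Target CPU run time. The names below are hypothetical, not a BOINC/Rosetta API.
WATCHDOG_GRACE_HOURS = 10.0


def watchdog_cutoff_hours(target_cpu_hours: float) -> float:
    """CPU hours at which the watchdog should end a task."""
    return target_cpu_hours + WATCHDOG_GRACE_HOURS


def worth_aborting(cpu_hours: float, target_cpu_hours: float,
                   margin_hours: float = 2.0) -> bool:
    """True once a task has run a safety margin past the watchdog cutoff,
    i.e. the watchdog apparently failed to stop it."""
    return cpu_hours >= watchdog_cutoff_hours(target_cpu_hours) + margin_hours


if __name__ == "__main__":
    print(watchdog_cutoff_hours(8))   # 18.0 with the default 8-hour target
    print(watchdog_cutoff_hours(36))  # 46.0 with a 36-hour target
    print(worth_aborting(20, 8))      # True: Grant's 20-hour rule of thumb
```

The same arithmetic gives the 46-hour figure for a machine set to a 36-hour target.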
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
"No idea what you think I've changed" I know. It's that damned Dunning-Kruger thingy. |
jsm Joined: 4 Apr 20 Posts: 3 Credit: 76,283,296 RAC: 66,499 |
Running at 22 hours has substantially reduced the bandwidth hog, but detailed checking has turned up a query. All the computers are asking the scheduler every minute or so for new tasks, only to be told 'no can do, you have plenty' (I paraphrase). This is clearly putting an unnecessary load on the scheduler and contributing to my bandwidth loss. Is there a way to set the preferences so that they only seek additional work every so often, e.g. once an hour? capt |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,529,908 RAC: 22,862 |
"Running at 22 hours has substantially reduced the bandwidth hog but detailed checking has turned up a query. All the computers are asking the scheduler every minute or so for new tasks to be told 'no can do you have plenty' (I paraphrase). This is clearly putting an unnecessary load on the scheduler and contributing to my bandwidth loss. Is there a way to instruct the preferences only to seek additional work every so often eg 1 hour?" How often it asks for work depends on the number of cores/threads you have, the amount of time the system is actually able to process work, and, most importantly, on your cache settings. The fact that many of your Tasks time out before you even return them, due to missed deadlines, indicates your cache setting is way, way, way, way too large. The estimated completion time for all Tasks, regardless of what your Target CPU time is set to, is 8 hours. So having a multi-day cache, combined with a longer than the default 8-hour Target CPU time, is going to result in endless requests for work and huge numbers of Tasks missing their deadlines. In your computing preferences, under Other, set "Store at least 0.01 days of work" and "Store up to an additional 0.01 days of work", and they will stop trashing Work Units due to missed deadlines and stop continually asking for more work. If you go back to the default 8 hours in the future, you could then bump up "Store at least 0.01 days of work" to something like 0.2 to maintain a reasonable buffer that won't result in missed deadlines when things change. Grant Darwin NT |
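Grant's point about the fixed 8-hour estimate can be put into rough numbers. The sketch below is an illustration under stated assumptions, not BOINC's actual work-fetch logic: it assumes the client sizes the cache as if every task takes 8 hours (as described above), that each task really runs for the full Target CPU time, and a 3-day task deadline (an assumed value, not stated in this thread). The `Host` values in the usage lines are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumptions for illustration only (not taken from BOINC or Rosetta@home code):
# the client estimates every task at 8 hours, as Grant describes, and each task
# has a 3-day deadline.
ESTIMATED_TASK_HOURS = 8.0
DEADLINE_DAYS = 3.0


@dataclass
class Host:
    cores: int                # tasks running at once
    cache_days: float         # "Store at least" + "Store up to an additional"
    target_cpu_hours: float   # Target CPU run time preference
    on_fraction: float = 1.0  # share of each day the host is actually crunching


def cached_tasks(host: Host) -> int:
    """Tasks the client queues to fill the cache, judged by the 8-hour estimate."""
    return math.ceil(host.cores * host.cache_days * 24 / ESTIMATED_TASK_HOURS)


def days_to_clear_cache(host: Host) -> float:
    """Rough wall-clock days to work through the queue when every task
    actually runs for the full target time (ignores tasks already running)."""
    tasks_per_core = cached_tasks(host) / host.cores
    return tasks_per_core * host.target_cpu_hours / 24 / host.on_fraction


def misses_deadlines(host: Host) -> bool:
    return days_to_clear_cache(host) > DEADLINE_DAYS


if __name__ == "__main__":
    big = Host(cores=8, cache_days=2.0, target_cpu_hours=22)
    small = Host(cores=8, cache_days=0.02, target_cpu_hours=22)
    for h in (big, small):
        print(cached_tasks(h), round(days_to_clear_cache(h), 2), misses_deadlines(h))
```

Under these assumptions a hypothetical 8-core host with a 2-day cache and a 22-hour target queues 48 tasks that take about 5.5 days to clear, well past the assumed deadline, while the 0.01 + 0.01 day setting queues a single task.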
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
No context, no conversation. "No idea what you think I've changed" "I know. It's that damned Dunning-Kruger thingy." |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
The 6.5GB problem goes away on an 8GB machine if you set BOINC to use 100% of memory. It never actually uses 100%, since everything overestimates. I just changed my old BOINC-only machines [1] and Rosetta tasks downloaded and ran. [1] Who has 8GB on a machine they actually interact with? You could maybe load Windows 10 and one application. But dare to play a game, or use email and a photo editor at once, and it'll grind to a halt. Another example of modern shoddy, lazy, bloated programming. I can boot Linux off a 1GB flash drive, yet Windows is 20 times bigger. |
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
"No context, no conversation." "No idea what you think I've changed" "I know. It's that damned Dunning-Kruger thingy." Unless you are a relative (which you are not), it's not my duty to compensate for your inability to keep up with a conversation due to age-related infirmities. I counsel making use of Google. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,118,186 RAC: 5,220 |
"The 6.5GB problem goes away on an 8GB machine if you set it to use 100% memory. It never actually uses 100% since everything overestimates. I just changed my old Boinc-only machines [1] and Rosettas downloaded and ran." Windows 10 runs just fine with 8GB of RAM, even on a laptop, and can even crunch BOINC projects quite well if you have the right processor and choose your projects wisely. Playing games is a whole other story, though, and you are correct unless you are playing a non-competitive game like Minecraft or the like. The size of the Windows OS is what it is; it's not like any of us can change it, so you either learn to deal with what you have or you change to something else. |
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
"Over the course of this afternoon I’ve had 6 segv errors, all on files starting miniprotein." It's not just you. I've got 29 that failed across a number of machines. They are all miniprotein_relax8 series that died after running for an hour. BOINC blog |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
My Aunt doesn't play games. She finds 4GB (Hewlett Packard actually sold her a laptop with such a stupidly pitiful amount, which could not be upgraded!) unusable, and 8GB ok if she only runs one program at a time, 12GB was needed just to use email and a photo editor. If I make a computer for someone it has 16GB, or 32GB for games or anything else demanding. I put 64GB in my own. Programmers don't code as neatly as they used to! "The 6.5GB problem goes away on an 8GB machine if you set it to use 100% memory. It never actually uses 100% since everything overestimates. I just changed my old Boinc-only machines [1] and Rosettas downloaded and ran." |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
You seem confused. "Context" in this context (titter) means that you failed to quote enough text so I knew what the conversation was about. It has nothing to do with the hypothetical Dunning-Kruger bullshit. Virtually nobody can remember every single conversation they have; I'm probably in 200 of them. "No context, no conversation." "No idea what you think I've changed" "I know. It's that damned Dunning-Kruger thingy." |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
Same here, and on prehelical (although I didn't check the error type). "Over the course of this afternoon I’ve had 6 segv errors, all on files starting miniprotein." |
CIA Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
"Same here, and on prehelical (although I didn't check the error type)." "Over the course of this afternoon I’ve had 6 segv errors, all on files starting miniprotein." Pretty much all of my miniprotein_relax8 units are seconds (meaning they failed on another machine before I got them), and almost all of them are completing but taking 18 hours to do so. They are creating very few decoys. Example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1366333671 |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
Have you changed the setting to allow 18 hours? Because all mine are sticking to the 8 hours. I'm getting 50% of the miniprotein_relax8 completing in 8 hours, and the other 50% failing, usually taking 5 hours to do so. "Same here, and on prehelical (although I didn't check the error type)." "Over the course of this afternoon I’ve had 6 segv errors, all on files starting miniprotein." |
CIA Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
"Have you changed the setting to allow 18 hours? Because all mine are sticking to the 8 hours. I'm getting 50% of the mini protein_relax8 completing in 8 hours, and the other 50% failing, usually taking 5 hours to do so." During the latest drought I had this machine set to 36 hours, but on Friday, when it became clear the drought had ended, I set it back to its normal default 8-hour runtime. So it's running for the standard 8 hours and then, as others have mentioned, 10 additional hours on top before the auto-cutoff happens. All my other machines are set to 36 hours, and while none of them have completed any of these longer units, some of them are showing signs it will happen to them too. For example, on one machine I have a miniprotein WU that is only 57% done 22 hours in. I have a feeling it's going to crunch for 46 hours (set time limit + 10-hour cutoff). /edit. Just to add a data point: while it's not conclusive, all the Miniprotein_relax8 units I'm getting that run long do "complete" and show as valid, even after going 10 hours over. Of these units that run over, many are "seconds" sent to me from other machines that failed to process the WU. My machine is running OSX and completes them fine (beyond running 10 hours over). All the failed machines are Windows or Linux based. That said, I know Macs make up a small percentage of computers on this project, so I might just not have gotten a resend from a Mac in my small sample. |
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
"you failed to quote enough text so I knew what the conversation was about." There was enough for you to recognize that I was replying to you, but not enough for you to remember what we were talking about in a conversation within the past 24 hours, even though you knew it was you. Got it. Just between us girls, isn't the real issue here the same as the one with "dood" and "@": that you're immensely irritated by some features of my posting style, including quoting only the essence of an exchange? I think Letterman said it best: "An old man in a bathrobe on his front porch, shaking his fist at passing cars." |
Bryn Mawr Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
"Over the course of this afternoon I’ve had 6 segv errors, all on files starting miniprotein." Thanks, I was hoping it was the tasks rather than my hardware. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
"Over the course of this afternoon I’ve had 6 segv errors, all on files starting miniprotein." I've had several miniprotein_relax8 tasks fail too, but only one of them failed after one hour. The rest ran for at least two hours before failing. All were reissued to someone else, and either failed for that someone else as well or aren't yet finished there. I've thought of a possible reason why some tasks are set to ask for 6 GB of memory: quite a bit more is loaded to produce a core dump if they fail, but it isn't needed if they don't fail. Not the best idea, but possible. |
DizzyD Joined: 23 Nov 20 Posts: 6 Credit: 1,438,330 RAC: 0 |
"/edit. Just to add a datapoint. While it's not conclusive, all the Miniprotein_relax8 units I'm getting that run long do "complete" and show as valid, even after going 10 hours over. Of these units that run over, many are "seconds" sent to me from other machines that failed to process the WU. My machine is running OSX and completes them fine (beyond running 10hrs over). All the failed machines are windows or linux based. That said, I know Macs make up a small percentage of computers on this project, so I might have just not gotten a resend from a Mac in my small sample." I am also running on a Mac. The miniprotein_relax8 units also complete after ~18.7 hours and provide credit; however, the credit is in the two-hundred range for 67,000+ seconds of work. So I've gone in and aborted all of the "ready to start" miniprotein_relax8 units, and now I have all pre-helical-bundles_round1_attempt1 queued up. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1671 Credit: 17,529,908 RAC: 22,862 |
"My Aunt doesn't play games. She finds 4GB (Hewlett Packard actually sold her a laptop with such a stupidly pitiful amount, which could not be upgraded!) unusable, and 8GB ok if she only runs one program at a time, 12GB was needed just to use email and a photo editor." The issue is the photo editor. I know several people running Windows 10 systems with 4GB of RAM with no issues (I was one for quite some time myself). Of course, if you use software that requires huge amounts of RAM to do the work it needs to do, such as photo editing, then you need a system with the appropriate amount of RAM. That has always been the case. It also helps (a massive amount) if you have an SSD rather than an HDD. Grant Darwin NT |
©2024 University of Washington
https://www.bakerlab.org