Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
Shouldn't it have already done that when the 2nd genuine one was posted? Duplicate post deleted. You'd think there'd be a delete button. Who designs these things? |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
Same happens in real life with toilet paper because of the plandemic. Some people are selfish idiots. I have seen several periods of downtime where work units have not been deployed for days at a time. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
No, because you did an "I know you are" variant saying I'd used @ when telling you not to use @. Do people seriously say "dude"? Anyway "prick" is a compliment, it means you have a big appendage. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
That limit doesn't work, you just buy from more shops at once. Not that I do that with toilet paper, but I do a similar thing to buy more paracetamol (painkiller) than you're "allowed" by the government. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
That means nothing. For example I might (manually or Boinc did it) download a load of work from another project when this one runs out. Now that has to be completed before it will get work from Rosetta again. A few days in and the impact of the mis-configured Work Units is becoming clearer. Looks like the amount of work being done has dropped by almost a third, and isn't showing any signs of recovering. In the past it has taken several days for In progress numbers to get back to their pre-work shortage numbers. And that's without running out of work again only a few hours after new work started coming through (which occurred this time). Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed. You’re not alone. Look at the recent results graphs – ‘tasks in progress’ has dropped by around 200,000 (a third)… |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
This problem with tasks erroring out with computation error is now getting serious. Up until now my attitude has been “it’s only a few seconds a task, no sweat” but because I’m running a very small cache with multiple projects it runs Rosetta on a one out, one in basis so it gets one, errors it and then has to wait ages before it uploads the result and asks for another. Now it’s gone to the next level because my last n tasks have all errored it is extending the back off period to many hours before it will allow another request and I’m almost to the point where Rosetta is no longer running on my main machine. Does the panel know (a) how long these errors will continue (b) how many good tasks I need to return to get back into Rosetta’s good books? |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
This problem with tasks erroring out with computation error is now getting serious. I do the same as you with a small buffer, but all that happens is Boinc builds up a Rosetta debt, and you'll end up doing more of them when they fix it. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Bandwidth usage massively increased in March. This might be at least in part due to the current batch of work units suffering an unusually high failure rate, meaning you will be downloading a lot more tasks than normal in any given period. As an extreme example, your Threadripper has had over 300 failures in the last few days. As there’s no way to tell bad tasks from good before they’ve downloaded and started, there’s nothing we can do about it other than let them run their course (or stop running Rosetta until they’ve passed). In BOINC Manager you can set a limit on the amount of data transferred in a given period. It’s not very sophisticated and only works per machine, so when you’ve got several the best you can do is set an allowance for each one as a proportion of your total limit based on the number of tasks you expect it to run. (And if you do set a limit you then need to keep an eye out for it being reached, at which point even small results files for completed tasks won’t be uploaded.) Bad tasks aside, one way to reduce the overall amount of network traffic while performing the same amount of work is to increase the target run time for tasks in your project preferences. Even though a longer run time might increase the upload size needed for each task (due to the greater number of results), that is often far outweighed by the saving in download size (which is fixed for each task, however long it runs for). The credit per hour is more or less the same whatever target run time you choose. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Does the panel know (a) how long these errors will continue (b) how many good tasks I need to return to get back into Rosetta’s good books? (a) With 1.1 million jobs in the queue and a completion rate around 280,000 per day, I’d estimate at least 4 days… (b) At just shy of 500 max per day you still are in Rosetta’s good books, so number of tasks isn’t the issue. If it’s just backoff times you’re running into, either that’s set by the server and there’s nothing you can do about it, or you can try to force a connection by selecting Update on the Projects page. |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
Does the panel know (a) how long these errors will continue (b) how many good tasks I need to return to get back into Rosetta’s good books? (a) With 1.1 million jobs in the queue and a completion rate around 280,000 per day, I’d estimate at least 4 days… The back off time appears to be set by the server and is near doubling with each computation error that I’m returning :-( |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
Shouldn't it have already done that when the 2nd genuine one was posted? Duplicate post deleted. You'd think there'd be a delete button. Who designs these things? Yes. But it's rather slow to happen. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
Bandwidth usage massively increased in March. This might be at least in part due to the current batch of work units suffering an unusually high failure rate, meaning you will be downloading a lot more tasks than normal in any given period. As an extreme example, your Threadripper has had over 300 failures in the last few days. As there’s no way to tell bad tasks from good before they’ve downloaded and started, there’s nothing we can do about it other than let them run their course (or stop running Rosetta until they’ve passed). [snip] If anyone can get them to look at the log files to see why the errors are occurring, that might help. For the errors on my computer, they should quickly notice that something with "6mers" in its name is missing from the input files. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1671 Credit: 17,529,908 RAC: 22,862 |
Has there been a significant project change which could be the cause of this increased usage or am I looking for another problem? Hard to say. In most cases the results returned to the Rosetta servers are around 200k-1MB. But they can be well over 1MB in some cases, depending on the type of Task being processed. I'd suggest disabling BOINC network access for a while & see what the average result file size being returned is. Edit- and as Brian mentioned, we have recently had a large batch of Tasks that error out quickly, and appear to still be moving through the system. I would also check to see if there has been an increase in Windows update traffic; that is the only thing that causes regular spikes in my network bandwidth- also check your privacy settings as having these set loosely results in a lot of data being sent back to Microsoft & other companies. Also the occasional YouTube usage when I find some interesting videos can result in a huge spike in data usage. But Brian's suggestion of the errored Tasks is the most likely cause. This is unsustainable and I will either have to shell out for an expensive unlimited contract (because I have an Ultima connection at over 100mbps) or cut back on Rosetta work. I'm guessing you don't have any real options when it comes to ISP? 50GB limit for a 100Mb connection is insane IMHO. Higher speed plans here come with high data caps. 50GB is something you used to get on a basic 25Mb/s starter plan- these days even 12Mb/s plans can have as much as 500MB data caps. 100Mb/s plans are 1TB caps or unlimited by default (of course we pay through the nose for those). Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1671 Credit: 17,529,908 RAC: 22,862 |
...how many good tasks I need to return to get back into Rosetta’s good books? It's not an issue. Rosetta is set up to allow for such problems. Both of your systems are still good for plenty of Tasks each day- 491 on one, 502 on the other. I can't remember the exact mechanism, but for example for each Task that validates, your limit increases by 2 (it's actually more than that- there were times at Seti where people were down to 1 Task per 24hrs. Once they started returning valid Tasks again, within a few hours (depending on how fast they were returning Valid work) their limits were back in the hundreds & even thousands of Tasks per 24 hours). But it would be nice if the researchers would test their models a bit more before releasing them here. The odd error is OK, but when it's a case of the odd Task not being an error and all others erroring out it really is a bit silly. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1671 Credit: 17,529,908 RAC: 22,862 |
The back off time appears to be set by the server and is near doubling with each computation error that I’m returning :-( I haven't seen that occur myself (but most of my errors were returned while I wasn't here). Boinc Manager backoffs are set by the Scheduler and usually only occur when there is a problem contacting the Scheduler. A successful Scheduler contact & it's reset to the default 30 seconds. Returning errors should only result in a reduction in the number of Tasks per 24 hours for that host. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1671 Credit: 17,529,908 RAC: 22,862 |
That means nothing. For example I might (manually or Boinc did it) download a load of work from another project when this one runs out. Now that has to be completed before it will get work from Rosetta again. It really is a shame you don't read all of what's posted before you feel the need to comment. Grant Darwin NT |
DizzyD Send message Joined: 23 Nov 20 Posts: 6 Credit: 1,438,330 RAC: 0 |
Who is the guilty party submitting tasks that all "Error while computing"? I have 70 tasks on April 4th that have errored with no credit. My stats have dropped over 10% in the past day. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1671 Credit: 17,529,908 RAC: 22,862 |
Has there been a significant project change which could be the cause of this increased usage or am I looking for another problem? Hard to say. I just checked out my Data usage, and it is actually less than it has been- in the last reporting period there were some large deferred Windows updates so they would have skewed the figures. Even so, my average usage is around 1GB per day. Since the 28/3 my usage is only around 731MB per day (to date). Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1671 Credit: 17,529,908 RAC: 22,862 |
Who is the guilty party submitting tasks that all "Error while computing"? I have 70 tasks on April 4th that have errored with no credit. That would be you. Along with everyone else- as mentioned in several posts here & some other threads there is a current batch of work that is presently producing almost nothing but errors. My stats have dropped over 10% in the past day. Mine are still climbing, but that is after falling for 4 days straight due to the lack of work for a while, and the fact there is now a new batch of work and that it takes a while for granted Credit to stabilise. Grant Darwin NT |
Falconet Send message Joined: 9 Mar 09 Posts: 353 Credit: 1,222,776 RAC: 4,349 |
Queued jobs dropped to 393,000 from over a million on the last update. Looks like someone pulled off some batches from circulation. |
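
For anyone wanting to redo the "at least 4 days" estimate with the new queue size, here is a rough sketch of the arithmetic in Python. It only uses the figures quoted in this thread (1.1 million queued before, 393,000 after, about 280,000 completed per day). The function name `days_to_drain` is made up for illustration. It assumes the completion rate stays constant and no new work is added, which may not hold, and the thread does not say whether the removed batches were the failing ones.

```python
import math


def days_to_drain(queued_jobs: int, completed_per_day: int) -> float:
    """Days needed to clear the queue at a constant completion rate.

    Hypothetical helper: assumes no new jobs are queued in the meantime.
    """
    if completed_per_day <= 0:
        raise ValueError("completed_per_day must be positive")
    return queued_jobs / completed_per_day


COMPLETED_PER_DAY = 280_000  # approximate rate quoted earlier in the thread

for label, queued in [("before", 1_100_000), ("after", 393_000)]:
    days = days_to_drain(queued, COMPLETED_PER_DAY)
    print(f"{label}: {queued:,} queued -> {days:.1f} days "
          f"(about {math.ceil(days)} whole days)")
```

On these numbers the earlier queue works out to just under 4 days and the reduced queue to roughly a day and a half, so if the bad batch is still in there it should pass through a good deal sooner than first estimated.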
©2024 University of Washington
https://www.bakerlab.org