computation errors

Message boards : Number crunching : computation errors

Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 137
Message 105095 - Posted: 21 Feb 2022, 8:58:28 UTC - in response to Message 105028.  
Last modified: 21 Feb 2022, 9:02:28 UTC

I am a little puzzled by your message actually.

>>> Did you even pay the slightest bit of attention to the statement i was responding to???

Well, I didn't really need to, since the statement you appeared to be responding to was written by me? I do know how BOINC works. I've been with it, and David's earlier version (probably not the right word, but it conveys the meaning; anyway, since 1998), since its first appearance.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1673
Credit: 17,595,878
RAC: 22,416
Message 105096 - Posted: 21 Feb 2022, 9:41:28 UTC - in response to Message 105095.  

I am a little puzzled by your message actually.
Not nearly as puzzled as i am by yours.


>>> Did you even pay the slightest bit of attention to the statement i was responding to???

Well, I didn't really need to since the statement you appeared to be responding to was written by me?
Which makes your response all the more baffling to me. I would have hoped you'd realise what it was you had said, and so would understand what was being answered.
Yet that didn't happen. You still quoted me out of context, took my answer to one statement, and applied it to something different.



I do know how BOINC works. I've been with it, and David's earlier version (probably not the right word, but it conveys the meaning; anyway, since 1998), since its first appearance.
And yet you attributed normal BOINC behaviour to some action by the project.
Here is your statement:
Perhaps someone at the project noticed the problem. I had a large number of these quickly crashing work units, on both my machines, but neither has downloaded new work for a couple of hours.
Anyone that knows how BOINC works would know that's what happens when a load of rubbish goes out that just causes errors: BOINC backs off its Scheduler contacts, with increasing backoff periods. Hence no downloads for a couple of hours. So why did you attribute what is normal BOINC behaviour under those circumstances to something the project did?
And that is why I gave the answer I did: pointing out, to someone who obviously didn't know it, what normal BOINC behaviour is, and that it was not the result of any action the project may or may not have taken in relation to their faulty Tasks.
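To illustrate the "increasing backoff periods" described above, here is a minimal sketch of a capped exponential backoff. It is not taken from the BOINC client source: the function name and the base delay, growth factor and cap are hypothetical values chosen only to show how repeated failures can add up to a couple of hours without new downloads.

```python
def backoff_delays(failures, base_seconds=60.0, factor=2.0, cap_seconds=4 * 3600.0):
    """Return the wait (in seconds) applied after each consecutive failure.

    All parameters are illustrative assumptions, not BOINC's real settings.
    """
    delays = []
    for attempt in range(failures):
        # The delay grows geometrically with each failure, up to a fixed cap.
        delays.append(min(base_seconds * factor ** attempt, cap_seconds))
    return delays


if __name__ == "__main__":
    waits = backoff_delays(8)
    print([w / 60 for w in waits])       # individual waits, in minutes
    print(sum(waits) / 3600, "hours")    # total time spent backing off
```

Under these assumed numbers, a handful of consecutive failures is enough to keep a host from contacting the scheduler for hours, with no action needed from the project.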
Grant
Darwin NT
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 137
Message 105097 - Posted: 21 Feb 2022, 10:45:15 UTC - in response to Message 105096.  

>>> Not nearly as puzzled as i am by yours.

Oh, I see, my puzzlement is bigger than yours!

I still have no idea what you're on about. Rosetta USED to be a project with endless work units from their various work projects to download and crunch. Since late last year, that changed. Now, it is a project that produces a batch of work, lets that run, then stops for an unknown period, until the next batch starts. I saw new work from Rosetta coming in, and watched the tasks crashing quickly. After seeing that, it wasn't wasting vast amounts of time, but I felt that setting no new tasks was appropriate. Looking at the sent times, I see that three would come, fail and return. About 30 minutes later, another three would come, fail and return. After a number of these cycles, I set no new tasks. Some time later, I opened downloads to see what the status was, and nothing came. Conclusion: they had all gone, or someone at the project had stopped them.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Profile Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105100 - Posted: 21 Feb 2022, 12:32:43 UTC

Someone put movingstubs on the server without beta testing it. It failed fast. One poster in another thread said Linux systems were running it OK, but for all of us Windows users the tasks failed immediately.

Unfortunately, no one from the project pays attention to this, and each of these tasks had to be downloaded and errored out by two people to get it out of the system.

Such is life with this project now.
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 137
Message 105103 - Posted: 21 Feb 2022, 15:58:20 UTC

>>> Someone put movingstubs on the server without beta testing it

A little worrying.

>>> no one from the project pays attention to this

Would certainly seem to be the case.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Profile Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105104 - Posted: 21 Feb 2022, 16:54:49 UTC - in response to Message 105103.  

>>> Someone put movingstubs on the server without beta testing it

A little worrying.

>>> no one from the project pays attention to this

Would certainly seem to be the case.



But you should be used to that by now.

I think for them as long as the science gets done one way or another, how it gets done is not relevant any more. You've been on here a little longer than me, so you remember the days when they cared.
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 137
Message 105110 - Posted: 21 Feb 2022, 20:23:30 UTC - in response to Message 105104.  

>>> little longer than me

Yes, you are a relative newbie!!!!!

If you consider the machines we were running back then, to keep run times sensible, the amount of calculation per work unit must have been a fraction of what we turn over today.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
xii5ku

Joined: 29 Nov 16
Posts: 22
Credit: 13,815,783
RAC: 419
Message 105158 - Posted: 23 Feb 2022, 10:46:02 UTC - in response to Message 105097.  
Last modified: 23 Feb 2022, 10:50:18 UTC

adrianxw wrote:
Rosetta USED to be a project with endless work units from their various work projects to download and crunch. Since late last year, that changed. Now, it is a project that produces a batch of work, lets that run, then stops for an unknown period, until the next batch starts.
Rosetta@home has *always* had occasional pauses. They weren't as frequent as now, though. (In the Rosetta 4 workqueue, that is; the rosetta python projects workqueue permanently has work.)

adrianxw wrote:
I saw new work from Rosetta coming in, and watched the tasks crashing quickly. After seeing that, it wasn't wasting vast amounts of time, but I felt that setting no new tasks was appropriate. Looking at the sent times, I see that three would come, fail and return. About 30 minutes later, another three would come, fail and return. After a number of these cycles, I set no new tasks. Some time later, I opened downloads to see what the status was, and nothing came. Conclusion: they had all gone, or someone at the project had stopped them.
No. The admins are still ignoring this bug and are blissfully submitting more work for the failing application. The work was only all gone for a short while because all workunits were finished in one of the following two ways:

  • first or at least second task of the workunit completed successfully on a Linux computer --> workunit succeeded
  • first and second task ended up on Windows computers and failed --> workunit went down the drain unsuccessfully

(besides other normal failure causes, e.g. timeouts)
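The two cases above can be sketched in a few lines. This is only an illustration of the rule as described in this post, assuming at most two tasks are sent per workunit, that the task always succeeds on Linux and always fails on Windows; the function names and the `windows_share` parameter are hypothetical and not part of any BOINC API.

```python
def workunit_outcome(task_platforms, max_tasks=2):
    """Classify a workunit from the platforms its tasks landed on, in order.

    Assumes (per the description above) that Linux hosts succeed, Windows
    hosts fail, and that no more than `max_tasks` tasks are ever sent.
    """
    for platform in task_platforms[:max_tasks]:
        if platform == "linux":
            return "succeeded"
    return "failed"


def failure_probability(windows_share, max_tasks=2):
    """Chance that every task lands on Windows, assuming independent assignment."""
    return windows_share ** max_tasks


if __name__ == "__main__":
    print(workunit_outcome(["windows", "linux"]))    # second task rescues the workunit
    print(workunit_outcome(["windows", "windows"]))  # both tasks fail
    print(failure_probability(0.5))
```

Timeouts and other normal failure causes are deliberately left out of this sketch.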

Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 137
Message 105163 - Posted: 23 Feb 2022, 14:00:48 UTC

There has been other work recently. At this moment, I have 12 work units running on my machines that have all gone past the rapid failure point I was seeing. Two of these, for example, are past 9 hours now and still running.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.