Problems and Technical Issues with Rosetta@home

Greg_BE
Joined: 30 May 06
Posts: 5690
Credit: 5,859,226
RAC: 22
Message 105342 - Posted: 6 Mar 2022, 17:16:39 UTC - in response to Message 105339.  
Last modified: 6 Mar 2022, 17:31:05 UTC

A clean install does not seem to make any difference: one of the computers has a freshly installed Mac OS and software, and the other install is about a year old.
The result is the same.

The computers are restarted once a week.

I will have to look into BoincTasks by eFMer.



BoincTasks will not change the way the projects run, but it will help you identify stalled tasks and let you see how the others are doing.

Have you checked your VM for dead, unreachable tasks?

Try running one batch of Pythons and, for the ones that get stuck, post a copy of the stderr file here. Maybe someone will see something in the text that points to the problem.
ID: 105342
johndad5
Joined: 12 Aug 09
Posts: 7
Credit: 2,729,604
RAC: 0
Message 105358 - Posted: 8 Mar 2022, 5:52:04 UTC

Hello,

I am not getting any jobs at all. I checked my tasks and it says that 4 were not started on time. I looked further into one and it said the job was canceled by the user. I never canceled any tasks.



Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
1475804655 | 6163421 | 3 Mar 2022, 9:36:09 UTC | 6 Mar 2022, 9:36:22 UTC | Not started by deadline - canceled | 0.00 | 0.00 | --- | rosetta python projects v1.03 (vbox64) windows_x86_64
1476130321 | 3396392 | 6 Mar 2022, 12:33:33 UTC | 7 Mar 2022, 11:24:39 UTC | Error while computing | 7,812.61 | 1,325.91 | 12.00 | rosetta python projects v1.03 (vbox64) windows_x86_64

Task 1475804655
Name aagb-ABU_pp-mNMPHE-GPN-ACHC12C_7_2575989_3_0
Workunit 1314564584
Created 2 Mar 2022, 23:58:43 UTC
Sent 3 Mar 2022, 9:36:09 UTC
Report deadline 6 Mar 2022, 9:36:09 UTC
Received 6 Mar 2022, 9:36:22 UTC
Server state Over
Outcome Computation error
Client state Aborted by user
Exit status 200 (0x000000C8) EXIT_UNSTARTED_LATE
Computer ID 6163421
Run time
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 3.53 GFLOPS
Application version rosetta python projects v1.03 (vbox64) windows_x86_64
ID: 105358
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1626
Credit: 16,623,621
RAC: 4,245
Message 105359 - Posted: 8 Mar 2022, 6:21:02 UTC - in response to Message 105358.  

I am not getting any jobs at all. I checked my tasks and it says that 4 were not started on time. I looked further into one and it said the job was canceled by the user. I never canceled any tasks.
With your computers hidden it's pretty much impossible to help without just taking wild guesses.

But the fact is, judging from the Task you posted, you are missing deadlines, and for every error you get, the amount of work you can get per day is reduced until you start to return Valid work.
Set your cache to 0.01 days & 0.01 additional days; then, when you get some more work, you should be able to return it before the deadline passes.
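
For anyone who prefers a local override file to the web preferences, the cache setting above can be sketched as follows. This is only an illustration under assumptions: it assumes the client reads a global_prefs_override.xml file from the BOINC data directory and that the two cache values map to the work_buf_min_days and work_buf_additional_days tags. The function name build_cache_override is made up for this example, and the snippet only builds the XML text; it does not write anything into the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_cache_override(min_days=0.01, additional_days=0.01):
    """Build the text of a small preferences override for the work cache.

    Assumption: the tag names below are the BOINC client's names for
    "store at least N days of work" and "store up to an additional N days".
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:.2f}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:.2f}"
    # Return the XML as a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


# The 0.01 / 0.01 values suggested in the post above.
override_xml = build_cache_override()
print(override_xml)
```

If you try this, the client would still have to be told to re-read its local preferences (or be restarted) before the new cache size applies. Setting the same two values under computing preferences in BOINC Manager or on the project website should have the same effect.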

Also, on your computer's details page, down near the bottom, there should be a Skip or Allow button for Python work. If it says Allow, you need to click it to get more Python work. Too many errors, and you get blocked from getting more.
Grant
Darwin NT
ID: 105359
Sid Celery
Joined: 11 Feb 08
Posts: 2070
Credit: 40,575,273
RAC: 4,567
Message 105362 - Posted: 8 Mar 2022, 12:41:24 UTC - in response to Message 105358.  

Hello,
I am not getting any jobs at all. I checked my tasks and it says that 4 were not started on time.
I looked further into one and it said the job was canceled by the user. I never canceled any tasks.

It actually gives 3 reasons for failing, 2 of them contradictory.

1 - Computation error, but no CPU time
2 - Aborted by user, but not aborted
3 - Much more likely (editing your quote for clarity)
1475804655 3 Mar 2022, 9:36:09 UTC 6 Mar 2022, 9:36:22 UTC Not started by deadline - canceled

Task 1475804655
Sent 3 Mar 2022, 9:36:09 UTC
Report deadline 6 Mar 2022, 9:36:09 UTC
Received 6 Mar 2022, 9:36:22 UTC
Exit status 200 (0x000000C8) EXIT_UNSTARTED_LATE

It looks like a task the Server thinks it sent but you never received. Maybe some blip during download.
Nothing you'd know about, nor can you do anything about even if you did know.
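
The timing in the task details above can be checked with a little arithmetic. A minimal sketch, assuming the three timestamps are exactly as shown on the task page and all in UTC; the helper name parse_utc is made up for this example.

```python
from datetime import datetime, timezone

# Timestamp layout used on the task page, e.g. "3 Mar 2022, 9:36:09".
# Note: %b expects English month abbreviations (default C/English locale).
FMT = "%d %b %Y, %H:%M:%S"


def parse_utc(stamp):
    """Parse a task-page timestamp and mark it as UTC."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)


sent = parse_utc("3 Mar 2022, 9:36:09")
deadline = parse_utc("6 Mar 2022, 9:36:09")
received = parse_utc("6 Mar 2022, 9:36:22")

window = deadline - sent       # time allowed to start and return the task
overrun = received - deadline  # how long after the deadline it was reported

print(f"Deadline window: {window.days} days")
print(f"Reported {overrun.total_seconds():.0f} seconds after the deadline")
print(f"Exit status 200 in hex: {hex(200)}")
```

So the task had a 3-day window and was reported 13 seconds after the deadline passed, which fits the client giving up on a task that was never started rather than anyone aborting it by hand.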

If you're not getting tasks, the last of Grant's suggestions looks like the likely solution.
ID: 105362
Greg_BE
Joined: 30 May 06
Posts: 5690
Credit: 5,859,226
RAC: 22
Message 105365 - Posted: 8 Mar 2022, 20:05:03 UTC
Last modified: 8 Mar 2022, 20:06:47 UTC

You had 3 days to complete that task.
For some reason (the system is too busy with other projects, or a glitch) you missed the deadline, so the server pulled the task from your system.

From another project:

Outcome Computation error
Client state Aborted by user
Exit status 200 (0x000000C8) EXIT_UNSTARTED_LATE


Your BOINC client thought it was too late to start the task. I guess the system clock glitched on your PC.

As for the Error while computing, we would need to see the STDERR output, which you can retrieve from the bottom of that task's web page.

Could be a bug in that task, or a problem with how the task interacts with your system or who knows what....

I would go with Grant's suggestion of .01 days and .01 days additional work or at max .25 and .25. See how things work with those settings.

You will either need to make your computers public or post the errors and the STDERR text so we can see what's going on.
ID: 105365
johndad5
Joined: 12 Aug 09
Posts: 7
Credit: 2,729,604
RAC: 0
Message 105371 - Posted: 9 Mar 2022, 6:03:03 UTC - in response to Message 105362.  
Last modified: 9 Mar 2022, 6:06:53 UTC

Thanks for your reply.

I unhid my computer. Sorry for the inconvenience.
ID: 105371
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1626
Credit: 16,623,621
RAC: 4,245
Message 105372 - Posted: 9 Mar 2022, 6:39:25 UTC - in response to Message 105371.  

Thanks for your reply.

I unhid my computer. Sorry for the inconvenience.
From the looks of things it's pretty rare for any of your systems to complete Rosetta work in time. Most of it would be missing the deadline.

If you are doing only one project, that has long deadlines, and has frequent server issues or shortages of work then you need a cache. If you're running more than one project, then there's no need for a cache.
0.01 & 0.01 additional days would be best, 0.25 and 0.01 additional days if you really feel the need for some sort of cache.


It would be worth checking the Details page for each of your systems to see if, down at the bottom, there is an Allow or Skip button for Python tasks. If it says Allow, you need to click it to start getting work again, after setting your cache to something more reasonable.
Grant
Darwin NT
ID: 105372
Greg_BE
Joined: 30 May 06
Posts: 5690
Credit: 5,859,226
RAC: 22
Message 105375 - Posted: 9 Mar 2022, 20:02:56 UTC - in response to Message 105371.  

Thanks for your reply.

I unhid my computer. Sorry for the inconvenience.


In computing preferences, what is your switch between tasks time?
That can also affect your ability to get work done on time.

Is Einstein CPU or GPU or both for you?
I am running GPU and the tasks are around 30 mins, but as you know, sometimes they bring out longer stuff.
Are you running anything non-BOINC in the background, such as FAH or anything else requiring CPU time?
ID: 105375
johndad5
Joined: 12 Aug 09
Posts: 7
Credit: 2,729,604
RAC: 0
Message 105390 - Posted: 11 Mar 2022, 5:04:41 UTC - in response to Message 105375.  

Hello,

I received Rosetta today for the first time in several weeks. It appears all is well now. My switch timing is 60 minutes for all of the tasks. This is the first time I have run into problems with Rosetta in years. Happy that it now seems to be working.
ID: 105390
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1626
Credit: 16,623,621
RAC: 4,245
Message 105393 - Posted: 11 Mar 2022, 6:31:39 UTC - in response to Message 105390.  

I received Rosetta today for the first time in several weeks. It appears all is well now.
You were just lucky to get some Rosetta 4.20 work as it was released.
In the time between now & when you last got work there has been plenty of Python work available, but as you haven't taken any notice of what I posted about your cache resulting in missed deadlines & what you need to do to get more Python work, you will soon be out of work again.
Grant
Darwin NT
ID: 105393
Greg_BE
Joined: 30 May 06
Posts: 5690
Credit: 5,859,226
RAC: 22
Message 105399 - Posted: 11 Mar 2022, 9:38:19 UTC - in response to Message 105393.  

I received Rosetta today for the first time in several weeks. It appears all is well now.
You were just lucky to get some Rosetta 4.20 work as it was released.
In the time between now & when you last got work there has been plenty of Python work available, but as you haven't taken any notice of what I posted about your cache resulting in missed deadlines & what you need to do to get more Python work, you will soon be out of work again.



Not exactly, Grant. Look at his machines.
#1 is running 4.2 and #2 is running Python.
The errored tasks were also invalid for the wingman, for the exact same reason: a memory address error.
So he has completed 6 Pythons so far on #2 and bombed 3 due to errors in the task itself, since the wingman also suffered the same failure.

So let him continue, it looks like he has the systems dialed in.
ID: 105399
Greg_BE
Joined: 30 May 06
Posts: 5690
Credit: 5,859,226
RAC: 22
Message 105400 - Posted: 11 Mar 2022, 9:41:34 UTC - in response to Message 105390.  

Johndad, one of your machines will run out of work soon, once all the 4.2 tasks on it are used up.
Then you will be back to no work. So you might configure that machine to accept Python, or it goes idle.
That, or it just does other project work until the next batch of 4.2 comes.
As we have said, 4.2 is very sporadic and disappears very fast.
ID: 105400
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1626
Credit: 16,623,621
RAC: 4,245
Message 105401 - Posted: 11 Mar 2022, 10:23:15 UTC - in response to Message 105399.  
Last modified: 11 Mar 2022, 10:24:30 UTC

The errored tasks were also invalid for the wingman, for the exact same reason: a memory address error.
Those are the most recent Tasks. The previous ones (that we could see at the time) all errored out due to missing the deadline.
As the work flows, the cache will re-fill & deadlines will be missed, over & over again. Enough Python errors, and you're blacklisted.
Grant
Darwin NT
ID: 105401
Greg_BE
Joined: 30 May 06
Posts: 5690
Credit: 5,859,226
RAC: 22
Message 105409 - Posted: 11 Mar 2022, 15:38:08 UTC - in response to Message 105401.  

The errored tasks were also invalid for the wingman, for the exact same reason: a memory address error.
Those are the most recent Tasks. The previous ones (that we could see at the time) all errored out due to missing the deadline.
As the work flows, the cache will re-fill & deadlines will be missed, over & over again. Enough Python errors, and you're blacklisted.



And that's what is annoying. Errors caused by their incompetence count against us. At least we can come back. They should exclude tasks that error out on two systems.
ID: 105409
Kyle Craig
Joined: 18 Dec 17
Posts: 2
Credit: 126,343
RAC: 0
Message 105415 - Posted: 12 Mar 2022, 0:15:40 UTC

I just installed BOINC and added Rosetta@home on a brand new laptop and I get this message:

Rosetta@home: Notice from server
VirtualBox jobs require hardware acceleration support. Your processor does not support the required instruction set.

This is a brand new laptop with an i7 processor. How would it not support hardware acceleration? Is it that the processor is such a new model that it is not recognized? How do I fix this?
ID: 105415
Greg_BE
Joined: 30 May 06
Posts: 5690
Credit: 5,859,226
RAC: 22
Message 105416 - Posted: 12 Mar 2022, 0:17:33 UTC

What file in Pythons is the equivalent of STDERR?
Is it in the boinc_xxxx folder, then the logs folder: the VBox file or the VBoxHardening one?

I had a task that I just killed that ran 1 day and, I think, 42 mins and got to 99.41% complete.
In the VBox file I saw "interrupts" increasing in number and wondered what that was about, as I do not see that in a suspended task (shutting down for the night) that I looked at.

I was trying to find out where, if anywhere, errors occurred while it was running, but I could not see anything recognizable, just interrupts.
ID: 105416
kotenok2000
Joined: 22 Feb 11
Posts: 256
Credit: 469,938
RAC: 500
Message 105417 - Posted: 12 Mar 2022, 0:19:06 UTC - in response to Message 105415.  
Last modified: 12 Mar 2022, 0:19:55 UTC

Try searching Google for how to enable virtualization on your laptop model:
enable hardware virtualization [laptop model here]
ID: 105417
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1626
Credit: 16,623,621
RAC: 4,245
Message 105418 - Posted: 12 Mar 2022, 1:17:53 UTC - in response to Message 105415.  

I just installed BOINC and added Rosetta@home on a brand new laptop and I get this message:

Rosetta@home: Notice from server
VirtualBox jobs require hardware acceleration support. Your processor does not support the required instruction set.

This is a brand new laptop with an i7 processor. How would it not support hardware acceleration? Is it that the processor is such a new model that it is not recognized? How do I fix this?
Virtualisation needs to be enabled in the BIOS, and Windows Hyper-V must be disabled in Windows programs & settings.
And it'd be worth running the BOINC Benchmarks on that system so when you do get some work, you also get more than a token amount of Credit for it.
Grant
Darwin NT
ID: 105418
kotenok2000
Joined: 22 Feb 11
Posts: 256
Credit: 469,938
RAC: 500
Message 105419 - Posted: 12 Mar 2022, 1:26:58 UTC - in response to Message 105418.  

Run "bcdedit /set hypervisorlaunchtype off" from an administrator Command Prompt to disable the hypervisor, then reboot for the change to take effect.
ID: 105419
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1626
Credit: 16,623,621
RAC: 4,245
Message 105456 - Posted: 16 Mar 2022, 5:52:52 UTC

Looks like they fixed the problem with those Tasks that ran OK on Linux, but crashed & burned in seconds on Windows.
Now they fail on all systems.
Grant
Darwin NT
ID: 105456



©2024 University of Washington
https://www.bakerlab.org