Problems and Technical Issues with Rosetta@home

VelocityRC
Joined: 4 Apr 20
Posts: 4
Credit: 516,338
RAC: 0
Message 94258 - Posted: 12 Apr 2020, 18:22:48 UTC
Last modified: 12 Apr 2020, 18:32:38 UTC

Hi all. I posted earlier about BOINC dropping Rosetta upon every re-boot and having to re-add it to the projects list. I finally got the time to track it down. I was running BOINC in my AV sandbox. I took it out of the sandbox and all is well. I can now put it back in the sandbox and Rosetta stays in the projects after re-boot.

One last problem that the above didn't solve is that upon every re-boot I have to re-adjust CPU usage. I have 2 different values on 2 different machines. Both behave the same way out of the sandbox or in.

Ideas ??

Thanks. Bill S.
EDIT: After a cold boot all the issues are resolved. The above seems to be the procedure for those running BOINC in a sandbox, at least where Avast Premium is concerned.

robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 94259 - Posted: 12 Apr 2020, 18:26:57 UTC - in response to Message 94247.  
Last modified: 12 Apr 2020, 18:27:21 UTC

Sorry: I don't seem to be able to find that info: the log didn't go back that far. Anyway, it's the only task where it happened.

I'm now using BOINC 7.16.5.

I think the release notes for this version mentioned a fix for this problem - the problem with waits. They consider the truncation of the oldest parts of the log a feature, necessary when BOINC runs nonstop for weeks or months at a time.

One more detail - the log file is automatically emptied when BOINC restarts after being shut down.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 94286 - Posted: 12 Apr 2020, 23:57:10 UTC - in response to Message 94255.  

Only thing out of spec is how fast I have the memory running.
I ran MemTest Pro for a good 24 hours and no errors.
Programmes such as Rosetta can work the memory harder than memory testing programmes, or just differently enough, to find a problem the testing programme doesn't.
I'd suggest reverting to the default (or at least the XMP) settings for the RAM and see if that stops the errors from occurring.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 94289 - Posted: 13 Apr 2020, 0:01:26 UTC - in response to Message 94258.  

One last problem that the above didn't solve is that upon every re-boot I have to re-adjust CPU usage. I have 2 different values on 2 different machines. Both behave the same way out of the sandbox or in.

Ideas ??

Thanks. Bill S.
EDIT: After a cold boot all the issues are resolved. The above seems to be the procedure for those running BOINC in a sandbox, at least where Avast Premium is concerned.
Looks like you figured it out.
That is the whole point of a sandbox: you can do whatever you want in there and it makes no difference; when you restart it, everything is set back to how it was at the last start.
If you make changes that you actually want to keep, you need to save them explicitly using whatever mechanism that sandbox supports.
Grant
Darwin NT

Evil Penguin
Joined: 10 Jun 08
Posts: 5
Credit: 10,168,989
RAC: 0
Message 94313 - Posted: 13 Apr 2020, 4:00:50 UTC - in response to Message 94286.  

Only thing out of spec is how fast I have the memory running.
I ran MemTest Pro for a good 24 hours and no errors.
Programmes such as Rosetta can work the memory harder than memory testing programmes, or just differently enough, to find a problem the testing programme doesn't.
I'd suggest reverting to the default (or at least the XMP) settings for the RAM and see if that stops the errors from occurring.

The errors have been from running Rosetta Mini v3.78 windows_intelx86 tasks.
Why is it running the 32-bit version instead of 64-bit?

The Rosetta Mini v3.78 windows_x86_64 tasks have been running without issue.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 94315 - Posted: 13 Apr 2020, 4:29:44 UTC - in response to Message 94313.  

The errors have been from running Rosetta Mini v3.78 windows_intelx86 tasks.
Why is it running the 32-bit version instead of 64-bit?
The BOINC Manager will try all applications for a particular Platform, and go with whichever one appears to be the best.


The Rosetta Mini v3.78 windows_x86_64 tasks have been running without issue.
And on my system 90% or more of the Rosetta Mini Tasks have been done with Rosetta Mini v3.78 windows_intelx86 with no recent computation errors (there were some dud Tasks some time back); even the same type of Tasks that your system is erroring on have Validated.
It could be a problem with the application, and those Tasks, on the Ryzen platform. But I haven't seen any other people reporting issues, so I'd suggest seeing how things go with the memory at default settings.
Grant
Darwin NT

Keith Myers
Joined: 29 Mar 20
Posts: 97
Credit: 332,141
RAC: 1,223
Message 94367 - Posted: 13 Apr 2020, 19:34:44 UTC - in response to Message 94315.  

I had nothing but errors on both the i686 applications on my Ryzen. Gave up on Rosetta and moved to Einstein. Discovered later that you can set a flag in cc_config.xml to ignore alternate platforms.
<no_alt_platform>1</no_alt_platform>
That would have told the Rosetta scheduler to not send me x86 applications and just send me the x86_64 applications.
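
For anyone who wants to try it, here is a minimal sketch of where that flag goes. As far as I know the option belongs inside the <options> section of cc_config.xml in the BOINC data directory, and the client has to re-read its config files (or be restarted) before it takes effect. The Python below only builds and prints the file text; the helper name build_cc_config is made up for this example, and it does not merge with any options already present in an existing cc_config.xml.

```python
import xml.etree.ElementTree as ET

# Layout assumed from the BOINC client configuration docs:
# <cc_config> -> <options> -> individual option flags.
CC_CONFIG_TEMPLATE = """<cc_config>
  <options>
    <no_alt_platform>{flag}</no_alt_platform>
  </options>
</cc_config>
"""


def build_cc_config(no_alt_platform: bool = True) -> str:
    """Return cc_config.xml text with the no_alt_platform flag set (hypothetical helper)."""
    text = CC_CONFIG_TEMPLATE.format(flag=1 if no_alt_platform else 0)
    ET.fromstring(text)  # raises ParseError if the text is not well-formed XML
    return text


if __name__ == "__main__":
    print(build_cc_config())
```

If other options are already set, it is safer to copy the <no_alt_platform> line into the existing <options> block by hand than to overwrite the whole file.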

Tom M
Joined: 20 Jun 17
Posts: 87
Credit: 14,880,624
RAC: 117,108
Message 94410 - Posted: 13 Apr 2020, 22:57:24 UTC - in response to Message 94367.  

I had nothing but errors on both the i686 applications on my Ryzen. Gave up on Rosetta and moved to Einstein. Discovered later that you can set a flag in cc_config.xml to ignore alternate platforms.
<no_alt_platform>1</no_alt_platform>
That would have told the Rosetta scheduler to not send me x86 applications and just send me the x86_64 applications.


Huh, I didn't notice mine apparently.

Tom M
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....

Sven
Joined: 7 Feb 16
Posts: 8
Credit: 222,005
RAC: 0
Message 94421 - Posted: 14 Apr 2020, 7:37:04 UTC - in response to Message 94410.  

Hi all,

concerning my problems with this issue:

****
Rosetta@home | Task xxxx exited with zero status but no 'finished' file
Rosetta@home | If this happens repeatedly you may need to reset the project.
*****

... I've found out what to do to avoid this kind of error message:

It seems advisable to make sure that the following setting is configured:
Use at most 100% of CPU time

All other settings, including max usage of CPUs, don't influence the processing of Rosetta tasks.
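
A small sketch of how that setting could be pinned locally, in case it helps. To my knowledge the "Use at most X% of CPU time" preference is stored as cpu_usage_limit, and local overrides live in global_prefs_override.xml in the BOINC data directory; treat both names as assumptions and check them against your own client's files. The helper name build_prefs_override is invented for the example, and the script only prints the text instead of touching any file.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(cpu_time_pct: float = 100.0) -> str:
    """Return override-file text that sets the CPU time limit (hypothetical helper)."""
    if not 0 < cpu_time_pct <= 100:
        raise ValueError("CPU time percentage must be in the range (0, 100]")
    root = ET.Element("global_preferences")
    # Assumed tag name for "Use at most X% of CPU time".
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_time_pct:.1f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_prefs_override())
```

As I understand it, the client enforces a CPU time limit below 100 by pausing and resuming tasks in short cycles, so leaving it at 100 avoids that extra suspend/resume activity.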

Sven

Sid Celery
Joined: 11 Feb 08
Posts: 2115
Credit: 41,112,600
RAC: 19,835
Message 94423 - Posted: 14 Apr 2020, 8:41:13 UTC - in response to Message 94421.  

Concerning my problems with this issue:

****
Rosetta@home | Task xxxx exited with zero status but no 'finished' file
Rosetta@home | If this happens repeatedly you may need to reset the project.
*****

...I've found out what to do to avoid this kind of error message:

It seems advisable to make sure that the following setting is configured:
Use at most 100% of CPU time

Very useful information, thanks

Keith Myers
Joined: 29 Mar 20
Posts: 97
Credit: 332,141
RAC: 1,223
Message 94455 - Posted: 14 Apr 2020, 15:57:03 UTC - in response to Message 94421.  

Hi all,

concerning my problems with this issue:

****
Rosetta@home | Task xxxx exited with zero status but no 'finished' file
Rosetta@home | If this happens repeatedly you may need to reset the project.
*****

... I've found out what to do to avoid this kind of error message:

It seems advisable to make sure that the following setting is configured:
Use at most 100% of CPU time

All other settings, including max usage of CPUs, don't influence the processing of Rosetta tasks.

Sven

Interesting. You shouldn't be getting those still now that you've updated to the latest 7.16.5 client which has the revised code fix to stop those errors. Your system would have to be too busy to service the slot cleanup for longer than five minutes to still get those errors.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 94496 - Posted: 14 Apr 2020, 23:58:01 UTC - in response to Message 94455.  
Last modified: 14 Apr 2020, 23:59:47 UTC

Hi all,

concerning my problems with this issue:

****
Rosetta@home | Task xxxx exited with zero status but no 'finished' file
Rosetta@home | If this happens repeatedly you may need to reset the project.
*****

... I've found out what to do to avoid this kind of error message:

It seems advisable to make sure that the following setting is configured:
Use at most 100% of CPU time

All other settings, including max usage of CPUs, don't influence the processing of Rosetta tasks.

Sven

Interesting. You shouldn't be getting those still now that you've updated to the latest 7.16.5 client which has the revised code fix to stop those errors. Your system would have to be too busy to service the slot cleanup for longer than five minutes to still get those errors.
You're thinking of the "Finished file present too long" issue.
But it is very probably some sort of I/O problem- the settings for the systems barely allowed any processing, with extremely frequent suspending & resuming occurring.
Grant
Darwin NT

Ian
Joined: 12 Oct 07
Posts: 3
Credit: 2,611,432
RAC: 0
Message 94520 - Posted: 15 Apr 2020, 8:56:28 UTC

Hello Rosetta community,

I have a fairly persistent issue with uploading work units. What seems to happen is that the upload looks to have proceeded normally, but then sticks at 100% and never gets removed from the transfer queue. The net effect of this is that BOINC eventually runs out of disk space as it is all in use by pending Rosetta uploads. Rosetta is the only BOINC project I have this issue with. I have tried suspending/restarting uploads as suggested in another thread. I have also tried resetting and deleting and re-adding the Rosetta project entirely, but I have the same issue. Running on Windows 10, latest BOINC client.

Does anyone have anything I can check or try in addition to the above?
Thanks, Ian

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 94521 - Posted: 15 Apr 2020, 9:00:47 UTC - in response to Message 94520.  
Last modified: 15 Apr 2020, 9:01:26 UTC

Hello Rosetta community,

I have a fairly persistent issue with uploading work units. What seems to happen is that the upload looks to have proceeded normally, but then sticks at 100% and never gets removed from the transfer queue. The net effect of this is that BOINC eventually runs out of disk space as it is all in use by pending Rosetta uploads. Rosetta is the only BOINC project I have this issue with. I have tried suspending/restarting uploads as suggested in another thread. I have also tried resetting and deleting and re-adding the Rosetta project entirely, but I have the same issue. Running on Windows 10, latest BOINC client.

Does anyone have anything I can check or try in addition to the above?
Thanks, Ian
Do you use any sort of 3rd party AV/Internet security programme? (just grasping at straws here).
Grant
Darwin NT

Sven
Joined: 7 Feb 16
Posts: 8
Credit: 222,005
RAC: 0
Message 94522 - Posted: 15 Apr 2020, 9:02:16 UTC - in response to Message 94496.  

Most of the error messages occurred overnight, when there was no other workload on the system.

Usually I would say that a project should be able to handle every kind of client setting. Obviously I was wrong about that.

Because of heavy fan noise I actually changed the settings to "suspend when computer is in use", and I still don't get the error messages I had before, when I had reduced the percentage of allowed CPU time.


Additional information:
As I freshly downloaded the newest BOINC client on this new computer, I'm working with version 7.16.5 (x64), which doesn't help against the "Task xxxx exited with zero status but no 'finished' file" error.

Ian
Joined: 12 Oct 07
Posts: 3
Credit: 2,611,432
RAC: 0
Message 94528 - Posted: 15 Apr 2020, 10:42:45 UTC - in response to Message 94521.  

Hi, the machine has Trend Micro Security agent installed. If I exit this and retry the upload, I see the same behaviour - the upload does not seem to be blocked at all - the progress bar goes up to 100%, it just doesn't ever get removed once the upload is complete. Windows moans about there not being any virus checking active, so the virus checker is seemingly off at this point (well as far as Windows can detect). Same behaviour with the Windows firewall off, just sits at 100% progress.

If I leave it for a bit I get a project backoff message, so maybe it is just load on the server end. I have been having this for a while though, since before the current interest in COVID-19 work. One other point of note is that I do have the upload rate throttled to 500KBps, as it is on a shared network, if that makes a difference. Temporarily turning this off does not fix the issue, however.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 94532 - Posted: 15 Apr 2020, 11:29:11 UTC - in response to Message 94528.  
Last modified: 15 Apr 2020, 11:29:37 UTC

Hi, the machine has Trend Micro Security agent installed. If I exit this and retry the upload, I see the same behaviour - the upload does not seem to be blocked at all - the progress bar goes up to 100%, it just doesn't ever get removed once the upload is complete. Windows moans about there not being any virus checking active, so the virus checker is seemingly off at this point (well as far as Windows can detect). Same behaviour with the Windows firewall off, just sits at 100% progress.

If I leave it for a bit I get a project backoff message, so maybe it is just load on the server end. I have been having this for a while though, since before the current interest in COVID-19 work. One other point of note is that I do have the upload rate throttled to 500KBps, as it is on a shared network, if that makes a difference. Temporarily turning this off does not fix the issue, however.
A few weeks back there were some upload issues, but they've been sorted. And no one else has been posting about similar upload issues. Getting to 100%, and then stopping would indicate it's not getting a final acknowledgement for the upload, but no idea why everything else would work bar that final ACK.
If all else has failed, I'd re-boot your modem and re-boot the computer.
*fingers crossed*
Grant
Darwin NT

robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,846
Message 94538 - Posted: 15 Apr 2020, 12:52:54 UTC
Last modified: 15 Apr 2020, 12:54:11 UTC

The download issues a few weeks back were due to additional servers near the Rosetta@home end of the connections, running overly aggressive antivirus programs examining everything that went by.

Does the shared network at your end of such connections have similar additional servers? If so, you may need to talk to whoever runs those servers, and ask them to set things up so that everything sent to the Rosetta@home upload server is excluded from checking.

amazoph
Joined: 24 Nov 13
Posts: 3
Credit: 2,099,022
RAC: 0
Message 94548 - Posted: 15 Apr 2020, 15:05:01 UTC - in response to Message 93970.  

Looks like you have many access violations. I am not seeing such errors with other people's problem reports. Have you run memtest?


Thanks - have tried both Memtest86 and Windows built in memtest on that host, both came back clean after several passes.
Not sure as to the reason for the errors, for the moment I've stopped this host from taking Rosetta tasks and have put it on WCG til I have a chance to look further.

Seems to be only Rosetta that's affected as other applications (GPUGrid and WCG's MCM) run OK.


I found the issue: this machine had XMP memory timings enabled in BIOS. I reverted to the stock, lower-speed timings and am starting to get WUs completed without errors now.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,523,845
RAC: 23,480
Message 94588 - Posted: 16 Apr 2020, 3:46:10 UTC

rb_04_12_21176_20979__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_912410_4
          Sent                 Time reported/deadline           Status
15 Apr 2020, 23:33:22 UTC     16 Apr 2020, 2:30:44 UTC     Cancelled by server
Cancelled only 3 hours after it was sent.

If Rosetta is going to allow a grace period for Tasks that are returned after the deadline, then the next replication of it shouldn't be sent out until after the deadline grace period has passed. Saves having things like this occur.
Or do away with the grace period & send the next copy out, and cancel the original Task. Or keep the grace period but still cancel the original Task once the next one is sent.
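
To put numbers on it, a rough sketch in Python. The two timestamps are the ones from the task listed above; the deadline and grace period in the usage lines are made up, and earliest_resend / may_send_replica are hypothetical names for the first suggestion (hold the next replica until the grace period is over), not anything taken from the Rosetta or BOINC server code.

```python
from datetime import datetime, timedelta, timezone


def earliest_resend(deadline: datetime, grace: timedelta) -> datetime:
    """First suggestion: no replica goes out before deadline + grace."""
    return deadline + grace


def may_send_replica(now: datetime, deadline: datetime, grace: timedelta,
                     original_returned: bool) -> bool:
    """True only if the original is still missing once the grace period is over."""
    return (not original_returned) and now >= earliest_resend(deadline, grace)


# Timestamps copied from the task listing above.
sent = datetime(2020, 4, 15, 23, 33, 22, tzinfo=timezone.utc)
cancelled = datetime(2020, 4, 16, 2, 30, 44, tzinfo=timezone.utc)
wasted = cancelled - sent  # how long the replica was out before being cancelled

if __name__ == "__main__":
    print(wasted)  # 2:57:22

    # Made-up deadline and grace period, for illustration only.
    deadline = datetime(2020, 4, 15, 23, 0, 0, tzinfo=timezone.utc)
    grace = timedelta(hours=6)
    print(may_send_replica(sent, deadline, grace, original_returned=False))  # False
```

Under a rule like this the replica above would not have been sent at 23:33, so there would have been nothing to cancel three hours later.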

Pretty much everything that host does arrives late.
Grant
Darwin NT



©2024 University of Washington
https://www.bakerlab.org