Message boards : Number crunching : Rosetta 4.1+ and 4.2+
Author | Message |
---|---|
Bryn Mawr Joined: 26 Dec 18 Posts: 389 Credit: 12,069,196 RAC: 14,319 |
Thanks, I've just tried it and it's downloaded six 4.12-i686 WUs estimated at 11 hours each :-( |
Bas Joined: 19 Mar 20 Posts: 2 Credit: 323,889 RAC: 0 |
I got both 64-bit and 32-bit WUs after a reset. I have removed 32-bit support from my Arch Linux machines, and it looks like the BOINC clients have noticed this: "[Rosetta@home] App version has unsupported platform i686-pc-linux-gnu; changing to x86_64-pc-linux-gnu". Unfortunately, communication is deferred by an hour, so I don't know if this will help. I'll let you know in an hour or so. ;) For people who need 32-bit support, this is obviously not a solution. |
Bas Joined: 19 Mar 20 Posts: 2 Credit: 323,889 RAC: 0 |
Unfortunately, communication is deferred by an hour, so I don't know if this will help. I'll let you know in an hour or so. ;) For people who need 32-bit support, this is obviously not a solution. Looks like it did the trick; all downloaded WUs are now x86_64. |
CIA Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
I posted over on Ralph but figured I should ask here as well. New 4.15 seems to be working better on older MacOS machines, but now after crunching for 8 hours on some I'm getting processing errors at the end with "finish file present too long". Not all 4.15 WUs fail, but about 20% do. Are these WU errors or the result of a new bug in 4.15? Example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1141395048 If you click on tasks under my profile, all my failed 4.15 units across all machines say the same thing, so I know it's not just one machine being sketchy. If it's just a WU thing, well OK. But still, it's a bummer to crunch for 8 hours only to have it fail. |
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
New 4.15 seems to be working better on older MacOS machines, but now after crunching for 8 hours on some I'm getting processing errors at the end with "finish file present too long". That's a BOINC "feature", supposedly fixed in 7.16. The machine hasn't been able to move the files out of the slot directory fast enough. That means the disk is overloaded. Freeing up a thread for BOINC might help, or if possible use a faster storage medium (SSD or faster disk). Not sure if you want to try a beta-test version of BOINC (i.e. the 7.16s) on that machine, or even if there is one. The BOINC developers are concentrating on x64 machines only, in line with Apple's dropping support for 32-bit apps in current OSX versions. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,503,596 RAC: 24,507 |
I posted over on Ralph but figured I should ask here as well. New 4.15 seems to be working better on older MacOS machines, but now after crunching for 8 hours on some I'm getting processing errors at the end with "finish file present too long". The "finish file present too long" issue has been around for years. But why it would start popping up now, en masse, is a bit of a mystery. From the BOINC forums: "In looking through the client code it looks like this condition occurs when the client finds that the boinc finish file has been written to disk but the science application process is still running." When a Task completes, the BOINC Manager expects the Application to finish its housekeeping & exit within a certain time frame. If it doesn't, the Manager just clobbers it, and you get the "finish file present too long" error. Generally it happens during periods of heavy disk I/O & heavy CPU usage. The more cores you have, and the more things (including things other than Rosetta) the system is doing at the time, the more likely the error. So the more threads, the larger the result files, and the more applications finishing (or at least checkpointing) at the same time, e.g. when exiting BOINC & rebooting a system, the more likely the problem is. A fix had been proposed (AFAIK) years ago. I did ask in another project when the fix was rolled out & the answer I got was "There is currently a v7.16.5 Beta available - that should fix it." Grant Darwin NT |
CIA Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
I posted over on Ralph but figured I should ask here as well. New 4.15 seems to be working better on older MacOS machines, but now after crunching for 8 hours on some I'm getting processing errors at the end with "finish file present too long". The "finish file present too long" issue has been around for years. But why it would start popping up now, en masse, is a bit of a mystery. I've only done some minor "testing" and it seems like it happens more often when a lot of tasks end almost simultaneously. If I can pause some threads and have them finish more spaced out, it seems to not be an issue. When the last WU drop happened I had 24 tasks start at the same time, and mostly finish at the same time. That's when I saw it happen more often. Once they space out a bit they (at least from my observations) don't seem to fail. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,503,596 RAC: 24,507 |
I've only done some minor "testing" and it seems like it happens more often when a lot of tasks end almost simultaneously. If I can pause some threads and have them finish more spaced out, it seems to not be an issue. When the last WU drop happened I had 24 tasks start at the same time, and mostly finish at the same time. That's when I saw it happen more often. Once they space out a bit they (at least from my observations) don't seem to fail. Yep, makes sense. Reduce the disk I/O & CPU requirements by not having everything happening all at the same time. But as CPUs get more cores & threads, and GPUs (for the projects that use them) get more & more powerful, it's just going to be a bigger & bigger issue. So hopefully the next BOINC Manager version, when it's finally released, will fix the problem once and for all. *fingers crossed* Grant Darwin NT |
CIA Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
A fix had been proposed (AFAIK) years ago. I did ask in another project when the fix was rolled out & the answer I got was "There is currently a v7.16.5 Beta available - that should fix it." 7.16.6 is the most recent beta for OSX BOINC (dated April 3rd, 2020). If I run that (and it seems to work) will my work still be counted as valid for Rosetta? Or will it just be similar to Ralph, where it's more proof-of-concept work? |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,503,596 RAC: 24,507 |
7.16.6 is the most recent beta for OSX BOINC (dated April 3rd, 2020). If I run that (and it seems to work) will my work still be counted as valid for Rosetta? Or will it just be similar to Ralph, where it's more proof-of-concept work? You process work for a Project, you get credit from that project, e.g. you process work for Ralph, you get Credit from Ralph. You process work for Rosetta, you get Credit from Rosetta. You process work for Einstein, you get Credit from Einstein, since they are the projects that you are doing work for. All the Manager does is schedule when work is run between different projects & do the job of requesting work & reporting completed work. It's not responsible for any of the processing; that's what the applications it downloads to process the work do. BOINC is what the projects use to run & manage their projects; the BOINC Manager is the client for computers that lets people do work for those projects. So it doesn't matter what BOINC Manager you use to join & process work for a project (unless there is some particular issue that affects a project you wish to join). I'd just check the release notes to see what (if any) the known issues are, and decide if you're prepared to put up with them. If things get too annoying, just re-install your current Manager. Grant Darwin NT |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
In general, I believe the maximum memory consumption of a task is as it approaches the end of a model. In your case, you are describing the end of the WU... which would always coincide with the end of a model as well. So, it is possible there was some swap contention, checkpointing, WU completion disk activity all going on at the same time. Rosetta Moderator: Mod.Sense |
Keith Myers Joined: 29 Mar 20 Posts: 97 Credit: 332,141 RAC: 1,350 |
The timeout for the finish file check was lengthened from 10 seconds to 300 seconds in #3019 and incorporated in the 7.16 client branches. There is no reason to be scared of a "test" BOINC version. Perfectly usable. |
Blackbird Joined: 16 Jan 07 Posts: 5 Credit: 727,105 RAC: 1,836 |
Whatever issues I had appear to be resolved with rosetta 4.15. I have successfully run and validated a number of tasks now. |
Plomos Joined: 4 Mar 11 Posts: 11 Credit: 439,043 RAC: 0 |
I have had several 4.12 WUs that have completed just fine and earned plenty of credits, such as https://boinc.bakerlab.org/rosetta/result.php?resultid=1142060959 , but two of them that just finished, https://boinc.bakerlab.org/rosetta/result.php?resultid=1142041210 and https://boinc.bakerlab.org/rosetta/result.php?resultid=1142041257 , only ran 1 decoy in 8 hours, which seems to be a problem with the 32-bit version of the app. I am running a 64-bit OS, so I would hope that all the tasks pulled to my machine are 64-bit, but this does not seem to be the case. |
csbyseti Joined: 24 Dec 05 Posts: 11 Credit: 4,909,876 RAC: 21,280 |
The 64-bit app for Linux works fine on my machine; I think the 32-bit app for Linux has a bug. Perhaps the Rosetta Admins should think about removing 32-bit apps for x86 CPUs. Machines running 32-bit must be really old, with low-speed CPUs. It makes no sense to run these machines for BOINC. |
Andreas Kübrich Joined: 24 Mar 20 Posts: 1 Credit: 243,141 RAC: 0 |
Machines running 32-bit must be really old, with low-speed CPUs. Or they’re reasonably recent systems running a 32-bit operating system. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,503,596 RAC: 24,507 |
Perhaps the Rosetta Admins should think about removing 32-bit apps for x86 CPUs. Why? They're no slower than the equivalent 64-bit application. Grant Darwin NT |
csbyseti Joined: 24 Dec 05 Posts: 11 Credit: 4,909,876 RAC: 21,280 |
Machines running 32-bit must be really old, with low-speed CPUs. It makes no sense because of the memory limitation. There is no reason to use a 32-bit OS on a modern CPU. Or perhaps you want to use old software which doesn't work on a modern OS. But why must this machine run BOINC? Perhaps they can count the number of 32-bit systems, add up the generated TFlops, and then decide whether or not to stop the 32-bit app. Perhaps the Rosetta Admins should think about removing 32-bit apps for x86 CPUs. Why? They're no slower than the equivalent 64-bit application. The Project developers have to support two more app versions that are not really needed anymore. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1670 Credit: 17,503,596 RAC: 24,507 |
Perhaps the Rosetta Admins should think about removing 32-bit apps for x86 CPUs. Why? They're no slower than the equivalent 64-bit application. My point exactly. There is no need for the 64-bit applications, so why produce them? Grant Darwin NT |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
Name: rb_04_06_17111_20290_ab_t000__h002_robetta_IGNORE_THE_REST_09_18_905299_13_0. Application: Rosetta v4.12 windows_intelx86. Device: 1759960, Task: 1141280631, WU: 1027346991. Status: Validate error. Exit status: 0 (0x00000000). Errors: Too many total results. Canonical result: 1141280631. It just appears strange that the app ran to completion and credit was even figured. Also, this was considered a "canonical result." However, apparently something made this an "invalid" result, which is not obviously explained in the Stderr output. |
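
Editor's sketch of the "finish file present too long" check discussed in this thread. It is only an illustration of the behaviour as described in the posts above (the error occurs when the finish file is on disk but the science application is still running, and the timeout was reportedly lengthened from 10 to 300 seconds in the 7.16 client branches). It is not BOINC's actual client code; `TaskState` and `finish_file_present_too_long` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Timeout values as quoted in the thread (assumed, not read from BOINC source).
OLD_TIMEOUT_S = 10.0    # before the change
NEW_TIMEOUT_S = 300.0   # 7.16 client branches


@dataclass
class TaskState:
    """Hypothetical snapshot of one running task."""
    finish_file_time: Optional[float]  # when the finish file was first seen; None if absent
    process_running: bool              # is the science application still alive?


def finish_file_present_too_long(task: TaskState, now: float, timeout_s: float) -> bool:
    """Return True if the client would give up on the task.

    The condition only applies when the finish file exists AND the
    application has not exited yet.
    """
    if task.finish_file_time is None or not task.process_running:
        return False
    return (now - task.finish_file_time) > timeout_s


# A task whose application needs 45 seconds to exit on a busy disk.
slow_exit = TaskState(finish_file_time=100.0, process_running=True)
print(finish_file_present_too_long(slow_exit, now=145.0, timeout_s=OLD_TIMEOUT_S))
print(finish_file_present_too_long(slow_exit, now=145.0, timeout_s=NEW_TIMEOUT_S))
```

With the shorter timeout, a 45-second exit delay trips the check; with the longer one it does not, which fits the reports above that many tasks finishing at once on a loaded disk made the error more likely.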
©2024 University of Washington
https://www.bakerlab.org