Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Ole Pettersen Joined: 4 Dec 10 Posts: 7 Credit: 11,663,827 RAC: 4,003 |
I got this when I tried to open Oracle VM VirtualBox: |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,508,345 RAC: 9,552 |
You could try this app: http://leomoon.com/downloads/application/leomoon-cpu-v/ And there are other tests listed here too: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161 |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Interesting. I took a look at the applications page https://boinc.bakerlab.org/rosetta/apps.php and the VirtualBox page https://boinc.berkeley.edu/wiki/VirtualBox but I don't see anything that would indicate a definitive answer. My Linux systems do not have VirtualBox installed. The apps page does indicate they are running on Linux systems but they don't indicate if VirtualBox is or needs to be installed. The same goes for Windows in the apps page." You need VirtualBox for Linux too. There are different versions of Linux, with different libraries. It can cause problems on any project. Even LHC uses VirtualBox on Linux for Theory and ATLAS, though they also have "native" apps for them. But the native apps need a "container" such as Singularity, which is even more complicated to set up than just installing VirtualBox. It would still be nice to have native versions for Rosetta too if they can manage it, since a native app runs a little more efficiently and probably needs less memory. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
DEK/ADMIN/dcdc What the heck is going on with your scheduler? Or is there a bug in BOINC? I now have 998 (What the *!&*#?) tasks sitting in my queue. I lost 11 python tasks because something clogged my system with over 600 4.20 tasks. I am going to abort 900 tasks because there is no way in hell I can plow through that many. That's 332 days of crunching. That's just beyond belief!!! Plus that clogs up my system because your project is limited to a certain number of CPUs while I share the rest with several other projects, and all your tasks have 8 hour run times. If I drop that to 4 that would still be over 5 months of work. And all these tasks were due on the 24th? Explain this to me..... BOINC bug or your system bug? ---- Sent back 880 or so tasks...geees...I better go find a limiting command |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"What the heck is going on with your scheduler?" I have not had that problem since getting rid of the "max concurrent" in the app_config.xml, as we discussed earlier. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"What the heck is going on with your scheduler?" I have to have max concurrent in order to limit the number of CPUs RAH uses, otherwise my idea of splitting up my system so every project has its own group of cores is out the door and then I run into the problem of every project dominating my system and some get all the work for days on end and others don't. If RAH would do like LHC and allow ME to pick how many cores to give it, then I would not have to do max concurrent. I need to find a command to limit the number of files downloaded. OR would project_max_concurrent be any better than max_concurrent? How about ncpus and take out max_concurrent? |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"OR would project_max_concurrent be any better than max_concurrent?" Unfortunately, project_max_concurrent won't work either. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5720&postid=45323#45323 You could try ncpus, but I think that is mainly for multi-threaded apps. What I do is just create separate BOINC instances for each type of work unit when I need to limit them. Then you can just set "use at most % of the processors" to limit it to what you want. It is a bit of a pain, but actually simple enough once you do it the first time. This more or less gives you all you need. https://www.overclock.net/threads/guide-setting-up-multiple-boinc-instances.1628924/ I think that on Windows, they neglect to tell you how to start it up automatically, though you can do it manually easily enough. I used Task Scheduler to start the BOINC client automatically, but don't have the details since I usually use Ubuntu, where it is a bit simpler. I think you have to start up BOINC Manager separately in the "Programs/Startup" folder, but that was on Win7 and I am now on Win10. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"OR would project_max_concurrent be any better than max_concurrent?" ugghh...I guess I am screwed until someone figures this bug out. I am on vacation right now so I could probably do something like you talk about, but on work days I just want to fire up my computer and get going for the day. I don't have time to fire up 6 different processes. RAH should get their act together and copy the LHC webpage setup, then I wouldn't have to do all this #$% |
Breno Joined: 8 Apr 20 Posts: 30 Credit: 12,877,800 RAC: 8,343 |
I'm not sure if this issue is already known, but I decided to install boinc and boinctui on my five NVIDIA Jetson TX2 and TX1 boards and add R@h. I was able to configure them into my account, but all boards freeze when the task is supposed to be downloading. It is not related to absence of network, since I logged into the boards by ssh. Does anyone have a clue on how to solve it? https://ibb.co/18KTxnG |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,508,345 RAC: 9,552 |
Can BOINC write to that directory? |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I have to have max concurrent in order to limit the number of CPUs RAH uses, otherwise my idea of splitting up my system so every project has its own group of cores is out the door and then I run into the problem of every project dominating my system and some get all the work for days on end and others don't." Until BOINC is fixed to work with max concurrent, you can speed up the correction of the number of work units by placing this in your cc_config.xml file: `<cc_config> <options> <rec_half_life_days>1.000000</rec_half_life_days> </options> </cc_config>` That will correct the number of work units in a couple of days to be in accordance with the resource share for each project. It still won't allow you to fine-tune the pythons versus the regular Rosettas, but that is basically set by the project anyway according to their priorities, so I don't know that it should be changed. But if you try to run too many projects, BOINC gets all confused anyway, and an app_config.xml just makes it more confused than usual. I would never run more than three projects at a time, and one or two is better, especially if they have widely different run times. |
Breno Joined: 8 Apr 20 Posts: 30 Credit: 12,877,800 RAC: 8,343 |
Interesting, you're suggesting it might be a sudo issue. Surely it is not a disk space issue. I was thinking that maybe it was a version issue or maybe network related. However, the project was recognized and the tasks were scheduled, they are just not downloading. Also, network is set to be freely used by the project, so it can't be that. I'll keep checking. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
"If RAH would do like LHC and allow ME to pick how many cores to give it, then I would not have to do max concurrent." It's only an issue because you don't like how BOINC works, by allocating resources to each project depending on computation work done, not on time spent on each project. Which means that with multiple cores/threads & multiple projects the number of Tasks for any given project being processed at any given time will vary. If you were happy to just let it process according to your Resource share settings, without micro-managing it, it wouldn't be an issue. "RAH should get their act together and copy LHC webpage setup, then I wouldn't have to do all this #$%" You don't have to do it. It's something you chose to do. Ideally BOINC would just fix the problem so that those that feel the need to micro-manage things can do so without such issues occurring. Grant Darwin NT |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,523,781 RAC: 8,309 |
"BUT: You need to make them selectable from the regular Rosettas." +1 |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"I have to have max concurrent in order to limit the number of CPUs RAH uses, otherwise my idea of splitting up my system so every project has its own group of cores is out the door and then I run into the problem of every project dominating my system and some get all the work for days on end and others don't." I'll give this a try in the morning. What I was getting tired of, and why I split my system up, was that projects with long run times would interfere with projects with short run times, my stats would go all haywire, and I would also get "cancelled, not started in time" issues. So I decided I will just split my system up to even out each project. Then each project can do what it wants. But with the max_concurrent problem that means I have to watch RAH downloads. It looks like the BOINC team is looking into this according to a post another group directed me to. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"Interesting, you're suggesting it might be a sudo issue." It's a code issue in BOINC. I was pointed to a GitHub thread where a person described this exact condition, exactly as I am experiencing it. The max_concurrent command used in some projects causes this problem. I watched RAH download the usual amount to keep my system busy within the time deadline, but then it would sneak in two extra tasks every few minutes, so if I don't watch and shut it off, then I will get a year's worth in a matter of a few hours. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"If RAH would do like LHC and allow ME to pick how many cores to give it, then I would not have to do max concurrent." "It's only an issue because you don't like how BOINC works, by allocating resources to each project depending on computation work done, not on time spent on each project. Which means that with multiple cores/threads & multiple projects the number of Tasks for any given project being processed at any given time will vary." It's a matter of trying to even out the projects' credits and run times. Some of my projects have longer run times and also want to use all or most of the cores, causing an imbalance and missed deadlines. I decided that I would split up the projects into chunks of cores so that each project has its own dedicated set of cores to use and run as it pleases. It ran fine until this bug showed up. I just manually load in X number of tasks until eFMer BoincTasks tells me I have 2 or 3 days of work and then I switch back to no new work. BOINC is aware of this and it looks like the developers are digging into it. It's just going to take time, and before they release a new version they have to work out all the other bugs. So for now this is how it goes. But again, LHC allows me to select which projects I want to work in (maybe later RAH can do that when they get more Python work) and LHC allows me to select the number of cores I want to give it. It's time for RAH to grow up and adapt. Not everyone runs just RAH as a dedicated project; some of us want to give time evenly to other projects. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
"It's a matter of trying to even out the projects' credits and run times. Some of my projects have longer run times and also want to use all or most of the cores, causing an imbalance and missed deadlines." That's what your Resource share setting is for, and having zero (or next to no) cache stops the missed deadlines (when it's not ignored by a bug with one of the BOINC configuration options of course). Once BOINC knows how long various Tasks actually take (a minimum of ten Tasks of a particular type completed & Validated), it can then process Tasks for each project as it needs to in order to meet your Resource share settings. Of course the more active Projects you have, the longer it will take for things to settle down. And setting no new Tasks, then re-enabling them, and aborting work all add to the time it takes for BOINC to figure things out. And given that there are times Projects may or may not have work, and some applications are more efficient than others, there will be times where BOINC will do more of one (or several) projects than another (or several projects) in order to meet your Resource share settings. Remember: resource share is not worked out based on time, but on an estimate of work actually done. And of course it does take time to do that, be it hours, days, weeks or months depending on your hardware, the number of projects you run & the size of your cache. "But again, LHC allows me to select which projects I want to work in (maybe later RAH can do that when they get more Python work)" As did Seti and I'm sure many other projects. I agree that Rosetta should allow people to choose whether they want to run Rosetta 4.2 or the Vbox application, or both. "It's time for RAH to grow up and adapt. Not everyone runs just RAH as a dedicated project; some of us want to give time evenly to other projects." And I will point out yet again: that is not how BOINC works. Processing time for a project isn't doled out based on time; it's based on work done, in order to meet your Resource share settings. Someone having 10 projects, each with a Resource share value of 100, does not give each Project equal time. What it does do is give each project whatever time it needs for its applications to do an equivalent amount of work to each of the other projects. Having 2 projects, one with a Resource share of 500 and the other only 10, does not mean the one with 500 gets 50 times more processing time. If the one with the 500 Resource share value has an extremely efficient GPU application, and the one with a Resource share value of 10 has a really inefficient CPU application in an old system with a modern high end GPU, then the 500 value project may only get 1/100th of the processing time that the CPU application gets. Most of the time will be spent processing work for the project that has only the 10 Resource share value. Yet because of the huge amount of work actually done by the GPU compared to the CPU, the Resource share settings will still be met, even though the CPU project gets 100 times more processing time than the GPU based project. Grant Darwin NT |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"It's a matter of trying to even out the projects' credits and run times. Some of my projects have longer run times and also want to use all or most of the cores, causing an imbalance and missed deadlines." "That's what your Resource share setting is for, and having zero (or next to no) cache stops the missed deadlines (when it's not ignored by a bug with one of the BOINC configuration options of course)." Now it's getting as complicated as what I am doing. I have to run everything at 100, see how they work again, then fine tune the resource share for each project. That's just as bad as the max_concurrent. With RAH, it's not about the type of project, it's about the core count. That's what I am after. X # of cores per project. Since the stats are all over the place, I will try your idea. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,473 RAC: 22,408 |
"Now it's getting as complicated as what I am doing." What is difficult about it? If all projects are of equal importance, then just have them all set to 100. If they aren't, then which project is most important? Which is least? Rate them in order of importance & then set your Resource share values for each one accordingly, keeping in mind they are not a percentage, they are a ratio. "With RAH, it's not about the type of project, it's about the core count." Once again, as it seems I am not getting through: BOINC does not work that way. Resource share is about work actually done. Not time. Not cores. Not threads. Not CPU. Not GPU. It is about the work done. While Credit is supposed to indicate work done, that isn't the case. So the BOINC Scheduler uses REC (Recent Estimated Credit) to determine Scheduling. Extremely roughly: any Task requires so many FLOPs (Floating Point Operations) to be performed. It takes however long to actually do the work on a given system for a given application. BOINC/the Project server keeps track of this time & the amount of work done. If all Projects paid out Credit according to the definition of the Cobblestone, then people's RAC could be used for Scheduling. Since that doesn't happen, people's RAC isn't used; REC is. A Task that takes however long to do a certain amount of FLOPs would earn x amount of Credit using the official definition of the Cobblestone, and that is the amount of REC for that Task. All the Tasks done for all the different applications for that project produce a total amount of REC value for that Project. All the work done by the applications for all of the Projects produce an REC for each Project. The BOINC scheduler does enough work from each Project over a period of time (days, weeks, months if need be) so that the REC value between projects matches whatever Resource shares you have selected. That is how it works. It is not based on just time. It is not based on Cores. It is not based on Threads. It is not based on CPU. It is not based on GPU. It is based on work done. Grant Darwin NT |
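
For anyone who wants to see the "work done, not time" arithmetic from the last post in numbers, here is a rough Python sketch. It is a toy model only, not how the BOINC client is actually implemented: the `time_split` function, the project names and the work rates (5000 and 1 work units per hour) are made-up assumptions for illustration. It also ignores that CPU and GPU apps can run side by side and that real REC is averaged over time.

```python
def time_split(shares, rates):
    """Fraction of processing time each project needs so that the work
    it completes is proportional to its resource share.

    shares: project name -> resource share (a ratio, not a percentage)
    rates:  project name -> work units finished per hour (hypothetical numbers)
    """
    total_share = sum(shares.values())
    # Fraction of the total *work* each project should end up with.
    work_target = {p: s / total_share for p, s in shares.items()}
    # Hours needed to deliver that work at the project's own speed.
    hours = {p: work_target[p] / rates[p] for p in shares}
    total_hours = sum(hours.values())
    # Normalise so the time fractions add up to 1.
    return {p: h / total_hours for p, h in hours.items()}


# Illustrative values: share 500 with a fast GPU app, share 10 with a slow CPU app.
shares = {"gpu_project": 500, "cpu_project": 10}
rates = {"gpu_project": 5000.0, "cpu_project": 1.0}

split = time_split(shares, rates)
for name, frac in split.items():
    print(f"{name}: {frac:.1%} of processing time")
```

With these made-up rates the project with the share of 10 ends up with roughly 100 times the processing time of the project with the share of 500, while the work done still lands at the 50:1 ratio the shares ask for.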
©2024 University of Washington
https://www.bakerlab.org