Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Stevie G wrote: "I think I will put another 8 GB of RAM in it." Depends what else you’re using the machine for, but it might not be worth it. Rosetta won’t benefit; it rarely needs more than 1 GB per task. "There are some really inexpensive refurbished Dell and HP computers out there, starting at $200." You can certainly get a lot for your money buying used workstations and servers. Bear in mind that cheap to buy can mean expensive to run – the older the machine, the slower and less energy-efficient. Unless you want to use the computers for heating, the very cheapest machines are likely to be a false economy. I’m sure people here will be happy to give specific buying advice; probably best to start a new thread. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,238 RAC: 19,699 |
Good news - it could've easily been something much more expensive. Over time, I've upgraded components one at a time, though not necessarily in the right order, I have to admit, often through necessity. I started with a power supply too and made sure it was more than enough at the time (750W modular EVGA 80+ Gold) before messing round unsuccessfully with cooling fans, until I realised I needed a modern case that could handle 140mm fans, both for a CPU cooler and case fans, because they can shift more heat/air away at slower (i.e. quieter) speeds. Cases that can handle 2x140mm coolers often handle up to 3x120mm coolers too, so that's the case I've got now. After that, upgrades are motherboard/CPU combos, sometimes including RAM. Doing it piecemeal like this makes it easier for me to finance and has provided a lot more flexibility over time. I don't have an SSD on my main PC yet, but I've installed a 2.5" one on our work PC and one of those M.2 drives on another, both of which are very much worth the cost. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,118,186 RAC: 6,004 |
I have 5 of them running!! All of mine are dual quad-core Xeon CPUs, meaning 16 logical CPU cores through Hyper-Threading for each PC. As others have said, unless they come with a big enough PSU they can't really use a GPU too, as replacement PSUs are VERY expensive and proprietary. Mine all run at 2.5ish GHz, some more, some less, but they flat bang out CPU workunits over the course of 24 hours. Mine are not here at Rosetta; they are at TN-Grid right now, and the units there take about 4 to 6 hours each depending on the machine. I bought them on eBay without hard drives or GPUs, but they did come with about 20 GB of memory in each one. I put Win7 on each one and then auto-upgraded to Win10, but some of the Win10 upgrades do not like the older PCs very much; some work just great but some updates hang. I have all of them running Linux Mint at the moment but still have the Win10 hard drives sitting on a shelf. They boot up VERY slowly on a standard SATA drive, so I put 240 GB SSD drives in each one and they boot and run just fine. I have no paid software on any of them as they are strictly BOINC machines and all they do is crunch 24/7. Under Windows that means periodic updates for a/v software and Windows updates, which is partly why they run Linux right now. When I bought mine they were about $150 to $250 each plus shipping through eBay, and they are BIG BOXES!! Look up HP Z600 cases!!! I just looked on eBay and they are in the $350 US and up range now. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
The moderator appears to have been absent from the moderator contact thread for almost 3 months. The TFSCAFFOLD0001 tasks are still often failing, but now have an extra line near what's probably the point of failure: "BOINC :: WS_max 0". No obvious meaning to me, but hopefully it's more meaningful to the person who set up these tasks. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
‘WS’ might be working set |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"The rgmjp tasks appear to complete only one decoy. The first decoy is usually only a quick check to make sure that your computer is running properly, so does this mean that the usual first decoy is skipped for these, or does it mean that more decoys are done but without adding them to the decoy count?" You are mistaken about the first decoy. The first decoy is a legit, full model of the protein, not a simple test of the environment. Rosetta Moderator: Mod.Sense |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
"I have 5 of them running!! All of mine are dual quad-core Xeon CPUs, meaning 16 logical CPU cores through Hyper-Threading for each PC. As others have said, unless they come with a big enough PSU they can't really use a GPU too, as replacement PSUs are VERY expensive and proprietary." Actually, I have two dual Xeon machines; the PSUs were only £20 each. Genuine Dell supplies, second-hand on eBay. I could have used a normal ATX supply, but I couldn't find the weird pinouts for the non-standardly wired ATX connectors. I didn't plug a normal supply in once I noticed all the yellow wires were at one end, instead of randomly scattered like on a normal ATX plug. It makes more sense to have all of each voltage together for the tracks on the motherboard, but I guess ATX plugs have been added to over the years. "You can certainly get a lot for your money buying used workstations and servers. Bear in mind that cheap to buy can mean expensive to run – the older the machine, the slower and less energy-efficient. Unless you want to use the computers for heating, the very cheapest machines are likely to be a false economy." The newer ones certainly use less power, but they cost a lot more. I guess the best thing is to add up the electricity cost and the parts cost and see what's cheapest per FLOP over the next few years. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
"cheapest per FLOP over the next few years." Fairly thorough recent discussion on that topic in the thread "The most efficient cruncher rig possible". |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
"cheapest per FLOP over the next few years." "Fairly thorough recent discussion on that topic in The most efficient cruncher rig possible." I use a spreadsheet and insert all the CPUs and GPUs available on eBay. Well, not all, just the 30 most common ones. You can insert the cost of CPU, motherboard, RAM, electricity, etc. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,238 RAC: 19,699 |
Not a problem or technical issue with the website, but a little information. After the task outage last month I guess people re-prioritised other projects, understandably. Since tasks became available again, the rate at which hosts return tasks, which had dropped from 750k/day to 200k/day, is now back up to 500k/day. The current number of tasks queued waiting to run is now up at 17 million, the highest number I can ever recall seeing; 6-7m was usual. So if you've switched some hosts away while we were out of work, now is exactly the time to bring them back. Pass it on. Now I'm back at work after lockdown (since a month ago) I've added back 2 PCs, and I discovered Android Rosetta tasks are now back working on my phone (v7.4.53). And, for what little it's worth, I've added ~2.2% to the overclock on my 2 main PCs over the last couple of days - every little helps. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
"cheapest per FLOP over the next few years." "Fairly thorough recent discussion on that topic in The most efficient cruncher rig possible." Any discussion of how much faster memory helps? About all I've been able to find so far is that, at least for Rosetta@home, it does help. If it makes a difference, my computer uses DDR4 memory. |
Rayalot72 Send message Joined: 27 Mar 20 Posts: 2 Credit: 237,615 RAC: 1,215 |
Having some sort of issue where multiple instances of rosetta_4.20_windows_x86_64.exe are open seemingly independent of BOINC. Closing BOINC, suspending the project, etc. don't get rid of these processes. They're also taking up a massive chunk of CPU, despite me having set BOINC to use 30% CPU time. What are these? |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1993 Credit: 9,520,400 RAC: 11,365 |
"Any discussion of how much faster memory helps? About all I've been able to find so far is that, at least for Rosetta@home, it does help." Any discussion of how much a faster HD helps? I see a great difference in RAC between my "old" SATA drive and new SSD (with the same memory and CPU). |
Keith Myers Send message Joined: 29 Mar 20 Posts: 97 Credit: 332,141 RAC: 1,223 |
I seem to generate 50% errors on tasks. Only running one task at a time. No other projects have any appreciable errors. The exit status is "139 (0x0000008B) Unknown error code", and there is nothing else of much use in the output except "got signal 11": <core_client_version>7.17.0</core_client_version> <![CDATA[ <message> process got signal 11</message> <stderr_txt> command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu -beta -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip fp200714_fbfb_pair46_X_4_f_e0_239_X_0001_0001_rlx_fragments_fold_data.zip -abinitio::rg_reweight 0.5 -out:file:silent default.out -silent_gz -mute all -in:file:native 00001.pdb -out:file:silent_struct_type binary -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2057750 Using database: database_357d5d93529_n_methyl/minirosetta_database </stder None have been the scaffold tasks recently talked about in the thread. Anybody have any ideas? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
"Having some sort of issue where multiple instances of rosetta_4.20_windows_x86_64.exe are open seemingly independent of BOINC. Closing BOINC, suspending the project, etc. don't get rid of these processes. They're also taking up a massive chunk of CPU, despite me having set BOINC to use 30% CPU time. What are these?" rosetta_4.20_windows_x86_64.exe is the program that actually does the work for the current Rosetta@home tasks. They NEED a lot of CPU to do their work. A separate copy is needed for each of the Rosetta@home tasks currently running on your computer. As for the 30% CPU time setting, I've thought of two possibilities: 1. This program ignores that setting. 2. The way of timing that setting is inside the program, and does not release the CPU. Does anyone else here know which, if either, of these possibilities is correct? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
"I seem to generate 50% errors on tasks. Only running one task at a time. No other projects have any appreciable errors." Signal 11 under Linux is also known as a segmentation fault. It means that the program tried to access some memory location that it should not have had access to. This is often because the program wrote something other than a memory address to a location that was supposed to contain the memory address of something it should have been able to reach. Another cause is that the program tried to use something as the address of something it could reach without ever setting that supposed address to anything, and therefore used whatever was there before the program started as a memory address. This is not something you can fix - it needs to be done by someone with more knowledge of the internals of the program. Do we have a moderator here who can tell a developer to try that workunit with more debugging enabled, in order to see more about what went wrong? You might be able to help the developer by posting a pointer to at least one of these failed tasks. |
Keith Myers Send message Joined: 29 Mar 20 Posts: 97 Credit: 332,141 RAC: 1,223 |
Well, the first set of errors I assume were caused by changing my runtime preferences for tasks midstream while they were running. Seems you shouldn't change from 4 hours to 8 hours and back to 4 hours. But all the recent errors were on tasks downloaded with the 4 hour runtime from the start: https://boinc.bakerlab.org/rosetta/result.php?resultid=1222334855 https://boinc.bakerlab.org/rosetta/result.php?resultid=1222430421 https://boinc.bakerlab.org/rosetta/result.php?resultid=1222469090 https://boinc.bakerlab.org/rosetta/result.php?resultid=1222561787 https://boinc.bakerlab.org/rosetta/result.php?resultid=1222594710 This task has an invalid pointer error: https://boinc.bakerlab.org/rosetta/result.php?resultid=1221693374 000067bd783 *** *** Error in `../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu': free(): invalid pointer: 0x00000000067bd783 *** Same with this task: https://boinc.bakerlab.org/rosetta/result.php?resultid=1222039913 [Edit] After some Googling, it looks like the program tried to get some memory allocated but did not do it properly. |
Rayalot72 Send message Joined: 27 Mar 20 Posts: 2 Credit: 237,615 RAC: 1,215 |
It's whatever, I've fixed it. I suspended tasks in BOINC, killed every Rosetta task I could find, and resumed Rosetta. I've had BOINC for quite a while now; it's not going to just suddenly ignore my CPU settings. It looks more to me like BOINC had some rogue tasks for unknown reasons. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,372 RAC: 18,198 |
"After the task outage last month I guess people re-prioritised other projects, understandably." I really don't understand why people do that. I have all my computers set to run at least two projects. If one goes wrong or runs out of work, the computer will run entirely the other one with no intervention from myself. When it's fixed, it'll go back to doing them at the proportion I've set (and in fact tries to make up lost ground by doing more of the one that was broken for a while). You could even have Rosetta at weight 1,000,000 and another project at 1. "Having some sort of issue where multiple instances of rosetta_4.20_windows_x86_64.exe are open seemingly independent of BOINC. Closing BOINC, suspending the project, etc. don't get rid of these processes. They're also taking up a massive chunk of CPU, despite me having set BOINC to use 30% CPU time. What are these?" 30% of cores works better. Then you run fewer tasks at full speed instead of lots of tasks slower. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"30% of cores works better. Then you run fewer tasks at full speed instead of lots of tasks slower." With Hyper-Threading, each full core is divided into two "virtual" cores, each with its own instruction stream, referred to as a "thread". That allows the hardware to be used more efficiently, so that it is idle less of the time, but each of the two threads runs more slowly than if only one were used per core. Typically, you get about 30% greater output using 100% of the cores than when using only 50%, even though each work unit runs faster in the latter case. |
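
The trade-off in the last post can be put into rough numbers. The sketch below is a simple model, not a measurement. It assumes that a task running alone on a physical core runs at full speed, that each second thread on an already busy core adds only the roughly 30% quoted above, and that the gain grows linearly between 50% and 100% of threads in use. The function names and the 8-core example machine are hypothetical.

```python
def relative_throughput(threads_in_use: int, physical_cores: int,
                        ht_gain: float = 0.30) -> float:
    """Estimated total output, in units of 'one task alone on one core'.

    ht_gain is the assumed extra output from the second thread on a core
    (0.30 mirrors the ~30% figure quoted in the post; real values vary).
    """
    if not 0 <= threads_in_use <= 2 * physical_cores:
        raise ValueError("threads_in_use must be between 0 and 2 * physical_cores")
    # Threads that get a physical core to themselves run at full speed.
    full_speed = min(threads_in_use, physical_cores)
    # Any further threads share a core and add only the assumed gain.
    shared = max(0, threads_in_use - physical_cores)
    return full_speed + ht_gain * shared


def per_task_speed(threads_in_use: int, physical_cores: int,
                   ht_gain: float = 0.30) -> float:
    """Average speed of each running task relative to a task with a core to itself."""
    if threads_in_use == 0:
        return 0.0
    return relative_throughput(threads_in_use, physical_cores, ht_gain) / threads_in_use


if __name__ == "__main__":
    cores = 8  # hypothetical machine: 8 physical cores, 16 threads
    for threads in (4, 8, 12, 16):
        total = relative_throughput(threads, cores)
        each = per_task_speed(threads, cores)
        print(f"{threads:2d} threads: total {total:.2f}, per task {each:.2f}")
```

Under these assumptions, running all 16 threads gives about 1.3 times the total output of running 8, while each individual task runs at about 65% of the speed it would have with a core to itself. That matches both posts: fewer tasks finish sooner each, but more tasks get more work done overall.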
©2024 University of Washington
https://www.bakerlab.org