Message boards : Number crunching : Minirosetta 3.73-3.78
Greg_BE · Joined: 30 May 06 · Posts: 5691 · Credit: 5,859,226 · RAC: 0
Now at 99% of total CPUs: 5 dedicated CPUs and 1 for the GPU and other stuff. Percent of CPU time is back to 100%.

[quote]There is no difference between 99% and 96% of CPUs in the computing configuration of your machine. Any minor change was likely due to background churning of other jobs, either normal system tasks or other BOINC compute jobs.

There are two BOINC COMPUTING PREFERENCES -> COMPUTING controls for the CPU. One is "% of CPUs", which controls the number of CPUs that are active. The second is "% of CPU time", which intentionally inserts idle time into the compute time. Use "% of CPUs" and AVOID "% of CPU time" like the plague. Inserting non-BOINC time into the project's execution is like what you saw with Rosetta running at 8%; your 8% was like setting "% of CPU time" to 50%.

"% of CPUs" deals in whole CPUs. Setting "% of CPUs" to 99% will allow 5 of your 6 CPUs to run CPU-only jobs. You can drop "% of CPUs" to just above 100% - 1/6 ≈ 83.3% (for example 83.4%) and it should still allow 5 of your CPUs to run. If you set "% of CPUs" to 83%, BOINC will idle a second CPU and only 4 will run.

EXAMPLE: On my i7 with 8 CPUs, setting "% of CPUs" to 99% disables 1 CPU and displays the following message in the EVENT LOG:
5/10/2016 6:00:32 AM | | Number of usable CPUs has changed from 8 to 7.
5/10/2016 6:00:32 AM | | max CPUs used: 7
Setting "% of CPUs" to 88% yields the same message. Setting "% of CPUs" to 87% drops another CPU, with this EVENT LOG message:
5/10/2016 6:02:32 AM | | Number of usable CPUs has changed from 7 to 6.
5/10/2016 6:02:32 AM | | max CPUs used: 6

[quote]OK, I will lower my overall BOINC CPU load to 98% and see if that helps. And what you see on POEM is the same for me: 100% GPU and grabbing a significant percentage of the CPU. So it could be as you said, Rosetta getting bounced.
- Lowered both levels of processor usage to 96%. Will let things run and see if that helps Rosie catch back up. Thanks for the help. I'll let you know later if that solves the issue.
[quote]If the Rosetta job is bouncing between 16% and 8%, the CPU caches are getting cleared out during the time that Rosetta is being idled by other programs executing on your system. You cannot tell how many times Rosetta is gaining/losing control during that 1-second sample, but it is probably a large number of times. This is a very good indication that CPU cache thrashing (two or more jobs wanting to have their code/data in the CPU caches) is a problem.

Since the BOINC Whetstone benchmark ran at full speed on your machine and on other users' machines, when Rosetta bounces between 16% and 8% and cache contents are evicted, your machine is not making as much Rosetta compute progress, because it is waiting for code/data to be retrieved again from slower main memory. The other machines' Rosetta/Whetstone ratios appear higher than yours, and they are getting a higher % of claimed credits. It is hard to estimate the exact impact from these high-level numbers, but if you saw 8% on Rosetta, that is not good and is likely part of the problem.

I have seen the GPU job's load on the CPU vary as a function of the SYSTEM and of the GPU, CPU and memory bandwidth. POEM is taking 100% of a CPU on my i7-3770k/Nvidia 970 GPU. The newer OpenCL GPU apps do seem to take a good chunk of a CPU, more than their CUDA counterparts. On machines where I run POEM or similar OpenCL GPU projects, I set BOINC -> COMPUTER PREFERENCES -> USAGE LIMITS -> % of CPUs = 99% to keep 1 CPU available for the GPU jobs AND for reasonable responsiveness on the system.

[quote]It might be POEM. Even though it is mainly a GPU app, it grabs 0.263 CPUs, but when looking at processes it takes 17% of the CPU, and Rosetta jumps around between 16% and 8%.

[quote]I am not sure about VHC or its control knobs. I looked at the SixTrack source code many years ago and had a SixTrack (LHC@Home) account, but they could not generate work to crunch, so I gave up.
I have also never run a VirtualBox version of any project, so I have no experience there either. If the app runs under BOINC control, you can set PROJECT -> NO NEW TASKS and let the tasks drain out, or simply suspend the VLHC project application for a period, and see if it makes a difference in the Rosetta results.

A quick examination in the Windows 10 Task Manager might tell: the TASK MANAGER -> MORE DETAILS -> PROCESSES screen should tell you a lot. The CPU column should total close to 100% if you allow all CPUs to be busy. SORT BY CPU by clicking on the CPU column. Each Rosetta job should be consuming 1/6 of the machine (one of your 6 CPUs), or about 16.7%. If they are consuming noticeably less than 16.7%, the Rosetta job is not running 100% of the time, and the Rosetta code and data are being evicted from the L1/.../Lx CPU caches. It takes a few cycles for the CPU to get data from those near caches. If the CPU has to go to main memory for evicted code/data, it takes 10x as long or more, and Rosetta will run, but VERY inefficiently, while it waits for code/data to warm the caches again. Rosetta works hard but is waiting on code/data from memory. It is worth your time to run a couple of experiments on your machine to see if anything is affecting progress.

[quote]You think that VHC could be interfering? They both seem stuck on low average credit, and VHC runs in 24-hour time slots. You cannot alter the run time on that project. Since I have been on Rosetta longer than VHC, I may have to drop VHC. I was trying it because I wanted to see how VirtualBox worked.

[quote][quote]Is my CPU not strong enough for the current tasks that have been running
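The "% of CPUs" arithmetic and the Task Manager check described in rjs5's replies can be reproduced with a short Python sketch. It assumes BOINC multiplies the CPU count by the "% of CPUs" value and drops any fraction, never going below 1 CPU; that assumption reproduces the EVENT LOG lines quoted above, but it is not copied from BOINC's source. The helper names `usable_cpus` and `core_share` are hypothetical.

```python
import math


def usable_cpus(n_cpus: int, max_ncpus_pct: float) -> int:
    """Whole CPUs BOINC is assumed to use for a given "% of CPUs" setting.

    Assumption (not BOINC's actual code): the percentage is applied to the
    CPU count and the fractional part is dropped, with a floor of 1 CPU.
    """
    return max(1, math.floor(n_cpus * max_ncpus_pct / 100))


def core_share(task_pct_of_machine: float, n_cpus: int) -> float:
    """Fraction of one CPU that a single-threaded task is receiving.

    Task Manager reports CPU use as a percentage of the whole machine, so a
    task that has one of n CPUs to itself full time shows 100/n percent.
    """
    return task_pct_of_machine * n_cpus / 100


# 8-CPU i7 from the EVENT LOG example
print(usable_cpus(8, 99))  # 7
print(usable_cpus(8, 88))  # 7
print(usable_cpus(8, 87))  # 6

# 6-CPU machine from this thread
print(usable_cpus(6, 96))    # 5
print(usable_cpus(6, 83.4))  # 5
print(usable_cpus(6, 83))    # 4

# Rosetta at 8% of a 6-CPU machine is getting roughly half a CPU
print(core_share(8, 6))   # 0.48
# POEM at 17% of a 6-CPU machine is using about one full CPU
print(core_share(17, 6))  # 1.02
```

Under this assumption, any setting from about 83.4% up to 99% leaves 5 CPUs busy on a 6-CPU machine, which is consistent with 99%, 98% and 96% behaving the same.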
Dr. Merkwürdigliebe · Joined: 5 Dec 10 · Posts: 81 · Credit: 2,657,273 · RAC: 0
@Greg_BE: You might want to learn how those nasty quote tags work...
Greg_BE · Joined: 30 May 06 · Posts: 5691 · Credit: 5,859,226 · RAC: 0
[quote]@Greg_BE: You might want to learn how those nasty quote tags work...[/quote]
I think that is because I am writing above the previous post instead of below, like here. The forum software can't read backwards. I haven't posted on here in years, so I have forgotten how this works.
rjs5 · Joined: 22 Nov 10 · Posts: 273 · Credit: 22,989,157 · RAC: 15,740
You can put messages above the old message .... AND I thought that was a clever idea since it worked for me.
[quote]@Greg_BE: You might want to learn how those nasty quote tags work...[/quote]
Greg_BE · Joined: 30 May 06 · Posts: 5691 · Credit: 5,859,226 · RAC: 0
We are going way off topic now, so time to end this.
[quote]You can put messages above the old message .... AND I thought that was a clever idea since it worked for me.[/quote]
[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 1993 · Credit: 9,520,400 · RAC: 11,365
825724761
ERROR: in::file::boinc_wu_zip 5H2LD-13_tj58_5_054307_0014_I_0001_data.zip does not exist!
ERROR:: Exit from: ..\..\..\src\apps\public\boinc\minirosetta.cc line: 226
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
svincent · Joined: 30 Dec 05 · Posts: 219 · Credit: 12,120,035 · RAC: 0
825724761
I see the same error; it seems to happen with all tasks named yh_*. BOINC 7.2.42/Ubuntu 14.04
Dr. Merkwürdigliebe · Joined: 5 Dec 10 · Posts: 81 · Credit: 2,657,273 · RAC: 0
Yes, all of the yh* jobs are failing on my computer, too.
krypton (Volunteer moderator, Project developer, Project scientist) · Joined: 16 Nov 11 · Posts: 108 · Credit: 2,164,309 · RAC: 0
Thanks for the report! I've contacted the authors of these jobs!
[quote]Yes, all of the yh* jobs are failing on my computer, too.[/quote]
yhsia (Volunteer moderator, Project developer, Project scientist) · Joined: 21 May 16 · Posts: 2 · Credit: 97,172 · RAC: 0
[quote]Thanks for the report! I've contacted the authors of these jobs![/quote]
Sorry, those were my jobs! Apologies for the wasted run times; I'm figuring out what went wrong :(
Sid Celery · Joined: 11 Feb 08 · Posts: 2115 · Credit: 41,115,238 · RAC: 19,699
I got several of the above, so no need to report them, but another isolated one came up:
4hi0_B_16_BEN_SUP_hyb_cst_v02_i00_t000__krypton_SAVE_ALL_OUT_03_09_358432_163_1
ERROR: Cannot open file "i11.pdb"
It exited after just 50 seconds, so no harm done at my end.
Oh, also another odd one that seemed to run OK and claimed credit, yet received 0, but without a validate error:
rb_05_17_65554_109652__t000__ab_robetta_IGNORE_THE_REST_358733_4959_1
krypton (Volunteer moderator, Project developer, Project scientist) · Joined: 16 Nov 11 · Posts: 108 · Credit: 2,164,309 · RAC: 0
Thanks Sid! I've fixed the issue, but unfortunately some units already got sent out =[
[quote]I got several of the above, so no need to report them, but another isolated one came up:[/quote]
Sid Celery · Joined: 11 Feb 08 · Posts: 2115 · Credit: 41,115,238 · RAC: 19,699
Good stuff. The one with the credit problem seems to have been cleaned up in the meantime and granted credit equal to claimed credit, so all's well there too. A few more failed tasks, but only of the type already reported, so all in hand as they work their way out of the queue.
[quote]Thanks Sid! I've fixed the issue, but unfortunately some units already got sent out =[ [/quote]
Dr. Merkwürdigliebe · Joined: 5 Dec 10 · Posts: 81 · Credit: 2,657,273 · RAC: 0
|
krypton (Volunteer moderator, Project developer, Project scientist) · Joined: 16 Nov 11 · Posts: 108 · Credit: 2,164,309 · RAC: 0
[quote]Compute error[/quote]
Thanks! I've informed the author of the job.
wyxchari · Joined: 27 Nov 14 · Posts: 11 · Credit: 85,318 · RAC: 0
I'm tired of compute errors from Rosetta. Many tasks fail at the end and then don't receive credit. I prefer to use my computer time on other projects, like WCG Cancer, which never gives me errors. Goodbye forever.
Timo · Joined: 9 Jan 12 · Posts: 185 · Credit: 45,649,459 · RAC: 0
[quote]I'm tired of computer errors Rosetta. Many tasks fail at the end and then not receive credit. I prefer to use my computer time on other projects as WCG Cancer never give me error. Goodbye forever.[/quote]
Actually, your errored task received full credit (see the bottom of the page here: https://boinc.bakerlab.org/rosetta/result.php?resultid=824626167). As with most invalid tasks, it doesn't get credit right away; a job that runs once a day grants credit to invalid tasks, and this granted credit only shows on the result summary page.
Sid Celery · Joined: 11 Feb 08 · Posts: 2115 · Credit: 41,115,238 · RAC: 19,699
A new error report for yhsia to look at; it cuts in 30-45 minutes into the tasks for some reason:
yh160603_5H2LD-13-R_tj59_5_043651_0011_E_0001_SAVE_ALL_OUT_377694_12_1 <message>
yh160603_5H2LD-13-R_tj58_5_000001_0002_C_0001_SAVE_ALL_OUT_377666_89_0 <message>
Sid Celery · Joined: 11 Feb 08 · Posts: 2115 · Credit: 41,115,238 · RAC: 19,699
Also yh160603_5H2LD-13-R_tj59_5_000001_0001_E_0001_SAVE_ALL_OUT_377690_131_1 <message>
krypton (Volunteer moderator, Project developer, Project scientist) · Joined: 16 Nov 11 · Posts: 108 · Credit: 2,164,309 · RAC: 0
Hi Sid, thanks for the alert! It looks like these jobs require lots of memory. We have a way to specify how much memory to use. It will be corrected in the next round of submissions!
©2024 University of Washington
https://www.bakerlab.org