Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Ah ha! FAH is the culprit. OK... taking it off the CPU then. I have an interest in so many things, I guess I will have to eliminate a few. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I don't know how FAH got ahold of my CPU, maybe after an update or something. Anyway, it's eliminated from the CPU now. Thanks for the reminder. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,582,638 RAC: 22,530 |
"I don't know how FAH got ahold of my CPU, maybe after an update or something." If you are still doing FAH on the GPU, you'll need to check how much CPU time is required to support the GPU Tasks and then reduce the number of CPU cores/threads BOINC can use, to stop overcommitting the CPU. Bring up Task Manager/Process Explorer and see how much CPU time is being used by the Folding@home GPU application. If it needs 1 CPU core/thread per Task and you're running 4 GPU Tasks on a 16-thread CPU, then in your Account, Computing preferences, "Use at most xxx % of the CPUs" should be set to 75%. If you're running only 2 GPU Tasks and they only need 0.5 of a CPU core/thread each to support them, reserve 1 thread by setting "Use at most xxx % of the CPUs" to about 94%. With 2 GPU Tasks needing 1 CPU core/thread each, set it to about 88%, etc. Save those changes; the next time BOINC contacts the Scheduler for the project you made the changes in (or you select it and hit Update in the BOINC Manager), the changes will take effect. While Folding@home doing CPU work is probably having the greatest effect, reserving a CPU core/thread for the Moowrapper (and any other GPU projects you do) will also be necessary to stop the CPU from being overcommitted, so that CPU time and Runtime become almost identical. Grant Darwin NT |
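Grant's percentage arithmetic can be sketched as a small calculator. This is a minimal sketch, not anything from BOINC itself: the function name is hypothetical, the 16-thread CPU is inferred from the 4-Tasks-at-75% example, and it assumes the BOINC client turns the percentage into a whole number of CPUs by rounding down, which is why the result is rounded up to the next whole percent.

```python
import math


def use_at_most_pct(total_threads: int, gpu_tasks: int, cpu_per_gpu_task: float) -> int:
    """Hypothetical helper: value for "Use at most xxx % of the CPUs".

    Assumes the client truncates pct/100 * total_threads to a whole number,
    so the percentage is rounded up to avoid losing an extra thread.
    """
    # Threads needed to feed the GPU tasks, rounded up to whole threads.
    reserved = math.ceil(gpu_tasks * cpu_per_gpu_task)
    # Keep at least one thread for BOINC CPU work (assumption).
    usable = max(total_threads - reserved, 1)
    # Ceiling division: smallest whole percent that still covers `usable` threads.
    return -(-usable * 100 // total_threads)


for tasks, per_task in [(4, 1.0), (2, 0.5), (2, 1.0)]:
    print(f"{tasks} GPU tasks x {per_task} CPU each -> {use_at_most_pct(16, tasks, per_task)}%")
# 4 GPU tasks x 1.0 CPU each -> 75%
# 2 GPU tasks x 0.5 CPU each -> 94%
# 2 GPU tasks x 1.0 CPU each -> 88%
```

Rounding up matters for the 2-Task cases: 93.75% and 87.5% are not whole percents, and rounding them down would leave a truncating client one thread short.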
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I don't know how FAH got ahold of my CPU, maybe after an update or something." I usually use just the GPU on Folding, and delete the CPU slot. But it takes more than one shot: after the first reboot, it comes back. If I delete it two or three times, it usually gets the message and stays deleted. But occasionally it comes back from the dead even after that, maybe due to an update. It does its own thing. |
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
I've been getting quite a lot of these errors on all my devices. Sometimes they get validated, sometimes they just result in a computational error.
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m_1389982_5_1
<core_client_version>7.16.16</core_client_version>
<![CDATA[
<stderr_txt>
command: ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_arm-android-linux-gnu -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6nr4xe1m.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3400220
Extracting in project directory: database_357d5d93529_n_methyl.zip
Using database: database_357d5d93529_n_methyl/minirosetta_database
ERROR: [ERROR] Unable to open constraints file: f9b5372889da3e02c19de86d067b31e6_0001.MSAcst
ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 457
called boinc_finish(0)
</stderr_txt>
]]>
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f_1389895_5_1
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_2ke5kf4f.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1087567
Using database: database_357d5d93529_n_methyl\minirosetta_database
ERROR: [ERROR] Unable to open constraints file: f5d52c1749a40719598f1d8b37e13c45_0001.MSAcst
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\ConstraintIO.cc line: 457
BOINC:: Error reading and gzipping output datafile: default.out
10:42:25 (29468): called boinc_finish(1)
</stderr_txt>
]]>
Funny to note that my Snapdragon 888 is beating my Ryzen 5 3600 and gets much more consistent credits per task. My phone is getting around 394 credits for every 8-hour task, while my Ryzen 5 3600 gets a measly 346 credits per 8-hour task. Either something is up with the credit calculation, or there is something about these pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST tasks that makes them particularly good on ARM. I'm leaning towards the former explanation because:
1) the benchmarks for my phone are 5887.99 million ops/sec floating point speed and 29296.86 million ops/sec integer speed, whilst my 3600 gets 5198.63 million ops/sec floating point speed and 19515.05 million ops/sec integer speed.
2) My 3600 seems to be consistently generating more "decoys" than my phone despite the credit deficit.
Probably a BOINC issue that's been beaten to death already. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,582,638 RAC: 22,530 |
"I've been getting quite a lot of these errors on all my devices. Sometimes they get validated, sometimes they just result in a computational error." Those errors have been occurring with some pre_helical_bundles_ Tasks ever since they were released months ago. Grant Darwin NT |
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
Understood, thanks! |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,523,781 RAC: 8,309 |
"Funny to note that my Snapdragon 888 is beating my Ryzen 5 3600 and gets much more consistent credits per task. [snip] Probably a BOINC issue that's been beaten to death already." The long, long story of "code optimization"... |
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
On WCG, my 3600 is many times faster than my Snapdragon 888, as it should be. Given that my old family iMac gets a measured floating point speed of 6471.51 million ops/sec and a measured integer speed of 21013.4 million ops/sec, which would make it about as fast as an Intel 10700K, I think BOINC's benchmark is extremely unreliable between platforms. My iMac is nowhere near as fast as my 3600 in BOINC.
The problem is that BOINC's benchmark appears to play a large role in the calculation of credits for Rosetta. My phone gets around 394 credits for each 8-hour task; 394/5.89 (the peak measured floating point speed, in billions of ops/sec) is 66.89. My 3600 gets around 346 credits for similar tasks; 346/5.20 is 66.54. My old MacBook Pro used to have the same issue: it was getting too many credits compared to my 3600. It's important to note that in the helical tasks, my 3600 actually appears to be doing more work (more decoys generated); it's just being granted fewer credits.
If there is actually a bug, do I want it to be fixed? Yes. Do I think it is an issue Rosetta should focus on? No. It's difficult to solve such issues with Rosetta@home because of how unique it is. Other projects have tasks of a known computation size, so the speed of your device determines how long it takes to complete them. On Rosetta, the run-time of each task is the same, and the speed of your device determines how much work gets done. Determining how much work was actually done is difficult, and BOINC's benchmark is extremely unreliable across platforms.
According to the admin, this is how things are supposed to work:
"What is the average processing rate? And what is it used to calculate? Related to your initial question, R@h credits are based on a rolling average of claimed credit/model that our server keeps track of for each job set. So you are awarded the average credit per model that previous results have reported back multiplied by the number of models you produced. The first job reported back for a specific job set is granted the claimed credit, but subsequent results are granted the average credit per model * the number of models produced. This works quite well for large job sets (like for protein structure prediction) because a job set may consist of thousands of jobs which eventually gives a good average credit per model value. But it can be a bit variable for small job sets, and if a job set consists of only one job, then you will be granted your claimed credit which is based on the cpu benchmark and run time."
Given that my 3600 is consistently generating more models than my phone (170-180 vs 140-150) in the pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST tasks, I would expect it to get more credits. There is also the possibility that the tasks my phone is getting have more complex models and thus get more credits per model. Here's another example of Rosetta's credit calculation being a little wonky: Possible bug in the "Average Processing Rate" calculation |
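The admin's description can be sketched as a small per-job-set accumulator. This is a minimal illustration, not Rosetta's server code: the class and method names are hypothetical, and modelling the "rolling average" as total claimed credit divided by total models reported so far is an assumption (the real server may weight or window results differently).

```python
from dataclasses import dataclass


@dataclass
class JobSetCredit:
    """Hypothetical per-job-set tracker of claimed credit per model."""
    total_claimed: float = 0.0
    total_models: int = 0

    def grant(self, claimed_credit: float, models: int) -> float:
        """Return the credit granted for one result, then update the average."""
        if self.total_models == 0:
            # First result reported for this job set: its claim is granted as-is.
            granted = claimed_credit
        else:
            # Later results: average credit per model from *previous* results
            # multiplied by the number of models this result produced.
            granted = (self.total_claimed / self.total_models) * models
        # Fold this result's claim into the running average for later results.
        self.total_claimed += claimed_credit
        self.total_models += models
        return granted


job_set = JobSetCredit()
print(job_set.grant(300.0, 150))  # 300.0 (first result, claimed credit)
print(job_set.grant(400.0, 100))  # 200.0 (2.0 credits/model * 100 models)
```

In this toy run the second result claims 400 credits but is granted 200, because the running average is 2 credits per model and it reported only 100 models. Under this scheme the benchmark only enters through the claims that feed the average, which is why small job sets (where a few claims dominate) can come out more variable.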
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,155 |
"[snip] It's important to note that in the helical tasks, my 3600 actually appears to be doing more work (more decoys generated), it's just being granted less credits." Note that not all decoys involve the same amount of work, so less credit per decoy is not necessarily meaningful. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6fa0fn4c_1391037_5_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=1411691662
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pre_helix_boinc_v1.xml @helix_design.flags -in:file:silent pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6fa0fn4c.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6fa0fn4c.zip @pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_6fa0fn4c.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3747228
Using database: database_357d5d93529_n_methyl\minirosetta_database
ERROR: [ERROR] Unable to open constraints file: c0d44a74319eb5077ecc3d522c246773_0001.MSAcst
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\ConstraintIO.cc line: 457
BOINC:: Error reading and gzipping output datafile: default.out
20:45:39 (11432): called boinc_finish(1)
</stderr_txt>
]]> |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,582,638 RAC: 22,530 |
[snip] And the benchmarks are used to determine the amount of work done (there is no actual measurement of work actually done as such...). So systems with higher benchmark values get more Credit per hour than those with lower benchmark values (unless they are too much higher, in which case it is determined they are cheating and the Credit awarded becomes some sort of average based on all the work returned to date for such Tasks). That is also why systems that haven't run the benchmarks and are using the default values get bugger all Credit for the work they return until they eventually do run the benchmarks. Grant Darwin NT |
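If credit per hour simply scales with the floating-point benchmark, as described above, then credit divided by that benchmark should come out nearly the same on different hosts running tasks of the same length. A quick check with the figures Tomcat quoted earlier in the thread; treating the relationship as a plain proportion is an assumption, since Rosetta's exact claimed-credit formula isn't given here, and the helper name is hypothetical.

```python
def credit_per_benchmark(credit: float, fp_gops: float) -> float:
    """Credit granted per billion ops/sec of the BOINC floating-point benchmark."""
    return credit / fp_gops


# Figures quoted in the thread: credit per 8-hour task, FP benchmark in billions of ops/sec.
hosts = {
    "Snapdragon 888": (394, 5.89),
    "Ryzen 5 3600": (346, 5.20),
}

for name, (credit, fp_gops) in hosts.items():
    print(f"{name}: {credit_per_benchmark(credit, fp_gops):.2f}")
# Snapdragon 888: 66.89
# Ryzen 5 3600: 66.54
```

The two ratios land within about half a percent of each other, which is consistent with credit tracking the benchmark rather than the number of decoys produced.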
Tomcat雄猫 Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
"[snip] And the benchmarks are used to determine the amount of work done (there is no actual measurement of work actually done as such...)." I've figured that out the hard way. I just find it annoying that the values are so inconsistent across platforms. I assume these validate errors are also a known issue with the pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST tasks and not some stupid problem with my machine?
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_3db8ur8d_1391015_5_0
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_5be2mt7m_1391005_5_0 |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,582,638 RAC: 22,530 |
"I assume these validate errors are also a known issue with the pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST tasks" Yep. They've been occurring ever since they were first released. You get periods where you see hardly any of them, and then you get batches where as many as 20% could give a Validate or Computation error, either after several hours or just after a few seconds. We're back to more than the usual number of Tasks giving issues at present. Although given the number of Tasks left queued (2 million and falling), we'll probably be out of all work in the next few days unless another big batch of work is released before then. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,135,495 RAC: 16,312 |
"I've been getting quite a lot of these errors on all my devices. Sometimes they get validated, sometimes they just result in a computational error." "Those errors have been occurring with some pre_helical_bundles_ Tasks ever since they were released months ago." Yes. They all run very short too. The ones that come up validated award appropriate credit for the time. Only the very shortest-running ones end in a computation error with no credit, and those usually run for less than 20 seconds. They "cost" a little in download time, but it's easier to let them error out than to fix them, so it'll continue occasionally until they're exhausted, which won't be much longer now.
Edit: I looked at this in late May and reported it to admin, who gave the answer above. I've finally got round to examining this issue involving "ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\ConstraintIO.cc line: 457" |
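Since the thread pins down a recognizable error signature, a short script can flag which failed results match it. A minimal sketch, assuming you have copied a result's stderr text into a string; the regular expressions are based only on the log lines quoted in this thread, and the function name is hypothetical. It also records the boinc_finish status, because the quoted Android log ended with boinc_finish(0) while the Windows logs ended with boinc_finish(1); whether that status decides validation is not established here.

```python
import re
from typing import Optional

# Signatures taken from the stderr output quoted in this thread.
CONSTRAINTS_RE = re.compile(r"Unable to open constraints file: (\S+\.MSAcst)")
SOURCE_RE = re.compile(r"ConstraintIO\.cc line: (\d+)")
FINISH_RE = re.compile(r"called boinc_finish\((\d+)\)")


def triage(stderr_txt: str) -> Optional[dict]:
    """Return details if stderr shows the missing-constraints-file error, else None."""
    missing = CONSTRAINTS_RE.search(stderr_txt)
    if missing is None:
        return None
    source = SOURCE_RE.search(stderr_txt)
    finish = FINISH_RE.search(stderr_txt)
    return {
        "constraints_file": missing.group(1),
        "source_line": int(source.group(1)) if source else None,
        "boinc_finish": int(finish.group(1)) if finish else None,
    }


android_log = (
    "ERROR: [ERROR] Unable to open constraints file: "
    "f9b5372889da3e02c19de86d067b31e6_0001.MSAcst "
    "ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 457 "
    "called boinc_finish(0)"
)
print(triage(android_log))
# {'constraints_file': 'f9b5372889da3e02c19de86d067b31e6_0001.MSAcst', 'source_line': 457, 'boinc_finish': 0}
```

Matching on the file name and "ConstraintIO.cc line:" rather than the full path keeps it working for both the forward-slash and backslash forms of the path seen in the logs.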
mrchips Joined: 11 Nov 09 Posts: 10 Credit: 14,602,425 RAC: 20,724 |
Scheduler bwsrv1 not running. What's up? |
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
This is probably an issue for the BOINC board, but since I know you jamokes I will ask here first. On one of my hosts, the BOINC manager will not launch. I try to run it as I have in the past, but nothing happens at all. The host seems to be processing work, based on the CPU usage. But no joy with the manager program. Any suggestions? I suppose that I can uninstall/reinstall; will that disturb anything? |
Bryn Mawr Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 10,319 |
"This is probably an issue for the BOINC board, but since I know you jamokes I will ask here first." Look for a file in your home directory, 5 bytes long and called Boinc-Manager-xxx or similar. If you find it, delete it: it’s a lock file. |
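For a host where the Manager keeps its lock file in the home directory, as described above, the check can be scripted. A minimal sketch: the name test (both "boinc" and "manager" in the file name, case-insensitive) and the 16-byte size cutoff are assumptions based on the description above, not a documented BOINC file format, and the function name is hypothetical. It only lists candidates; the delete step is left commented out so it runs only once you have confirmed no Manager process is actually running.

```python
from pathlib import Path
from typing import List


def find_manager_lock_files(home: Path, max_size: int = 16) -> List[Path]:
    """Return small files in `home` whose names look like a BOINC Manager lock file.

    Assumption: the lock file name contains both "boinc" and "manager"
    (case-insensitive) and is only a few bytes long.
    """
    hits = []
    for path in home.iterdir():
        name = path.name.lower()
        if (path.is_file() and "boinc" in name and "manager" in name
                and path.stat().st_size <= max_size):
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for lock in find_manager_lock_files(Path.home()):
        print(f"possible lock file: {lock} ({lock.stat().st_size} bytes)")
        # Uncomment only after confirming no BOINC Manager process is running:
        # lock.unlink()
```

Filtering on size as well as name keeps larger files that happen to mention the Manager (logs, for example) out of the list.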
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
"Look for a file in your home directory, 5 bytes long and called Boinc-Manager-xxx or similar. If you find it, delete it: it’s a lock file." Five-star review, my man. Much obliged. |
©2024 University of Washington
https://www.bakerlab.org