Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
Ok, thanks. It was worth asking :-) |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,989,157 RAC: 15,740 |
OK, extra memory ordered for both machines so we’ll see if that sorts it. It seems like Rosetta gets into a state where it consumes 1gb+ per WU. I am running 35 WU and there are always a couple taking over a gb. I watch the difference between CPU and RUN times and swap used. As long as the swap used is very low, you are probably not running into memory problems. I tend to buy more GB of memory than threads. I originally got my 36 thread machine with 32GB and that was not enough. You can see that 19gb of my swap space has been used even though the machine has 64gb installed for the 36 threads. 19gb swap space used is concerning.

Based on over a thousand jobs each, the credit difference between the 64-bit Rosetta WU and Minirosetta 32-bit WU is negligible: 44.0 credits/CPU hr for Rosetta 4.08 and 45.7 credits/CPU hr for Minirosetta.

top ic .... sorted by memory use.

top - 10:55:55 up 1 day, 18:24, 0 users, load average: 40.22, 36.72, 36.27
Tasks: 524 total, 37 running, 487 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 1.4 sy, 96.6 ni, 1.1 id, 0.0 wa, 0.4 hi, 0.1 si, 0.0 st
MiB Mem : 64090.7 total, 1051.2 free, 16283.5 used, 46756.0 buff/cache
MiB Swap: 32112.0 total, 32093.0 free, 19.0 used. 45874.0 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24590 boinc 39 19 1722808 1.5g 75400 R 98.3 2.5 219:01.25 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1798_1948__t000__1_C1+
25349 boinc 39 19 1384300 1.2g 75400 R 99.3 1.9 198:57.60 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1798_1948__t000__1_C1+
22988 boinc 39 19 838204 723668 75400 R 97.7 1.1 259:32.35 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_14_1536_1929__t000__0_C1+
24878 boinc 39 19 706928 592640 75784 R 99.3 0.9 211:30.53 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1805_1950__t000__0_C1+
15222 boinc 39 19 605140 491200 76104 R 99.0 0.7 459:54.12 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1674_1946__t000__0_C1+
20625 boinc 39 19 605492 491108 75400 R 99.3 0.7 319:46.33 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1808_1947__t000__0_C1+
16439 boinc 39 19 583112 468876 75784 R 97.4 0.7 428:23.20 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -run:protocol jd2_scripting @flags_rb_03_15_1674_1946__t000__0_C1+
24082 boinc 39 19 583664 465920 68044 R 99.3 0.7 231:28.63 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @flags_rb_03_15_1805_1950__t000__ab_robetta -in:file:boinc_wu_zip+
17334 boinc 39 19 575680 457680 68620 R 99.3 0.7 404:59.21 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
22280 boinc 39 19 543464 425512 68556 R 99.7 0.6 276:44.21 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
19901 boinc 39 19 533536 415428 68556 R 99.7 0.6 338:55.09 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
22209 boinc 39 19 530860 413260 68236 R 99.3 0.6 278:15.90 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @flags_rb_03_15_1808_1947__t000__ab_robetta -in:file:boinc_wu_zip+
25711 boinc 39 19 523612 408668 70668 R 99.3 0.6 190:02.19 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @foldit_2007571_0001_fold_and_dock_flags -silent_gz -mute all -ou+
21481 boinc 39 19 521072 406132 70604 R 99.3 0.6 297:12.91 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu @foldit_2007571_0005_fold_and_dock_flags -silent_gz -mute all -ou+
17873 boinc 39 19 516024 398184 68620 R 99.3 0.6 391:55.17 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
15374 boinc 39 19 511956 394116 68556 R 99.3 0.6 455:50.04 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
30825 boinc 39 19 509260 391232 68620 R 99.3 0.6 78:42.82 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
14998 boinc 39 19 508228 390160 68620 R 98.0 0.6 465:27.01 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
18209 boinc 39 19 503324 385500 68620 R 99.0 0.6 383:28.39 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+
31538 boinc 39 19 500516 382744 68620 R 99.3 0.6 60:22.53 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.08_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:+ |
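For anyone who wants to pick out the heavy tasks from a listing like the one above without reading it by eye, here is a minimal Python sketch. It assumes top's default scaling for the RES column (a bare number is KiB, a trailing m/g/t means MiB/GiB/TiB) and the column order shown in the post. The function names `res_to_mib` and `tasks_over` and the shortened sample rows are illustrative only, not part of BOINC or Rosetta.

```python
def res_to_mib(res: str) -> float:
    """Convert a top RES field to MiB.

    Assumes top's default scaling: a bare number is KiB, and a trailing
    m/g/t means MiB/GiB/TiB.
    """
    scale = {"m": 1.0, "g": 1024.0, "t": 1024.0 * 1024.0}
    suffix = res[-1].lower()
    if suffix in scale:
        return float(res[:-1]) * scale[suffix]
    return float(res) / 1024.0  # KiB -> MiB


def tasks_over(rows, limit_mib=1024.0):
    """Return (pid, res_mib) for each process row whose RES exceeds limit_mib."""
    heavy = []
    for row in rows:
        fields = row.split()
        # Column order in the post:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12:
            continue  # not a process row
        res_mib = res_to_mib(fields[5])
        if res_mib > limit_mib:
            heavy.append((int(fields[0]), res_mib))
    return heavy


# Three rows copied from the listing above, with the command line shortened.
SAMPLE_ROWS = [
    "24590 boinc 39 19 1722808 1.5g 75400 R 98.3 2.5 219:01.25 rosetta_4.08_x86_64-pc-linux-gnu",
    "25349 boinc 39 19 1384300 1.2g 75400 R 99.3 1.9 198:57.60 rosetta_4.08_x86_64-pc-linux-gnu",
    "22988 boinc 39 19 838204 723668 75400 R 97.7 1.1 259:32.35 rosetta_4.08_x86_64-pc-linux-gnu",
]

if __name__ == "__main__":
    for pid, res_mib in tasks_over(SAMPLE_ROWS):
        print(f"PID {pid}: {res_mib:.0f} MiB resident")
```

On the sample rows this flags the two tasks shown with 1.5g and 1.2g resident, which matches the "always a couple taking over a gb" observation.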
Juha Send message Joined: 28 Mar 16 Posts: 13 Credit: 705,034 RAC: 0 |
19gb swap space used is concerning. 19 GB would indeed be a lot of swap in use but haven't you got the unit wrong? It looks like 19 MB to me. |
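The unit question can be checked mechanically. The sketch below is only an illustration: it reads the unit label that top prints at the start of its swap summary line and reports swap use in MiB. The helper name `swap_usage_mib` and the regular expression are assumptions written for the line format quoted in the earlier post, and other top versions or locales may format the line differently.

```python
import re

# Scale factors to MiB for the unit label top prints at the start of the line.
_TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}

_SWAP_RE = re.compile(
    r"(KiB|MiB|GiB)\s+Swap:\s*([\d.]+)\s+total,\s*([\d.]+)\s+free,\s*([\d.]+)\s+used"
)


def swap_usage_mib(summary_line: str):
    """Return (total_mib, used_mib) parsed from top's Swap summary line."""
    match = _SWAP_RE.search(summary_line)
    if match is None:
        raise ValueError("not a top swap summary line")
    unit, total, _free, used = match.groups()
    factor = _TO_MIB[unit]
    return float(total) * factor, float(used) * factor


# The swap line exactly as quoted in the post.
LINE = "MiB Swap: 32112.0 total, 32093.0 free, 19.0 used. 45874.0 avail Mem"

if __name__ == "__main__":
    total, used = swap_usage_mib(LINE)
    print(f"swap used: {used:.1f} MiB of {total:.1f} MiB ({100 * used / total:.2f}%)")
```

With the quoted line, this gives 19.0 MiB used out of 32112.0 MiB, well under a tenth of a percent of the swap space.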
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,109,659 RAC: 19,800 |
You may need more memory. You have 8 GB on your Ryzen, but the Rosetta work units sometimes take up to 1 GB each. Sorry to be a bit late on this, but I did notice around 13th March I had a task consuming 2.4Gb and 14Gb of my 16Gb (total) RAM being in use to run 8 tasks. I can't recall the tasks involved. Right now I'm back to my more usual level of 7.74Gb in use |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,989,157 RAC: 15,740 |
19gb swap space used is concerning. DOH! You are obviously correct. I got units of GB dancing in my head. |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
As of this AM I have 16gb on the ryzen and it's currently showing 81% free memory but that's with no Rosetta as no WUs have come down since early yesterday. I'll monitor going forward and report back. |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
As of this AM I have 16gb on the ryzen and it's currently showing 81% free memory but that's with no Rosetta as no WUs have come down since early yesterday. Yes, we are back to 8086 tasks ready to send according to the server status page, which actually means 0 tasks ready to send. Maybe the admins should investigate what those 8086 tasks are and whether they might be causing the issues. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,109,659 RAC: 19,800 |
Sorry to be a bit late on this, but I did notice around 13th March I had a task consuming 2.4Gb and 14Gb of my 16Gb (total) RAM being in use to run 8 tasks. So you've got your extra RAM installed already? If it was a RAM issue (with 8Gb) you'll be fine now. I was only indicating there were some rogue tasks around last week that may have tripped you up back then. Hopefully new tasks play nicer as standard. Your original question was to ask if there was anything you could do - there probably wasn't at that time and you've more than covered yourself now under normal conditions. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,109,659 RAC: 19,800 |
As of this AM I have 16gb on the ryzen and it's currently showing 81% free memory but that's with no Rosetta as no WUs have come down since early yesterday. Up to 10 minutes ago it was still showing those 8086 so that doesn't sound right. However, I'm here to say a whole load of tasks just came down and the server status page has just changed to show an additional 20k Rosetta tasks in progress and 15k still unsent. No idea how long that will last, but there is some progress. |
rjs5 Send message Joined: 22 Nov 10 Posts: 273 Credit: 22,989,157 RAC: 15,740 |
I was watching when a couple of the Rosetta WU failed. They computed properly until the TIME REMAINING was zero seconds and the compute time was 8 hours and a few minutes. Instead of reporting the completion, the WU was marked as WAITING with zero seconds remaining. When the WU restarted, it indicated a COMPUTE ERROR with the message "finish file present too long". The 34 failing WU seemed to all fail at the end and were 4.08 Linux WU. https://boinc.bakerlab.org/rosetta/result.php?resultid=1063704662 |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,109,659 RAC: 19,800 |
We had a good run, but no tasks left to download (and that mysterious 8086 ready to send again, whatever that is) |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
I was watching when a couple of the Rosetta WU failed. That sounds very similar to mine. I did notice that a few of mine showed n decoys and then appeared to restart and showed a session with 1 decoy before failing. |
bcavnaugh Send message Joined: 7 Dec 13 Posts: 7 Credit: 2,389,640 RAC: 0 |
Not getting any Tasks on this Host https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3112116 (a T630 Server), but my other T630 is getting them fine: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3282035 Both are running Server 2012 R2. |
bcavnaugh Send message Joined: 7 Dec 13 Posts: 7 Credit: 2,389,640 RAC: 0 |
Not getting any Tasks on this Host https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3112116 Looks OK now https://boinc.bakerlab.org/rosetta/results.php?hostid=3112116 |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
Not getting any Tasks on this Host https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3112116 I suspect those were the last splutterings as the pool was draining; project status is showing 0 tasks unsent (but, as has been said, 8086 tasks ready to send). |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2115 Credit: 41,109,659 RAC: 19,800 |
Not getting any Tasks on this Host https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3112116 Maybe they're tasks for pre-80386 machines? ... |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
Despite having a 6 hour limit set I am currently processing a batch of Rosetta 4.08 WUs that have been running for 8 hours and are showing an estimated 2 hours remaining. They all have names starting :- rb_03_21_2022_2162_ab_t000__robetta_cstwt_5.0_FT Is this normal or are they likely to error out? |
Admin Project administrator Send message Joined: 1 Jul 05 Posts: 4805 Credit: 0 RAC: 0 |
This seems odd but I would continue to let it run since it is a relatively large protein to model. |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
This seems odd but I would continue to let it run since it is a relatively large protein to model. Besides that, the limit is CPU-hours, so depending on what else the CPU has to do, the runtime can be a lot longer. |
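The CPU-hours point can be put in numbers. This is a rough sketch under a simplifying assumption, namely that a task receives a steady fraction of one core for its whole run; the function name `elapsed_hours` and the `cpu_share` parameter are hypothetical and are not BOINC settings.

```python
def elapsed_hours(cpu_hours_target: float, cpu_share: float) -> float:
    """Estimate wall-clock hours needed to accumulate a CPU-time target.

    cpu_share is the assumed steady fraction of one core the task gets
    (1.0 means the task has the core to itself).
    """
    if not 0.0 < cpu_share <= 1.0:
        raise ValueError("cpu_share must be in (0, 1]")
    return cpu_hours_target / cpu_share


if __name__ == "__main__":
    # A 6 CPU-hour target on a core the task only gets 75% of the time.
    print(elapsed_hours(6.0, 0.75))
```

Under that assumption, a 6 CPU-hour target needs 8 hours of elapsed time when the task gets three quarters of a core, so elapsed time alone does not show that the limit was exceeded. Comparing the CPU time and elapsed time columns in the BOINC manager tells you which case you are in.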
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 389 Credit: 12,070,320 RAC: 12,300 |
This seems odd but I would continue to let it run since it is a relatively large protein to model. After 10 hours (elapsed and CPU), 2 of them (1064201222 and 1064201281) errored out with the same symptoms I’ve been seeing. Interestingly, the 4 that succeeded (1064201216, 1064201223, 1064201224 and 1064201283) also had the "default.out.gz exist, stream information inconsistent" error, so that is also a red herring. |
©2024 University of Washington
https://www.bakerlab.org