Message boards : Number crunching : Minirosetta 3.73-3.78
Author | Message |
---|---|
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,989,157 RAC: 15,740 |
I'm seeing a lot of robetta tasks crash immediately : the relevant line in the task log seems to be: I think the error happens earlier than where you have pointed. The actual error is happening right at startup where Boinc is initializing the "slot" directory. The source code seems to point to a missing *.xml file that it is expecting which has the CDATA string and Rosetta then prints the "message". My guess would be .... out of disk space or a malformed rb* job. You might do an "ldd" on the rosetta graphics binary "minirosetta_graphics_3.73_x86_64-pc-linux-gnu" and make sure that it finds all the dynamic libraries. A "not found" would be a problem. sh-4.3$ ldd minirosetta_graphics_3.73_x86_64-pc-linux-gnu linux-vdso.so.1 (0x00007ffd343c0000) libGLU.so.1 => /lib64/libGLU.so.1 (0x00007fdcfa617000) libGL.so.1 => /lib64/libGL.so.1 (0x00007fdcfa37f000) YOUR JOB.... <core_client_version>7.2.42</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) <<<<<<<<<<<<<<<<<<<<<< THE ACTUAL ERROR </message> <stderr_txt> [2016- 4-27 8: 5:32:] :: BOINC:: Initializing ... ok. [2016- 4-27 8: 5:32:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. command: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.73_x86_64-pc-linux-gnu @flags_rb_04_26_64750_109092__t000__ab_robetta -in:file:boinc_wu_zip input_rb_04_26_64750_109092__t000__ab_robetta.zip -in:file:fasta t000_.fasta -frag3 t000_.200.3mers.index.gz -fragB t000_.200.3mers.index.gz -fragA t000_.200.9mers.index.gz -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1109089 Registering options.. Registered extra options. Initializing broker options ... Registered extra options. 
Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok ERROR: Unable to open weights/patch file. None of (./)beta_cart or (./)beta_cart.wts or minirosetta_database/scoring/weights/beta_cart or minirosetta_database/scoring/weights/beta_cart.wts exist ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 2884 [0x4485e82] [0x3457456] [0x34579d8] [0x346a47c] An rb* from one of my systems. Task ID 814246795 Name rb_04_23_64912_108999__t000__ab_robetta_IGNORE_THE_REST_347209_9056_0 <core_client_version>7.2.42</core_client_version> <![CDATA[ <stderr_txt> [2016- 4-24 13:38:42:] :: BOINC:: Initializing ... ok. [2016- 4-24 13:38:42:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. command: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.73_x86_64-pc-linux-gnu @rb_04_22_64328_108946_ab_stage0_t000___robetta_FLAGS -psipred_ss2 t000_.psipred_ss2 -in::file::fasta t000_.fasta -kill_hairpins t000_.nobuformat.psipred_ss2 -in:file:boinc_wu_zip rb_04_22_64328_108946_ab_stage0_t000___robetta.zip -frag3 rb_04_22_64328_108946_ab_stage0_t000___robetta_t000_.200.3mers.index.gz -fragA rb_04_22_64328_108946_ab_stage0_t000___robetta_t000_.200.5mers.index.gz -fragB rb_04_22_64328_108946_ab_stage0_t000___robetta_t000_.200.3mers.index.gz -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2103181 Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... 
ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_d0bf94b.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/rb_04_22_64328_108946_ab_stage0_t000___robetta.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _00001 # cpu_run_time_pref: 86400 Starting work on structure: _00002 Starting work on structure: _00003 Starting work on structure: _00004 |
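The `ldd` check suggested in this post can be automated when several binaries or hosts need checking. The following is only a sketch in Python: `missing_libraries` is a hypothetical helper name, and the `not found` line in the sample is made up for illustration (neither `ldd` listing posted in this thread contains one).

```python
def missing_libraries(ldd_output: str) -> list[str]:
    """Return the names of shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # A resolved entry looks like "libGL.so.1 => /lib64/libGL.so.1 (0x...)";
        # an unresolved one looks like "libGL.so.1 => not found".
        name, arrow, target = line.strip().partition("=>")
        if arrow and target.strip().startswith("not found"):
            missing.append(name.strip())
    return missing


# Sample input: the first two lines come from the post above,
# the third is a hypothetical failure case.
sample = """\
    linux-vdso.so.1 (0x00007ffd343c0000)
    libGLU.so.1 => /lib64/libGLU.so.1 (0x00007fdcfa617000)
    libGL.so.1 => not found
"""

print(missing_libraries(sample))
```

An empty result means every dynamic library was resolved, which points away from a local library problem.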
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Thanks for the suggestions. I don't think it's disk space: I'm using 1.5GB out of 100GB available to Boinc. In fact I don't think it's anything local to my machine at all since those tasks were given out again to a couple of wingmen: they failed in what seems to be the same fashion. (I tried the ldd command you suggested: output below) svincent@svincent-desktop:~/BOINC/projects/boinc.bakerlab.org_rosetta$ ldd minirosetta_graphics_3.73_x86_64-pc-linux-gnu linux-vdso.so.1 => (0x00007ffc460a5000) libGLU.so.1 => /usr/lib/x86_64-linux-gnu/libGLU.so.1 (0x00007f102c488000) libGL.so.1 => /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1 (0x00007f102c222000) libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f102bf1e000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f102bc18000) libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f102ba02000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f102b63d000) libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f102b308000) libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f102b0ea000) libglapi.so.0 => /usr/lib/x86_64-linux-gnu/libglapi.so.0 (0x00007f102aec3000) libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f102acb1000) libXdamage.so.1 => /usr/lib/x86_64-linux-gnu/libXdamage.so.1 (0x00007f102aaae000) libXfixes.so.3 => /usr/lib/x86_64-linux-gnu/libXfixes.so.3 (0x00007f102a8a8000) libX11-xcb.so.1 => /usr/lib/x86_64-linux-gnu/libX11-xcb.so.1 (0x00007f102a6a6000) libxcb-glx.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-glx.so.0 (0x00007f102a48f000) libxcb-dri2.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-dri2.so.0 (0x00007f102a28a000) libxcb-dri3.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-dri3.so.0 (0x00007f102a087000) libxcb-present.so.0 => /usr/lib/x86_64-linux-gnu/libxcb-present.so.0 (0x00007f1029e84000) libxcb-sync.so.1 => /usr/lib/x86_64-linux-gnu/libxcb-sync.so.1 (0x00007f1029c7e000) libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f1029a5f000) 
libxshmfence.so.1 => /usr/lib/x86_64-linux-gnu/libxshmfence.so.1 (0x00007f102985d000) libXxf86vm.so.1 => /usr/lib/x86_64-linux-gnu/libXxf86vm.so.1 (0x00007f1029657000) libdrm.so.2 => /usr/lib/x86_64-linux-gnu/libdrm.so.2 (0x00007f102944a000) libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1029246000) /lib64/ld-linux-x86-64.so.2 (0x00007f102c6f6000) libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f1029042000) libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f1028e3c000) |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 22,989,157 RAC: 15,740 |
Thanks for the suggestions. Looks fine. I think you are probably correct. |
sow-8 Joined: 23 Dec 14 Posts: 2 Credit: 591,945 RAC: 0 |
I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log. I just discovered the same thing in a machine I just assembled. I have a hunch while I am waiting for the Volunteer at BOINC to get back to me: My box has only one 'new' component: the SSD. I purchased an economy SSD at Amazon which works perfectly otherwise. This is just a hunch. Could the memory chips in the SSD somehow be marked as for use in a cellphone? Forgive me if that is stupid. All my other components have crunched Rosetta before (all the other parts I used in this build.) I also have NO gripes pending on this SSD. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log. What operating system did you see this under? Windows, Linux, Android, or something else? I've had many Rosetta Mini workunits complete on one of my computers, which came with an SSD as its main drive. It uses Windows 10. |
Sid Celery Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,753 RAC: 19,563 |
I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log. It's happening everywhere, Win7 and no SSD here. Manually updating often fixes it, but in the meantime Boinc puts in a 24hr delay before retrying, which is very annoying if you don't notice. |
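Since a manual update is the workaround described here, it could in principle be scripted for machines that run unattended. A minimal sketch in Python, assuming the stock `boinccmd` command-line tool is installed and on the PATH, and assuming the project URL is `http://boinc.bakerlab.org/rosetta/` (inferred from the project directory name seen earlier in the thread; use whatever URL your own client lists). The function names are hypothetical.

```python
import subprocess

# Assumption: project URL inferred from the directory name
# "boinc.bakerlab.org_rosetta"; verify it against your client.
PROJECT_URL = "http://boinc.bakerlab.org/rosetta/"


def update_command(project_url: str = PROJECT_URL) -> list[str]:
    """Build the boinccmd invocation that requests a project update."""
    return ["boinccmd", "--project", project_url, "update"]


def request_update(project_url: str = PROJECT_URL) -> None:
    """Ask the local BOINC client to contact the project scheduler now."""
    # check=True raises CalledProcessError if boinccmd exits non-zero.
    subprocess.run(update_command(project_url), check=True)
```

Calling `request_update()` from a scheduled job should have the same effect as pressing Update in BOINC Manager, though it does nothing about the underlying server-side cause.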
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log. Something you might want to try: check whether restarting Win7, with no updates, fixes the problem, with a possible exception for the 24hr delay. Also, you might want to check for a problem I've seen in my last three completed workunits. For two of them, the application apparently completed with no problem seen, but then the upload of the output files gave a compute error of the upload error type. For the third one, a wingmate had this upload error, but then my computer got the workunit, completed it properly, and uploaded the output properly, giving me one more validated workunit. The computer appears to be about 5 hours into a 24hr delay; the other 19 hours should be more than enough to finish the other three Rosetta@home tasks it currently has. |
sinspin Joined: 30 Jan 06 Posts: 29 Credit: 6,574,585 RAC: 0 |
I have the same problem here on my Win7 Ultimate, no updates / changes from my side. Everything works fine up to now. 30.04.2016 11:19:55 | rosetta@home | Sending scheduler request: To fetch work. 30.04.2016 11:19:55 | rosetta@home | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU 30.04.2016 11:19:58 | rosetta@home | Scheduler request completed: got 0 new tasks 30.04.2016 11:19:58 | rosetta@home | No work sent 30.04.2016 11:19:58 | rosetta@home | Rosetta Mini for Android is not available for your type of computer. Wtf, since when do I have an Android? What comes next? An alien? A zombie? |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 3,846 |
I have the same problem here on my Win7 Ultimate, no updates / changes from my side. Everything works fine up to now. Looks like someone with access to the source code for the Windows application should add a check at the end for whether the operating system setting still indicates Windows. |
sinspin Joined: 30 Jan 06 Posts: 29 Credit: 6,574,585 RAC: 0 |
There are no changes on my side. No new BOINC version, etc. I think they got something wrong during the last server downtime, maybe in the check for which kinds of tasks can be sent to a given client. The computers list in my profile shows the right information about my machines. |
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
It has already happened here a few times on WinXP (with an HDD). It must be a server-side thing, since it's the server that is considering sending Android work to a Windows PC. And no, it can't have anything to do with the chips on the SSD. I strongly doubt even Windows would know if they were originally produced for smartphones (which I doubt anyway, since the requirements are completely different); the OS just sees an SSD or HDD and knows pretty much nothing about what's inside the drive, since it does not need to. And even if we consider the highly unlikely possibility that you have an SSD with smartphone chips AND Windows knows it, this information is certainly not passed to the servers (the sched_request XML file is human-readable, so you can see there what information about your system is passed to the server). All the server gets is <platform_name>windows_intelx86</platform_name>, and based on that it should send the application. |
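To confirm what platform a client actually reports, the scheduler request file can be inspected directly. The sketch below uses Python's standard `xml.etree.ElementTree`; the `<scheduler_request>` wrapper element in the sample and the helper name `platform_names` are assumptions for illustration, while the `<platform_name>` element and its value are taken from the post above.

```python
import xml.etree.ElementTree as ET


def platform_names(sched_request_xml: str) -> list[str]:
    """Return every <platform_name> value found in a scheduler request document."""
    root = ET.fromstring(sched_request_xml)
    # iter() walks the whole tree, so the element is found at any depth.
    return [el.text.strip() for el in root.iter("platform_name") if el.text]


# Hypothetical, heavily trimmed request; a real file carries many more fields.
sample_request = """<scheduler_request>
    <platform_name>windows_intelx86</platform_name>
</scheduler_request>"""

print(platform_names(sample_request))
```

If the value printed for a real request is a Windows or Linux platform string, the Android message cannot be explained by anything the client reported about itself.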
Sid Celery Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,753 RAC: 19,563 |
I'm seeing the same issue that's been reported above: an intermittent failure to get new workunits accompanied by this message in the event log. If I'm on-site I'll usually notice at the time. Trouble is, I'm away for half of every week, so there's no reliable way to give it a kick or know if my whole system goes down |
LarryMajor Joined: 1 Apr 16 Posts: 22 Credit: 31,533,212 RAC: 0 |
Getting this on both of my Linux 3.16.0-4 machines: Sun 01 May 2016 02:21:40 AM EDT | rosetta@home | Sending scheduler request: To report completed tasks. Sun 01 May 2016 02:21:40 AM EDT | rosetta@home | Reporting 12 completed tasks Sun 01 May 2016 02:21:40 AM EDT | rosetta@home | Requesting new tasks for CPU Sun 01 May 2016 02:21:47 AM EDT | rosetta@home | Scheduler request completed: got 0 new tasks Sun 01 May 2016 02:21:47 AM EDT | rosetta@home | No work sent Sun 01 May 2016 02:21:47 AM EDT | rosetta@home | Rosetta Mini for Android is not available for your type of computer. |
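For anyone who is away from their machines and wants to know how often this happens, the event log can be scanned for the message. A rough sketch in Python, assuming log lines use the `timestamp | project | message` layout shown in the posts in this thread; `android_refusals` is a hypothetical helper name.

```python
MARKER = "Rosetta Mini for Android is not available"


def android_refusals(log_lines):
    """Return (timestamp, project) pairs for each Android refusal message."""
    hits = []
    for line in log_lines:
        # Split into at most three fields: timestamp, project, message.
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) == 3 and MARKER in parts[2]:
            hits.append((parts[0], parts[1]))
    return hits


# Lines copied from the log in the post above.
sample_log = [
    "Sun 01 May 2016 02:21:47 AM EDT | rosetta@home | Scheduler request completed: got 0 new tasks",
    "Sun 01 May 2016 02:21:47 AM EDT | rosetta@home | No work sent",
    "Sun 01 May 2016 02:21:47 AM EDT | rosetta@home | Rosetta Mini for Android is not available for your type of computer.",
]

print(android_refusals(sample_log))
```

Matching on the message text alone is deliberately simple; it does not try to pair the refusal with the preceding scheduler request.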
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
If I'm on-site I'll usually notice at the time. Trouble is, I'm away for half of every week, so there's no reliable way to give it a kick or know if my whole system goes down I have been gone since 20 April, and my first (of two) Haswell machines running Rosetta, both on Win7 64-bit, went down that very day, including the GPU running Einstein. The other machine went down on 29 April, which was running POEM on the GPUs. I like Rosetta for its science, but they have a lot of experimenters doing various types of work, and it is therefore not the ultimate in reliability, if I may say so. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1993 Credit: 9,520,400 RAC: 11,365 |
As I said in other threads, if the admins don't update the server... |
Sid Celery Joined: 11 Feb 08 Posts: 2115 Credit: 41,115,753 RAC: 19,563 |
If I'm on-site I'll usually notice at the time. Trouble is, I'm away for half of every week, so there's no reliable way to give it a kick or know if my whole system goes down If only I was concerned about Rosetta's stability. After claiming 6 months ago I was going to stop fiddling with my PC's overclock, I've been at it again, adding a further 165MHz. I /think/ I'm stable enough to keep crunching throughout my half-week absences, but never quite know for sure until I get back home. My previous efforts Now at 18.0 Multiplier x 243.8MHz FSB = 4379.6MHz compared to 4214.6 back then |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
If only I was concerned about Rosetta's stability. After claiming 6 months ago I was going to stop fiddling with my PC's overclock, I've been at it again, adding a further 165MHz. I /think/ I'm stable enough to keep crunching throughout my half-week absences, but never quite know for sure until I get back home. I don't overclock either the CPUs or GPUs, and I am speculating a bit that the problem is Rosetta. But my other Haswell machine, and three Ivy Bridge machines, have no problems. They are not running Rosetta either, only Einstein, WCG, CPDN and Folding. Normally all my machines can run for months without problems. I have noticed anomalies with Rosetta before, but never quite had a smoking gun, but I think this is pretty much it. |
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,677,186 RAC: 6,531 |
Happening again on my Windows 7 Professional computer. The biggest problem is that BOINC manager puts a 24-hour "Communication deferred" on my computer, and I run out of Rosetta tasks if I don't manually update. Please do something about this. Nothing has changed on my computer. 5/3/2016 6:15:47 AM | rosetta@home | Sending scheduler request: To fetch work. 5/3/2016 6:15:47 AM | rosetta@home | Requesting new tasks for CPU and Intel GPU 5/3/2016 6:15:49 AM | rosetta@home | Scheduler request completed: got 0 new tasks 5/3/2016 6:15:49 AM | rosetta@home | No work sent 5/3/2016 6:15:49 AM | rosetta@home | Rosetta Mini for Android is not available for your type of computer. |
AMDave Joined: 16 Dec 05 Posts: 35 Credit: 12,576,896 RAC: 0 |
Happening again on my Windows 7 Professional computer. The biggest problem is that BOINC manager puts a 24-hour "Communication deferred" on my computer, and I run out of Rosetta tasks if I don't manually update. Please do something about this. Nothing has changed on my computer. Is there an ETA for the resolution of this issue? |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
©2024 University of Washington
https://www.bakerlab.org