Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2118 Credit: 41,162,648 RAC: 14,990 |
"This is a task that is still running but barely for some reason. (aagb-mPPS-mPHE-mACHC13T-ACHC12C)" Greg, these are extracts from one of your reports. There's something you've said here I can confirm. When you (and I) pause or reboot, or there's some other reason to stop processing and then restart, this is when I notice zombie tasks appearing straight away after. Going back to what you were saying earlier, it may well be that aagb tasks have a problem with restarting most of all, but check them all anyway. And again, there's a line in there talking about "Setting CPU throttle for VM. (98%)". So what I'm going to suggest is to be very wary after pausing or rebooting. Re-check tasks to see if they've restarted OK. If they haven't restarted using CPU time, abort them. They'll have done all the work they're going to do. Maybe 2 out of 10 will have a problem in my experience. Once you do this, see if any other problems develop. In my very limited experience, they won't, as you'll have addressed the issues as soon as they appear. Give it a few days and report how things are going. |
zxcvbob Joined: 4 Jan 06 Posts: 8 Credit: 830,878 RAC: 0 |
Yes it was an aagb* task. I aborted it and uninstalled vbox. I'm running 4.2 tasks just fine, and am going to attach a couple more computers to R@H today (without vbox.) I also have an old Xeon-based server with Linux installed (I don't run it much because the cooling fan is so loud); I wonder if it will run those tasks natively? |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 10,117 |
"Yes it was an aagb* task. I aborted it and uninstalled vbox. I'm running 4.2 tasks just fine, and am going to attach a couple more computers to R@H today (without vbox.) I also have an old Xeon-based server with Linux installed (I don't run it much because the cooling fan is so loud); I wonder if it will run those tasks natively?" Welcome to the club, I have two dual Xeon X5650 computers. Change the fan; you can get lovely quiet ones on eBay. I've done the reverse: I have mine cooled with some very loud 120V 6-inch fans I bought 30 years ago as a teenager from a bankruptcy recycling company. Not sure what they were from, but they have very sharp solid steel blades! Hurts if your finger gets in there, or on the 120V terminals! |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Sid, again aagb. Started 10:56, it is now 18:55. Right off the bat the times are wrong: 2022-03-21 12:38:25 (7640): Status Report: Elapsed Time: '6000.608391' 2022-03-21 12:38:25 (7640): Status Report: CPU Time: '16.687500' 6000 for 16 seconds? WTF? And just under 2 hours in. And then this: 2022-03-21 14:20:03 (7640): Status Report: Elapsed Time: '12001.508751' 2022-03-21 14:20:03 (7640): Status Report: CPU Time: '28.937500' No pause still. 2022-03-21 17:57:22 (7640): Status Report: Elapsed Time: '24002.140787' 2022-03-21 17:57:22 (7640): Status Report: CPU Time: '49.781250' Never paused once. So it's deeper than my machine and BOINC. Killed it at 62.69% after over 7.5 hrs of working. What a joke! .19% of a core. Waste of time! |
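The gap between elapsed time and CPU time in those Status Report lines can be put into numbers. What follows is only a minimal sketch: it assumes the log lines look exactly like the ones quoted in the post above, and the function name `cpu_efficiency` is made up for illustration, not part of BOINC or vboxwrapper.

```python
import re

# Matches lines in the format quoted in the post, e.g.
# 2022-03-21 12:38:25 (7640): Status Report: CPU Time: '16.687500'
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def cpu_efficiency(log_text):
    """Return a list of (elapsed_s, cpu_s, percent_of_one_core) tuples.

    Hypothetical helper: pairs each 'Elapsed Time' value with the
    'CPU Time' value that follows it in the log.
    """
    samples = []
    elapsed = None
    for kind, value in STATUS_RE.findall(log_text):
        if kind == "Elapsed Time":
            elapsed = float(value)
        elif elapsed is not None and elapsed > 0:
            cpu = float(value)
            samples.append((elapsed, cpu, 100.0 * cpu / elapsed))
            elapsed = None
    return samples


# The three report pairs quoted in the post.
SAMPLE_LOG = """\
2022-03-21 12:38:25 (7640): Status Report: Elapsed Time: '6000.608391'
2022-03-21 12:38:25 (7640): Status Report: CPU Time: '16.687500'
2022-03-21 14:20:03 (7640): Status Report: Elapsed Time: '12001.508751'
2022-03-21 14:20:03 (7640): Status Report: CPU Time: '28.937500'
2022-03-21 17:57:22 (7640): Status Report: Elapsed Time: '24002.140787'
2022-03-21 17:57:22 (7640): Status Report: CPU Time: '49.781250'
"""

if __name__ == "__main__":
    for elapsed, cpu, percent in cpu_efficiency(SAMPLE_LOG):
        print(f"elapsed {elapsed:10.1f} s  cpu {cpu:7.2f} s  {percent:.3f}% of one core")
```

On the three quoted reports this works out to between roughly 0.2% and 0.3% of one core, which is the "barely running" pattern the thread keeps describing.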
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 10,117 |
Some kind of a response from them would be nice, perhaps one of: 1) We know there's a problem but our programmers are too inept to fix it. 2) We didn't know there was a problem because our heads are buried in the sand. 3) We know there's a problem and we're working on fixing it by [insert date] |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,123 |
An idea to check for the python tasks failing all at once: Something may be writing into a location in the *.vdi files that is supposed to be read-only, and is shared among all the python tasks running at once. To check for this: 1. While no python tasks are running, go to the shared directory for read-only Rosetta@Home files. On my computer, it is: C:/ProgramData/BOINC/projects/boinc.bakerlab.org_rosetta If your computer does not run under Windows 10, expect a different directory name. For each file with a name ending with .vdi, make a copy elsewhere. 2. Allow many python tasks to start. Watch for all of them to fail at once. If this happens, make a second copy of each of the files with names ending with .vdi. 3. Start a program that will check two binary files and tell you if they are identical or not. Tell it to compare the old and new copies of each of the .vdi files. Let us know if the copies were identical or not. Unless you have special information on the file structures, don't bother with what the differences are. Another idea: vbox64 (which is used by python tasks) may have problems restarting from checkpoints, but not always. I have not thought of a way to check for that. |
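The "check two binary files" step can be done with a few lines of Python instead of a separate compare tool. This is a rough sketch under stated assumptions: the directory arguments are wherever you put your "before" and "after" copies, and the function names are invented for illustration. It reads the files in chunks because the .vdi images are large.

```python
import os


def files_identical(path_a, path_b, chunk_size=1024 * 1024):
    """Return True if the two files contain exactly the same bytes."""
    # Files of different sizes can never be identical, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended without a mismatch
                return True


def compare_vdi_copies(before_dir, after_dir):
    """Compare every .vdi file in before_dir with its namesake in after_dir.

    Returns {file_name: True/False/None}; None means the file is
    missing from after_dir. Both directory names are up to the caller.
    """
    results = {}
    for name in sorted(os.listdir(before_dir)):
        if not name.lower().endswith(".vdi"):
            continue
        after_path = os.path.join(after_dir, name)
        if not os.path.exists(after_path):
            results[name] = None
        else:
            results[name] = files_identical(os.path.join(before_dir, name), after_path)
    return results
```

Calling `compare_vdi_copies` on the two copy folders gives one True/False per image, which is all that was asked for: identical or not, without digging into what the differences are.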
kotenok2000 Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 109 |
VirtualBox tasks do not write to the vdi files in boinc.bakerlab.org_rosetta. They fully copy the vdi files to slot directories. All 7 gigabytes. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"Some kind of a response from them would be nice, perhaps one of:" HAHAHAHA...yeah right. Take #1 and add "we don't care". |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Idiot server kicked me off after 2 aborts and 1 error, all from aagb. If they would make things correct the first time I wouldn't have this problem. |
Sid Celery Joined: 11 Feb 08 Posts: 2118 Credit: 41,162,648 RAC: 14,990 |
"Never paused once" That's fine. If they pause, abort them. They never unpause in my experience. But I have plenty of running and successfully completed aagb tasks too. They may certainly be most susceptible, but it's not all of them. Just check them 10 minutes after they've started and you'll know which way it's heading, then take the appropriate action. |
Bruce Morse Joined: 8 Oct 05 Posts: 5 Credit: 816,727 RAC: 0 |
I have two tasks of Rosetta python projects 1.03 (vbox64) running: aagb-SAR_pp-… and aaam-PRO_pp-… They are currently showing: aagb-SAR_pp-…: elapsed time 5d 14:45:56, time remaining 00:00:04; aaam-PRO_pp-…: elapsed time 5d 14:39:50, time remaining 00:00:04. The elapsed timer is running. The time remaining has been getting progressively longer and longer between changes - currently measured in hours. Any ideas? |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 10,117 |
"I have a two applications of" You need to see how much CPU time they're actually using. These tasks tend to sit doing nothing. If you have BoincTasks, this shows real CPU usage. Or you can use Windows Task Manager. If they aren't doing anything, abort them. |
Bruce Morse Joined: 8 Oct 05 Posts: 5 Credit: 816,727 RAC: 0 |
Additional notes: Menu options in BOINC Manager are no longer functioning, including the snooze, about and exit options from the taskbar; Outlook will no longer start. Just noticed that my time remaining NOW reads three (3) seconds. The version of Vbox is the one distributed with BOINC and has not been updated. Is/Are there some settings in Vbox that *I* should have modified? Vbox shows both tasks running, and a pop-up indicates a new version is available: 6.1.32. Current version: 6.0.14r133895 (Qt5.6.2). There is sporadic activity. |
Bruce Morse Joined: 8 Oct 05 Posts: 5 Credit: 816,727 RAC: 0 |
Checking Windows 10 Task Manager: baseline - there is very little CPU usage, but there are some bursts of usage; memory - minimal changes; disk - some; Vbox Ethernet - zero; and LAN network - some. |
kotenok2000 Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 109 |
Can you open the VirtualBox GUI, press Show and look at what the VirtualBox VM screens are showing? Also, open the task's C:/ProgramData/BOINC/slots/[slotnumber]/shared directory and look at the file modification times. |
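Looking at file modification times in the slot's shared directory can also be scripted. The sketch below is only an illustration under assumptions: the directory you pass in is whatever your slot path turns out to be (the slot number is not known here), and both function names are hypothetical.

```python
import os
from datetime import datetime


def modification_times(directory):
    """Return [(file_name, modified_datetime)] for the files in directory, newest first."""
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            entries.append((name, datetime.fromtimestamp(os.path.getmtime(path))))
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries


def hours_since_last_write(directory, now=None):
    """Hours since the newest file in directory was modified, or None if it has no files."""
    entries = modification_times(directory)
    if not entries:
        return None
    now = now or datetime.now()
    return (now - entries[0][1]).total_seconds() / 3600.0
```

If `hours_since_last_write` on a running task's shared directory comes back as many hours or days, the VM has most likely not written anything for that long, which points to a stalled task rather than a slow one.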
Bruce Morse Joined: 8 Oct 05 Posts: 5 Credit: 816,727 RAC: 0 |
"can you open virtualbox gui, press show and look at what virtualbox vm screens are showing?" Looks like it never started? Last line: Intel MKL FATAL ERROR: Error on loading function mkl_lapack_ps_mc3_dsytrf_l_small. "Also open task" Most recent are 03/16/2022 05:46 PM (Um.. today: 03/22/2022). Kinda saddens me - it appears I have wasted many days. ETA: left it running for now in case anyone wants additional information. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 10,117 |
"can you open virtualbox gui, press show and look at what virtualbox vm screens are showing?" "Looks like it never started?" I get that every single time on 5 of my 7 computers. Nobody knows why. Which of your computers are having problems? From my end it looks like older computers don't work. Mine are: Ryzen 9 3900XT - works all the time on Rosetta Python VB; i5 8600K - works all the time on Rosetta Python VB; Core 2 Quad Q8400 - gets the same error as you every time; Pentium N3700 - gets the same error as you every time; Dual Xeon X5650 - gets the same error as you every time; Dual Xeon X5650 - gets the same error as you every time; i3 M350 - gets the same error as you every time. |
Bruce Morse Joined: 8 Oct 05 Posts: 5 Credit: 816,727 RAC: 0 |
I currently have only two computers actively running Vbox: (1) Toshiba laptop: Intel Pentium CPU 2020 (two core hyper thread) @ 2.4 GHz; 16.0 GB RAM; Win10/home. Doesn't want to play nice. (2) 6-core 3.2 GHz Intel Core i7-8700; 16.0 GB RAM; Win10/home. IS playing nice. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 10,117 |
"I currently have only two computers actively running Vbox:" You seem to be getting the same as me. Newer machines work, older machines don't. I'm going to guess the Python app is using newer instruction sets only available on newer processors, and the incompetent fools at Rosetta are handing them out to everybody instead of only those that can handle it. They must be relying on you failing a lot of them so it automatically switches off your computer from Python, but the trouble is they don't just quickly fail, they sit doing nothing for days. And you have to fail 100 of them (not just abort them) before it bans you from Python. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 6,628 |
"Newer machines work, older machines don't. I'm going to guess the Python app is using newer instruction sets only available on newer processors, and the incompetent fools at Rosetta are handing them out to everybody instead of only those that can handle it." If I'm not wrong, VirtualBox automatically exposes the host's instruction sets to guest machines, so your idea is not so foolish. The Python app runs trRosetta simulations that are probably compiled against TensorFlow. Someone else has this problem with an old CPU. |
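The instruction-set guess can at least be tested per machine. The sketch below is a hedged illustration only: the thread does not say which features the Python app actually needs, so the `avx` and `avx2` names are placeholders, and both function names are made up. It works on the text of a Linux `/proc/cpuinfo`, whose `flags` line lists the CPU features by name.

```python
def flags_from_cpuinfo(cpuinfo_text):
    """Return the value of the first 'flags' line in /proc/cpuinfo-style text ('' if none)."""
    for line in cpuinfo_text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() == "flags":
            return value
    return ""


def missing_features(flags_text, required=("avx", "avx2")):
    """Return the required feature names that do not appear in flags_text.

    The default 'required' tuple is an assumption for illustration,
    not a confirmed requirement of the Rosetta Python tasks.
    """
    present = set(flags_text.lower().split())
    return [feature for feature in required if feature not in present]
```

On a Linux host, `missing_features(flags_from_cpuinfo(open("/proc/cpuinfo").read()))` returns an empty list when every listed feature is present. Running the same check inside the VirtualBox guest would show whether the features are passed through from the host.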
©2024 University of Washington
https://www.bakerlab.org