Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,126,831 RAC: 4,742 |
"I'm also having problems with Rosetta." Yes one--you only have 8 GB of memory in that machine and should only try running ONE Rosetta task at a time and NOTHING else at all on the CPU. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
Not true. As long as you have 1.3 GB of RAM per core/thread you won't run into lack-of-memory problems. 8 GB of RAM on a 2-core system (even with onboard graphics) is plenty, even with the largest RAM requirement Tasks. "I'm also having problems with Rosetta." That system is showing signs of a major hardware or driver issue, hardware being most likely (unfortunately the error messages aren't of much help). I'd check the temperature of the CPU, make sure the power supply rails are all OK, etc. Give Memtest86 a run if the PSU & CPU temperatures are OK. If they check out OK, it could be the result of corruption of some of the Rosetta files. Resetting the project will dump all existing work, delete all the executable & database files & re-download them from scratch. Even if that does fix it, the question is still "Why did they become corrupt?" Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
Not sure what's been going on, but apart from a one day glitch, my RAC has been falling steadily for almost 2 weeks. And it's still falling. On my systems the amount of Credit per task has gone from around 350 to barely above 300 (there's the odd one giving 400, but more odd ones giving only 170 or so), yet the number of Invalids and Errors has dropped away considerably. Grant Darwin NT |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,126,831 RAC: 4,742 |
"Not sure what's been going on, but apart from a one day glitch, my RAC has been falling steadily for almost 2 weeks. And it's still falling." Could be the new tasks are shorter and therefore give fewer credits. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
Nope. "Not sure what's been going on, but apart from a one day glitch, my RAC has been falling steadily for almost 2 weeks. And it's still falling." The Credit is based on the amount of work they do during the time they run. The work being done & Runtime are unchanged, and many of the Tasks over the last couple of weeks are of types that have been released before, and were paying more Credit. A good example is the pre_helical_bundles_ Tasks we've been getting for months now. They're the ones that were paying around 350; now it's barely 300. The odd one will give 400, but there are many more odd ones paying only 170 or so. Just another example of the weird Credit mechanism behaviour of BOINC. Grant Darwin NT |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,126,831 RAC: 4,742 |
"Nope." "Not sure what's been going on, but apart from a one day glitch, my RAC has been falling steadily for almost 2 weeks. And it's still falling." Is this the so-called "credit new"? |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
"Is this the so-called 'credit new'?" That would probably be part of it. Rosetta has its own Credit mechanism, but there are times when it uses the Credit New mechanism as well. Grant Darwin NT |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,126,831 RAC: 4,742 |
"Is this the so-called 'credit new'?" "That would probably be part of it. Rosetta has its own Credit mechanism, but there are times when it uses the Credit New mechanism as well." I seriously dislike so-called 'credit new' as it dissuades Projects from attracting new people as needed. I much prefer a credit system based on the type of task we are crunching, one that encourages people to crunch task A if they want lots of credit as opposed to task B, which has a longer time frame. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
"I seriously dislike so-called 'credit new' as it dissuades Projects from attracting new people as needed. I much prefer a credit system based on the type of task we are crunching, one that encourages people to crunch task A if they want lots of credit as opposed to task B, which has a longer time frame." ? If the Credit system worked as intended, then regardless of the Task, regardless of the project, and regardless of how long it takes to process a Task, a given machine would get the same amount of Credit per hour for processing work. That way the only reason for choosing a particular project over another would be the project itself. A more powerful system will get more Credit because it's doing more work. Whether a Task of a given type takes 5 min or 5 weeks, the amount of Credit awarded per hour should be the same. Of course, if there is an application that is more efficient then that will result in more credit due to more work being done within a given time frame, but the amount of work actually being done per Task is the same, so the Credit for that Task should remain unchanged. You just get the benefit of doing more work per hour, so you get a higher RAC thanks to the more optimised application. Unfortunately, by its very design Credit New varies the amount of Credit a Task will get for all sorts of reasons. And then you get projects such as Collatz that completely ignore the definition of a Cobblestone and award ridiculously, excessively inflated amounts for a Task. As it is, my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling. Grant Darwin NT |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,126,831 RAC: 4,742 |
"As it is, my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling." And THAT is a very good thing, next step climbing again!!! |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
I spoke too soon. "As it is, my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling." It's gone back to falling again. Grant Darwin NT |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,126,831 RAC: 4,742 |
"As it is, my RAC appears to have stopped falling (for now). It hasn't started to climb back to where it was, but at least it's no longer falling." UGH!!! |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
And to add to the continually falling RAC, a new batch of work has a roughly 50% failure rate. 11mers_FF2__cyclo_11mer_ are culprits. Runs for 30sec or less and then dies- <core_client_version>7.16.11</core_client_version> <![CDATA[ <message> (unknown error) - exit code 3221225477 (0xc0000005)</message> <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @11mers_FF2__cyclo_11mer_LVStub2818_000054_extract_B.flags -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2375971 Using database: database_357d5d93529_n_methylminirosetta_database Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00007FF771B2D578 Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 7.9.0 Dump Timestamp : 06/10/21 12:57:52 Install Directory : C:Program FilesBOINC Data Directory : C:ProgramDataBOINC Project Symstore : https://boinc.bakerlab.org/rosetta/symstore LoadLibraryA( C:ProgramDataBOINCdbghelp.dll ): GetLastError = 126 Loaded Library : dbghelp.dll LoadLibraryA( C:ProgramDataBOINCsymsrv.dll ): GetLastError = 126 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCsrcsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCversion.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:ProgramDataBOINCslots4;C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosetta;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://msdl.microsoft.com/download/symbols;srv*C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettasymbols*https://boinc.bakerlab.org/rosetta/symstore ModLoad: 000000006ded0000 00000000057ef000 
C:ProgramDataBOINCprojectsboinc.bakerlab.org_rosettarosetta_4.20_windows_x86_64.exe (-exported- Symbols Loaded) Linked PDB Filename : C:cygwin64homeboinc4.17RosettamainsourceideVisualStudiox64BoincReleaserosetta_4.20_windows_x86_64.pdb ModLoad: 00000000d9bb0000 00000000001f5000 C:WINDOWSSYSTEM32ntdll.dll (6.2.19041.928) (-exported- Symbols Loaded) Linked PDB Filename : ntdll.pdb File Version : 10.0.19041.804 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.804 ModLoad: 00000000d8010000 00000000000bd000 C:WINDOWSSystem32KERNEL32.DLL (6.2.19041.928) (-exported- Symbols Loaded) Linked PDB Filename : kernel32.pdb File Version : 10.0.19041.804 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.804 ModLoad: 00000000d76f0000 00000000002c8000 C:WINDOWSSystem32KERNELBASE.dll (6.2.19041.906) (-exported- Symbols Loaded) Linked PDB Filename : kernelbase.pdb File Version : 10.0.19041.804 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.804 ModLoad: 00000000d81c0000 000000000006b000 C:WINDOWSSystem32WS2_32.dll (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : ws2_32.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1 ModLoad: 00000000d83e0000 000000000012b000 C:WINDOWSSystem32RPCRT4.dll (6.2.19041.928) (-exported- Symbols Loaded) Linked PDB Filename : rpcrt4.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1 ModLoad: 00000000d94b0000 00000000001a0000 C:WINDOWSSystem32USER32.dll (6.2.19041.906) (-exported- Symbols Loaded) Linked PDB Filename : 
user32.pdb File Version : 10.0.19038.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19038.1 ModLoad: 00000000d7500000 0000000000022000 C:WINDOWSSystem32win32u.dll (6.2.19041.906) (-exported- Symbols Loaded) Linked PDB Filename : win32u.pdb File Version : 10.0.19041.906 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.906 ModLoad: 00000000d8640000 000000000002a000 C:WINDOWSSystem32GDI32.dll (6.2.19041.746) (-exported- Symbols Loaded) Linked PDB Filename : gdi32.pdb File Version : 10.0.19041.746 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.746 ModLoad: 00000000d72f0000 000000000010b000 C:WINDOWSSystem32gdi32full.dll (6.2.19041.928) (-exported- Symbols Loaded) Linked PDB Filename : gdi32full.pdb File Version : 10.0.19041.928 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.928 ModLoad: 00000000d7b70000 000000000009d000 C:WINDOWSSystem32msvcp_win.dll (6.2.19041.789) (-exported- Symbols Loaded) Linked PDB Filename : msvcp_win.pdb File Version : 10.0.19041.789 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.789 ModLoad: 00000000d7400000 0000000000100000 C:WINDOWSSystem32ucrtbase.dll (6.2.19041.789) (-exported- Symbols Loaded) Linked PDB Filename : ucrtbase.pdb File Version : 10.0.19041.789 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.789 ModLoad: 00000000d8bd0000 00000000000ac000 C:WINDOWSSystem32ADVAPI32.dll (6.2.19041.610) (-exported- Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 10.0.19041.1 
(WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1 ModLoad: 00000000d8c80000 000000000009e000 C:WINDOWSSystem32msvcrt.dll (7.0.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.19041.546 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 7.0.19041.546 ModLoad: 00000000d8290000 000000000009b000 C:WINDOWSSystem32sechost.dll (6.2.19041.906) (-exported- Symbols Loaded) Linked PDB Filename : sechost.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1 ModLoad: 00000000d8670000 0000000000030000 C:WINDOWSSystem32IMM32.DLL (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : imm32.pdb File Version : 10.0.19041.546 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.546 ModLoad: 00000000d5250000 0000000000012000 C:WINDOWSSYSTEM32kernel.appcore.dll (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : Kernel.Appcore.pdb File Version : 10.0.19041.546 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.546 ModLoad: 00000000d6030000 0000000000033000 C:WINDOWSSYSTEM32ntmarta.dll (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : ntmarta.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1 ModLoad: 00000000d1530000 00000000001e4000 C:WINDOWSSYSTEM32dbghelp.dll (6.2.19041.867) (-exported- Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 10.0.19041.867 (WinBuild.160101.0800) Company Name : Microsoft 
Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.867 ModLoad: 00000000d2270000 000000000000a000 C:WINDOWSSYSTEM32version.dll (6.2.19041.546) (-exported- Symbols Loaded) Linked PDB Filename : version.pdb File Version : 10.0.19041.546 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.546 ModLoad: 00000000d7590000 0000000000080000 C:WINDOWSSystem32bcryptPrimitives.dll (6.2.19041.662) (-exported- Symbols Loaded) Linked PDB Filename : bcryptprimitives.pdb File Version : 10.0.19041.662 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.662 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 5698, Write: 646, Other 13797 - I/O Transfers Counters - Read: 17336636, Write: 14175, Other 6664 - Paged Pool Usage - QuotaPagedPoolUsage: 317096, QuotaPeakPagedPoolUsage: 317376 QuotaNonPagedPoolUsage: 7200, QuotaPeakNonPagedPoolUsage: 7352 - Virtual Memory Usage - VirtualSize: 83505152, PeakVirtualSize: 895533056 - Pagefile Usage - PagefileUsage: 83505152, PeakPagefileUsage: 83505152 - Working Set Size - WorkingSetSize: 119312384, PeakWorkingSetSize: 119316480, PageFaultCount: 29541 *** Dump of thread ID 7328 (state: Initialized): *** - Information - Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00007FF771B2D578 - Registers - rax=000000000000003a rbx=0000000002c36ae0 rcx=0000000003642ac0 rdx=0000000003722bf8 rsi=000000000000000b rdi=0000000003642ac0 r8=000000000000003a r9=0000000000000421 r10=0000000071a76e80 r11=0000000040745240 r12=000000006ded0000 r13=000000004075f960 r14=0000000040745980 r15=000000000048b215 rip=0000000071b2d578 rsp=00000000407452b8 rbp=0000000000000000 cs=0033 ss=002b 
ds=002b es=002b fs=0053 gs=002b efl=00010206 - Callstack - ChildEBP RetAddr Args to Child 407452b0 6e3a831c 00000000 71a76d60 71a76e80 40745298 rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 407452e0 6e36935d 02c36ae0 40745380 6e35b215 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 40745310 714d7f10 723c0150 4075f960 00000000 00000001 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 40745340 6e3539e8 7346a32c 6ded0000 40745430 d9be0e7b rosetta_4.20_windows_x86_64!xmlValidateNotationDecl+0x0 407453b0 d9c5207f 00000000 40745930 40745ff0 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 407453e0 d9c01454 00000000 40745930 40745ff0 00000000 ntdll!__chkstk+0x0 40745af0 d9c50bae 02c20000 40745bc9 71b4a450 d9bdb3c7 ntdll!RtlRaiseException+0x0 40746280 6e613e2b fffffffe 06eb4dc8 ffffffff 6e6218c5 ntdll!KiUserExceptionDispatcher+0x0 407462d0 6e623690 71b4a3a0 06eb4b20 71b4a3a0 407463c9 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 40746400 6e739ee8 06923ea8 07290720 06eb4b20 07290720 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 40746fb0 6e6d4b6c 07445680 d9bdb3c7 08b80000 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 407471b0 6e6d488e 40747298 00000000 40747480 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40747310 6e633da1 40747488 00000000 02c34920 40747550 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 407476d0 6e639f08 40747a20 40747a20 40747a20 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40747d20 6e6384db 03310330 40747d80 031d9b90 031d9b90 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40747e80 6e5a1fb7 00000000 40747f90 031d9b90 40748190 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40747ff0 6e5a57a6 00000005 6e345190 031f9b40 031f9b40 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 40748060 6e5a56cc 40748368 407481d9 40748368 031d9b90 
rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 40748110 6e66b6f5 40748368 40748641 00000000 6e3675e8 rosetta_4.20_windows_x86_64!cppdb::session::is_open+0x0 40748230 6e66a592 00000005 40748368 40748540 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40748300 6e66ad06 00000000 00000000 40748c20 08b80000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 407484a0 6eac71a3 40748540 40748c20 ffffff01 6e353e73 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40748790 6eac9d09 00000000 00000001 407488a0 40748c20 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40748b20 6eac2f8a 40748b60 40748c20 06730c40 0346fb90 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40748b80 6ecdcc70 40748c20 40749348 031f9b40 00000000 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40749310 6ecdc6e4 0716e7d0 07289100 73345cc0 6e3475a6 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40749370 6ece603e 40749460 0716e500 40749480 40749bd0 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40749af0 6ece56d4 2305e878 2305eb68 732b7f70 6ed06cb4 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40749b80 6ece578e 00000005 4074a128 0346fb90 00000001 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 40749d20 6e35081d 034ff820 034ff820 0346fb90 02c37e01 rosetta_4.20_windows_x86_64!cppdb::backend::statement::cache+0x0 4075f950 6e35b215 00000000 00000000 7327ccf8 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 4075f990 d8027034 00000000 00000000 00000000 00000000 rosetta_4.20_windows_x86_64!xmlParserInputRead+0x0 4075f9c0 d9c02651 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 4075fa40 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 *** Dump of thread ID 32761 (state: Initialized): *** - Information - Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 6.000000, 
User Time: 0.000000, Wait Time: 2428590080.000000 - Registers - rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000 r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000 cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000 - Callstack - ChildEBP RetAddr Args to Child (-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 *** Dump of thread ID 30891432 (state: Unknown): *** - Information - Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 17179869184.000000, User Time: 21474836480.000000, Wait Time: 0.000000 - Registers - rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000 r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000 cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000 - Callstack - ChildEBP RetAddr Args to Child (-nosymbols- PC == 0) 00000000 00000000 00000000 00000000 00000000 00000000 !+0x0 *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
Looks like we've got a server issue. The Server Status page shows all green and running, but there is no work available to go out. Plenty of queued-up jobs, but no Tasks ready to send; all requests for work result in none. As a result, Work in progress is taking a dive. Around 21:30 Monday server time was when things fell over. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,137,636 RAC: 16,334 |
"Looks like we've got a server issue." Correct... :( The work unit generator was being worked on yesterday to add support for VM applications and it was crashing. The good news is it was fixed a couple of hours ago - tasks coming down here. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
It lives! Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
Not sure what's going on, but the amount of work In progress has been falling away slowly but surely for 2 days now. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2117 Credit: 41,137,636 RAC: 16,334 |
"Not sure what's going on, but the amount of work In progress has been falling away slowly but surely for 2 days now." Not sure either. The mix of tasks doesn't imply anything different recently. I put in that request to reduce the disk demand for pre_helical_bundles tasks, but it hasn't happened. That might've improved things, but leaving it unchanged wouldn't cause the change in In Progress tasks you're seeing. No idea. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_1mq3ho7q_1391020_4 https://boinc.bakerlab.org/rosetta/result.php?resultid=1400578710 I had to abort this task after it had been running for nearly 2 days of elapsed time. It stalled out around 80%. No movement in the graphics, no increase/decrease in the CPU usage rate, no counting up on completion, nothing. I shut my system down at night after running up a huge electric bill last year running 24/7, but my shutdown sequence is to suspend the tasks, shut down the client and then exit. The "Leave non-GPU tasks in memory while suspended" option is checked. I have no idea what could cause it to stall. I had this problem with another pre_helical task in the past. Any ideas how to prevent this or what is causing it? How long does this kind of task take to complete? I run 16-17 hours a day, and the switch-between-tasks interval is set at 6 hours. Oh, and CPU run time was 7 hours in 1.5 days. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1673 Credit: 17,589,222 RAC: 22,497 |
"Any ideas how to prevent this or what is causing it?" I suspect that things aren't actually stalling as such; it's just that your system is so ridiculously overcommitted trying to do work that some Tasks take forever to complete, so it looks like they have stalled. Even those that do complete take way more time than they should. E.g. this Task of yours: rb_06_04_79633_77510_ab_t000__h002_robetta_IGNORE_THE_REST_04_10_1395231_248_0 (Run time 22 hours 21 min 14 sec, CPU time 7 hours 59 min 47 sec). And one of your Moo wrapper CPU Tasks is even worse: dnetc_r72_1624557342_9_9_0 (Run time 7 hours 2 min 3 sec, CPU time 23 min 18 sec). 22 hours to do 8 hours of work is bad enough, but 7 hours to do 23 minutes of work is beyond ridiculous. Compare that to one of my Rosetta Tasks: pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_3rl4pd6v_1391037_4_1 (Run time 7 hours 59 min 37 sec, CPU time 7 hours 57 min 5 sec). Check Task Manager or use Process Explorer to see what else is running on the system apart from BOINC projects. If there's nothing else and it's just BOINC projects, then you need to reserve a CPU core to support each GPU Task that is running. Running multiple GPU tasks at the same time as trying to process CPU work on CPU cores that are trying to support the GPU work is the most likely cause of all your issues. Reserve a CPU core for each GPU Task being run for each project doing GPU work, make sure that there are no other processes sucking up CPU time, and your system output for all BOINC projects will improve hugely, with CPU Tasks no longer taking 7 hours to do 23 minutes of work! Ah! I thought this sounded familiar & now I remember we have had this conversation before, and back then you mentioned you run Folding@home as well. I told you then what needed to be done; it appears you haven't done it, hence you are still having the same issues. Grant Darwin NT |
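
For anyone wanting to check their own host the same way, the Run time vs CPU time comparison in the post above can be reduced to a single ratio. This is only a rough sketch: the function name `cpu_efficiency`, the task labels, and the 0.90 cut-off for "overcommitted" are illustrative assumptions, not anything defined by BOINC or Rosetta@home. The times are the ones quoted in the post.

```python
from datetime import timedelta


def cpu_efficiency(run_time: timedelta, cpu_time: timedelta) -> float:
    """Return CPU time divided by wall-clock run time (1.0 = core fully used)."""
    run_seconds = run_time.total_seconds()
    if run_seconds <= 0:
        raise ValueError("run_time must be positive")
    return cpu_time.total_seconds() / run_seconds


# (run time, CPU time) pairs as quoted in the post; labels are illustrative.
TASKS = {
    "Rosetta task (overcommitted host)": (
        timedelta(hours=22, minutes=21, seconds=14),
        timedelta(hours=7, minutes=59, seconds=47),
    ),
    "Moo wrapper task": (
        timedelta(hours=7, minutes=2, seconds=3),
        timedelta(minutes=23, seconds=18),
    ),
    "Rosetta task (healthy host)": (
        timedelta(hours=7, minutes=59, seconds=37),
        timedelta(hours=7, minutes=57, seconds=5),
    ),
}

# Assumed cut-off: below this, the task is probably being starved of CPU.
OVERCOMMIT_THRESHOLD = 0.90

EFFICIENCIES = {name: cpu_efficiency(run, cpu) for name, (run, cpu) in TASKS.items()}

if __name__ == "__main__":
    for name, eff in EFFICIENCIES.items():
        flag = "OVERCOMMITTED?" if eff < OVERCOMMIT_THRESHOLD else "ok"
        print(f"{name}: {eff:.0%} of run time spent on CPU ({flag})")
```

A ratio close to 1.0 means the task had a core to itself for nearly the whole run. A ratio far below that suggests the task spent most of its time waiting, which is consistent with the overcommitment described above, though suspending tasks overnight while leaving them in memory may also inflate the reported run time on some setups.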
©2024 University of Washington
https://www.bakerlab.org