|
|
 | | From: | Sander Steffann | | Subject: | Less available diskspace after crashed CLUSTER | | Date: | Sat, 15 Jan 2005 16:13:05 +0100 |
|
|
 | Hi,
I was running a CLUSTER on a big database (approx. 8G) and I didn't anticipate the diskspace usage, so I ran out of diskspace. This is on PostgreSQL 7.4.6. The backend crashed with the following log messages:
2005-01-15 15:17:46 [30605] PANIC: PANIC: could not write to file "/var/lib/pgsql/data/pg_xlog/xlogtemp.30605": Geen ruimte over op apparaat STATEMENT: CLUSTER; 2005-01-15 15:17:46 [11245] LOG: server process (PID 30605) was terminated by signal 6 2005-01-15 15:17:46 [11245] LOG: terminating any other active server processes 2005-01-15 15:17:46 [3378] WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. <... a lot more of the same messages from other backends ...> 2005-01-15 15:17:46 [11245] LOG: all server processes terminated; reinitializing 2005-01-15 15:17:46 [3389] LOG: database system was interrupted at 2005-01-15 15:17:44 CET 2005-01-15 15:17:46 [3389] LOG: checkpoint record is at 12/9E2976B0 2005-01-15 15:17:46 [3389] LOG: redo record is at 12/9E00A790; undo record is at 0/0; shutdown FALSE 2005-01-15 15:17:46 [3389] LOG: next transaction ID: 13482165; next OID: 1694415 2005-01-15 15:17:46 [3389] LOG: database system was not properly shut down; automatic recovery in progress 2005-01-15 15:17:46 [3389] LOG: redo starts at 12/9E00A790 2005-01-15 15:18:03 [3389] LOG: could not open file "/var/lib/pgsql/data/pg_xlog/00000012000000BF" (log file 18, segment 191): Onbekend bestand of map 2005-01-15 15:18:03 [3389] LOG: redo done at 12/BEFFD810 2005-01-15 15:18:13 [3389] LOG: recycled transaction log file "000000120000009B" 2005-01-15 15:18:13 [3389] LOG: recycled transaction log file "000000120000009C" 2005-01-15 15:18:13 [3389] LOG: recycled transaction log file "000000120000009D" 2005-01-15 15:18:13 [3389] LOG: database system is ready
Translations: Geen ruimte over op apparaat -> No space left on device Onbekend bestand of map -> Unknown file or directory
The database is running nice again, but I have the feeling I lost some diskspace... I suspected a temp file, so I stopped the postmaster, looked in /var/lib/pgsql/data/base/17142/pgsql_tmp, found one file and deleted it. But now I still have 1G less available diskspace than before the CLUSTER.
Where should I look next? I would like the diskspace back :-)
Thanks, Sander.
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Tom Lane | | Subject: | Re: Less available diskspace after crashed CLUSTER | | Date: | Sat, 15 Jan 2005 13:01:17 -0500 |
|
|
 | "Sander Steffann" writes: > 2005-01-15 15:17:46 [30605] PANIC: PANIC: could not write to file > "/var/lib/pgsql/data/pg_xlog/xlogtemp.30605": Geen ruimte over op apparaat > STATEMENT: CLUSTER;
Hmm. I wonder why we have XLogFileInit forcing a PANIC for this. At one time it was called only from critical sections and so the error would become a panic anyway, but that's not true anymore...
> The database is running nice again, but I have the feeling I lost some > diskspace... I suspected a temp file, so I stopped the postmaster, looked in > /var/lib/pgsql/data/base/17142/pgsql_tmp, found one file and deleted it. But > now I still have 1G less available diskspace than before the CLUSTER.
> Where should I look next? I would like the diskspace back :-)
I'm guessing that the new table/index files created during the CLUSTER didn't get released after the PANIC. Look for files that are not referenced by any relfilenode value in pg_class. Don't forget that there may be multiple sections (filenode.1, etc). See http://developer.postgresql.org/docs/postgres/storage.html (which is for 8.0, but everything except the material on tablespaces applies to 7.4).
Also, did you get rid of the xlog temp file mentioned in the message?
regards, tom lane
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
 | | From: | Sander Steffann | | Subject: | Re: Less available diskspace after crashed CLUSTER | | Date: | Sat, 15 Jan 2005 23:21:20 +0100 |
|
|
 | Hi Tom,
>> 2005-01-15 15:17:46 [30605] PANIC: PANIC: could not write to file >> "/var/lib/pgsql/data/pg_xlog/xlogtemp.30605": Geen ruimte over op >> apparaat >> STATEMENT: CLUSTER; > > Hmm. I wonder why we have XLogFileInit forcing a PANIC for this. At > one time it was called only from critical sections and so the error > would become a panic anyway, but that's not true anymore...
I'll leave that part to you :-) I don't know much about PostgreSQL internals...
>> The database is running nice again, but I have the feeling I lost some >> diskspace... I suspected a temp file, so I stopped the postmaster, looked >> in >> /var/lib/pgsql/data/base/17142/pgsql_tmp, found one file and deleted it. >> But >> now I still have 1G less available diskspace than before the CLUSTER. > >> Where should I look next? I would like the diskspace back :-) > > I'm guessing that the new table/index files created during the CLUSTER > didn't get released after the PANIC. Look for files that are not > referenced by any relfilenode value in pg_class. Don't forget that > there may be multiple sections (filenode.1, etc). See > http://developer.postgresql.org/docs/postgres/storage.html > (which is for 8.0, but everything except the material on tablespaces > applies to 7.4).
Using the oid2name tool I get this: # for x in *; do echo -n "$x: "; oid2name -q -d abacus -o $x; done 1247: 1247 = pg_type 1249: 1249 = pg_attribute 1255: 1255 = pg_proc 1259: 1259 = pg_class 16384: 16384 = pg_attrdef 16386: 16386 = pg_constraint 16388: 16388 = pg_inherits 16390: 16390 = pg_index 16392: 16392 = pg_operator 16394: 16394 = pg_opclass 16396: 16396 = pg_am 16398: 16398 = pg_amop 16400: 16400 = pg_amproc 16402: 16402 = pg_language 16404: 16404 = pg_largeobject 16406: 16406 = pg_aggregate 16408: 16408 = pg_statistic 16410: 16410 = pg_rewrite 16412: 16412 = pg_trigger 16414: 16414 = pg_listener 16416: 16416 = pg_description 16418: 16418 = pg_cast 16595: 16595 = pg_namespace 16597: 16597 = pg_conversion 16599: 16599 = pg_depend 16601: 16601 = pg_aggregate_fnoid_index 16602: 16602 = pg_am_name_index 16603: 16603 = pg_am_oid_index 16604: 16604 = pg_amop_opr_opc_index 16605: 16605 = pg_amop_opc_strategy_index 16606: 16606 = pg_amproc_opc_procnum_index 16607: 16607 = pg_attrdef_adrelid_adnum_index 16608: 16608 = pg_attrdef_oid_index 16609: 16609 = pg_attribute_relid_attnam_index 16610: 16610 = pg_attribute_relid_attnum_index 16611: 16611 = pg_cast_oid_index 16612: 16612 = pg_cast_source_target_index 16613: 16613 = pg_class_oid_index 16614: 16614 = pg_class_relname_nsp_index 16615: 16615 = pg_constraint_conname_nsp_index 16616: 16616 = pg_constraint_conrelid_index 16617: 16617 = pg_constraint_contypid_index 16618: 16618 = pg_constraint_oid_index 16619: 16619 = pg_conversion_default_index 16620: 16620 = pg_conversion_name_nsp_index 16621: 16621 = pg_conversion_oid_index 16624: 16624 = pg_depend_depender_index 16625: 16625 = pg_depend_reference_index 16626: 16626 = pg_description_o_c_o_index 16629: 16629 = pg_index_indrelid_index 16630: 16630 = pg_index_indexrelid_index 16631: 16631 = pg_inherits_relid_seqno_index 16632: 16632 = pg_language_name_index 16633: 16633 = pg_language_oid_index 16634: 16634 = pg_largeobject_loid_pn_index 16635: 16635 = pg_namespace_nspname_index 16636: 16636 = pg_namespace_oid_index 16637: 16637 = pg_opclass_am_name_nsp_index 16638: 16638 = pg_opclass_oid_index 16639: 16639 = pg_operator_oid_index 16640: 16640 = pg_operator_oprname_l_r_n_index 16641: 16641 = pg_proc_oid_index 16642: 16642 = pg_proc_proname_args_nsp_index 16643: 16643 = pg_rewrite_oid_index 16644: 16644 = pg_rewrite_rel_rulename_index 16647: 16647 = pg_statistic_relid_att_index 16648: 16648 = pg_trigger_tgconstrname_index 16649: 16649 = pg_trigger_tgconstrrelid_index 16650: 16650 = pg_trigger_tgrelid_tgname_index 16651: 16651 = pg_trigger_oid_index 16652: 16652 = pg_type_oid_index 16653: 16653 = pg_type_typname_nsp_index 16656: 16656 = pg_toast_16384 16658: 16658 = pg_toast_16384_index 16659: 16659 = pg_toast_16386 16661: 16661 = pg_toast_16386_index 16665: 16665 = pg_toast_16416 16667: 16667 = pg_toast_16416_index 16671: 16671 = pg_toast_1255 16673: 16673 = pg_toast_1255_index 16674: 16674 = pg_toast_16410 16676: 16676 = pg_toast_16410_index 16680: 16680 = pg_toast_16408 16682: 16682 = pg_toast_16408_index 1688060: No tables with that oid found 1688062: 1688062 = pg_toast_1688060 1688064: 1688064 = pg_toast_1688060_index 1688065: No tables with that oid found 1688066: No tables with that oid found 1688068: 1688068 = pg_toast_1688066 1688070: 1688070 = pg_toast_1688066_index 1688071: No tables with that oid found 1688072: No tables with that oid found 1688074: 1688074 = pg_toast_1688072 1688076: 1688076 = pg_toast_1688072_index 1688078: No tables with that oid found 1688079: No tables with that oid found 1688085: No tables with that oid found 1688086: No tables with that oid found 1688088: No tables with that oid found 1688089: No tables with that oid found 1688091: 1688091 = pg_toast_1688089 1688093: 1688093 = pg_toast_1688089_index 1688094: No tables with that oid found 1688095: No tables with that oid found 1688097: No tables with that oid found 1688098: No tables with that oid found 1688101: No tables with that oid found 1688102: No tables with that oid found 1688104: 1688104 = pg_toast_1688102 1688106: 1688106 = pg_toast_1688102_index 1688107: No tables with that oid found 1688111: No tables with that oid found 1688114: No tables with that oid found 1688116: 1688116 = pg_toast_1688114 1688118: 1688118 = pg_toast_1688114_index 1688119: No tables with that oid found 1688120: No tables with that oid found 1688122: 1688122 = pg_toast_1688120 1688124: 1688124 = pg_toast_1688120_index 1688125: No tables with that oid found 1688133: No tables with that oid found 1688135: 1688135 = pg_toast_1688133 1688137: 1688137 = pg_toast_1688133_index 1688138: No tables with that oid found 1688140: No tables with that oid found 1688142: 1688142 = pg_toast_1688140 1688144: 1688144 = pg_toast_1688140_index 1688145: No tables with that oid found 1688146: No tables with that oid found 1688147: No tables with that oid found 1688147.1: No tables with that oid found 1688150: No tables with that oid found 1688152: No tables with that oid found 1688155: No tables with that oid found 17070: 17070 = sql_features 17072: 17072 = pg_toast_17070 17074: 17074 = pg_toast_17070_index 17075: 17075 = sql_implementation_info 17077: 17077 = pg_toast_17075 17079: 17079 = pg_toast_17075_index 17080: 17080 = sql_languages 17082: 17082 = pg_toast_17080 17084: 17084 = pg_toast_17080_index 17085: 17085 = sql_packages 17087: 17087 = pg_toast_17085 17089: 17089 = pg_toast_17085_index 17090: 17090 = sql_sizing 17092: 17092 = pg_toast_17090 17094: 17094 = pg_toast_17090_index 17095: 17095 = sql_sizing_profiles 17097: 17097 = pg_toast_17095 17099: 17099 = pg_toast_17095_index 17246: 17246 = logfile_progress 17248: 17248 = pg_toast_17246 17250: 17250 = pg_toast_17246_index 17251: 17251 = servernames_id_seq 17253: 17253 = servernames 17257: 17257 = pg_toast_17253 17259: 17259 = pg_toast_17253_index 17260: 17260 = serveraliassen_id_seq 17262: 17262 = serveraliassen 17265: 17265 = pg_toast_17262 17267: 17267 = pg_toast_17262_index 17299: 17299 = bezoekers 17299.1: 17299 = bezoekers 17302: 17302 = pg_toast_17299 17304: 17304 = pg_toast_17299_index 17306: 17306 = providers_id_seq 17308: 17308 = providers 17311: 17311 = pg_toast_17308 17313: 17313 = pg_toast_17308_index 17314: 17314 = provider_domains_id_seq 17316: 17316 = provider_domains 17319: 17319 = pg_toast_17316 17321: 17321 = pg_toast_17316_index 17326: 17326 = asn_names 17328: 17328 = pg_toast_17326 17330: 17330 = pg_toast_17326_index 17334: 17334 = crawlers_id_seq 17336: 17336 = crawlers 17339: 17339 = pg_toast_17336 17341: 17341 = pg_toast_17336_index 17348: 17348 = browsers_id_seq 17350: 17350 = browsers 17353: 17353 = pg_toast_17350 17355: 17355 = pg_toast_17350_index 17359: 17359 = operating_systems_id_seq 17361: 17361 = operating_systems 17364: 17364 = pg_toast_17361 17366: 17366 = pg_toast_17361_index 17376: 17376 = landcodes 17378: 17378 = pg_toast_17376 17380: 17380 = pg_toast_17376_index 17393: 17393 = route_asn 17395: 17395 = ip_range_country 17398: 17398 = pg_toast_17395 17400: 17400 = pg_toast_17395_index 17401: 17401 = unknown_user_agents 17404: 17404 = pg_toast_17401 17406: 17406 = pg_toast_17401_index 17429: 17429 = route_asn_nexthop 17430: 17430 = route_asn_asn 17431: 17431 = ip_range_start_end 17434: 17434 = server_session 17435: 17435 = un_logfile_progress 17437: 17437 = servernames_pkey 17439: 17439 = servernames_name_key 17441: 17441 = serveraliassen_pkey 17443: 17443 = serveraliassen_alias_key 17445: 17445 = un_serveraliassen 17457: 17457 = un_session 17459: 17459 = providers_pkey 17461: 17461 = providers_name_key 17463: 17463 = provider_domains_pkey 17465: 17465 = provider_domains_domain_key 17469: 17469 = asn_names_pkey 17473: 17473 = crawlers_pkey 17475: 17475 = un_crawlers 17479: 17479 = browsers_pkey 17481: 17481 = un_browsers 17485: 17485 = operating_systems_pkey 17487: 17487 = un_operating_systems 17493: 17493 = landcodes_pkey 17499: 17499 = route_asn_pkey 17501: 17501 = ip_range_country_pkey
I suspect all the files with "No tables with that oid found" can be removed, but I don't know exactly how oid2name works, so I'll check them against pg_class before removing them.
> Also, did you get rid of the xlog temp file mentioned in the message?
It didn't exist anymore after the crash.
Thanks for the help! Sander.
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
|
|
 | | From: | Sander Steffann | | Subject: | Re: Less available diskspace after crashed CLUSTER | | Date: | Sat, 15 Jan 2005 23:48:33 +0100 |
|
|
 | Hi,
Sorry for replying to myself, but I had to point out that:
>> I'm guessing that the new table/index files created during the CLUSTER >> didn't get released after the PANIC. Look for files that are not >> referenced by any relfilenode value in pg_class. > > Using the oid2name tool I get this: > [...] > I suspect all the files with "No tables with that oid found" can be > removed, but I don't know exactly how oid2name works, so I'll check them > against pg_class before removing them.
This is ofcourse not a good idea. There is quite a difference between the oid and the relfilenode. Comparing the output of: SELECT DISTINCT relfilenode FROM pg_class ORDER BY relfilenode; to the directory listing gave a much better result.
The files I found in this way also had an atime which was the same as the time of the crash.
Just to prevent someone else from making the mistake I made in my previous message...
Thanks! Sander.
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
|
|
|