Prev: dnswl doesn't work?
Next: why no configure script?
From: Dominik Storck on 4 Aug 2010 04:39 Hello, I have a problem here with mailing lists. I've configured these as entries in the /etc/aliases like group-x: :include:/etc/group-x.list and corresponding files /etc/group-x.list with one line per (local) recipient This has been working perfectly for years. Now the number of recipients for some of these lists have increased to more than 200. When a mail is sent to these recipients mail delivery starts as expected but stops short before end of list. The exact count changes, probably due to different state of of concurrent mail queue entries. The error message is an "unknown mail transport error", the mail stays in the queue and delivery starts over again from the beginnig until I remove the mail from the queue. I believe there is some limit to 200 recipients, queue entries or whatever. Playing with parameters like default_destination_concurrency_limit = 300 default_destination_recipient_limit = 300 smtp_recipient_limit = 500 did not produce any visible effect. Any ideas would be appreciated. Thanks Dominik here is an extract from the log ================================================================== Aug 2 14:50:49 postfix postfix/pickup[27603]: 720441D34008: uid=0 from=<root> Aug 2 14:50:49 postfix postfix/cleanup[28372]: 720441D34008: message-id=<20100802125049.720441D34008(a)postfix.meinedomain.com> Aug 2 14:50:49 postfix postfix/qmgr[904]: 720441D34008: from=<root(a)postfix.meinedomain.com>, size=577, nrcpt=1 (queue active) Aug 2 14:50:49 postfix postfix/local[29097]: 720441D34008: to=<aba(a)postfix.meinedomain.com>, orig_to=<all(a)meinedomain.com>, relay=local, delay=0.34, delays=0.23/0/0/0.11, dsn=2.0.0, status=sent (delivered to maildir) .... [211 lines removed, differing only in recipient address and increasing delay] .... Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008: to=<wie(a)postfix.meinedomain.com>, orig_to=<all(a)meinedomain.com>, relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent (delivered to command: /usr/bin/vacation -j wie) Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008: to=<wol(a)postfix.meinedomain.com>, orig_to=<all(a)meinedomain.com>, relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent (delivered to maildir) Aug 2 14:52:06 postfix postfix/local[29097]: 720441D34008: to=<wre(a)postfix.meinedomain.com>, orig_to=<all(a)meinedomain.com>, relay=local, delay=77, delays=0.23/0/0/77, dsn=2.0.0, status=sent (delivered to maildir) Aug 2 14:52:08 postfix postfix/error[32270]: 720441D34008: to=<all(a)meinedomain.com>, relay=none, delay=79, delays=0.23/79/0/0.67, dsn=4.3.0, status=deferred (unknown mail transport error) ================================================================== postconf -n: ================================================================== alias_maps = hash:/etc/aliases biff = no canonical_maps = hash:/etc/postfix/canonical command_directory = /usr/sbin config_directory = /etc/postfix daemon_directory = /usr/lib/postfix data_directory = /var/lib/postfix debug_peer_level = 2 defer_transports = disable_dns_lookups = no disable_mime_output_conversion = no header_checks = regexp:/etc/postfix/header_checks home_mailbox = Maildir/ html_directory = /usr/share/doc/packages/postfix/html inet_interfaces = all inet_protocols = all local_destination_concurrency_limit = 2 mail_owner = postfix mail_spool_directory = /var/mail mailbox_command = mailbox_size_limit = 0 mailbox_transport = mailq_path = /usr/bin/mailq manpage_directory = /usr/share/man masquerade_classes = envelope_sender, header_sender, header_recipient masquerade_domains = masquerade_exceptions = root message_size_limit = 1024000000 mydestination = $myhostname localhost.$mydomain my-domain.com myhostname = postfix.my-domain.com mynetworks_style = subnet newaliases_path = /usr/bin/newaliases queue_directory = /var/spool/postfix readme_directory = /usr/share/doc/packages/postfix/README_FILES relayhost = relocated_maps = hash:/etc/postfix/relocated sample_directory = /usr/share/doc/packages/postfix/samples sender_canonical_maps = hash:/etc/postfix/sender_canonical sendmail_path = /usr/sbin/sendmail setgid_group = maildrop smtp_sasl_auth_enable = no smtp_use_tls = yes smtpd_client_connection_count_limit = 0 smtpd_client_restrictions = smtpd_helo_required = yes smtpd_helo_restrictions = permit_mynetworks, check_sender_access hash:/etc/postfix/access, check_client_access hash:/etc/postfix/access,reject_non_fqdn_hostname, permit smtpd_recipient_restrictions = permit_sasl_authenticated,check_sender_access hash:/etc/postfix/access,check_client_access hash:/etc/postfix/access,reject_unauth_destination,reject_non_fqdn_sender,reject_non_fqdn_recipient, reject_unverified_recipient,reject_unauth_destination,permit smtpd_sasl_auth_enable = no smtpd_sender_restrictions = permit_mynetworks, check_sender_access hash:/etc/postfix/access, permit smtpd_use_tls = no strict_8bitmime = no strict_rfc821_envelopes = no transport_maps = hash:/etc/postfix/transport unknown_local_recipient_reject_code = 550 virtual_alias_domains = hash:/etc/postfix/virtual virtual_alias_maps = hash:/etc/postfix/virtual ==================================================================
From: Wietse Venema on 4 Aug 2010 12:02 See http://www.postfix.org/DEBUG_README.html#logging When Postfix does not receive or deliver mail, the first order of business is to look for errors that prevent Postfix from working properly: % egrep '(warning|error|fatal|panic):' /some/log/file | more Note: the most important message is near the BEGINNING of the output. Error messages that come later are less useful. The nature of each problem is indicated as follows: * "panic" indicates a problem in the software itself that only a programmer can fix. Postfix cannot proceed until this is fixed. * "fatal" is the result of missing files, incorrect permissions, incorrect configuration file settings that you can fix. Postfix cannot proceed until this is fixed. * "error" reports an error condition. For safety reasons, a Postfix process will terminate when more than 13 of these happen. * "warning" indicates a non-fatal error. These are problems that you may not be able to fix (such as a broken DNS server elsewhere on the network) but may also indicate local configuration errors that could become a problem later.
From: =?UTF-8?Q?Andrzej_Kuku=C5=82a?= on 5 Aug 2010 06:58 On Wed, Aug 4, 2010 at 10:39, Dominik Storck <dominik(a)storck.net> wrote: > > This has been working perfectly for years. Now the number of recipients > for some of > these lists have increased to more than 200. > > When a mail is sent to these recipients mail delivery starts as expected > but stops > short before end of list. The exact count changes, probably due to different > state of of concurrent mail queue entries. > > The error message is an "unknown mail transport error", the mail stays > in the queue and > delivery starts over again from the beginnig until I remove the mail > from the queue. > > I believe there is some limit to 200 recipients, queue entries or whatever. I'd speculate it's low open file limit in operating system. I had this once when my 'everyone' alias exceeded several hundred users. See ulimit -n Increase it in your postfix startup script to, say, 100000, and observe the difference. Regards, Andrzej
From: Dominik Storck on 9 Aug 2010 14:25 Hello Andrzej, thanks for your suggestion. However, increasing ulimit -n did not help in the first place . Debugging info by adding -v in master.cf came up with lots of debugging info yet not leading to success. After attaching strace to the local daemon, I observed a kind of a loop when it came to evaluate the .forward files in the users home directories, several of them in the form /home/username/.forward = "\username". These files are produced by a vacation plugin to our squirrelmail web access to the mailboxes, when the users disable forwarding and or vacation messages. After sending mails to almost all of the recipients - many of them have these .forward-files, evaluated without any problem - strace showed up repeated accesses to the .forward file for the same user about 4300 (!) times - probably until the above ulimit is touched and the process segfaults. As I didn't figure out by now what causes the loop, I'd appreciate any other idea. Here is a snippet of the loops output, lots of similar block before and after this one: The only parameter that changes is the 4304 here, which i believe to be the file handle. This number is incremented with each loop iteration. ======================================================= Aug 9 12:06:05 postfix logger: lstat64("/home/wre/.forward",{st_mode=S_IFREG|0600, st_size=6, ...}) = 0 Aug 9 12:06:05 postfix logger: geteuid32() = 1193 Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0 Aug 9 12:06:05 postfix logger: setresgid32(-1, 51, -1) = 0 Aug 9 12:06:05 postfix logger: setgroups32(1, [51]) = 0 Aug 9 12:06:05 postfix logger: setresuid32(-1, 51, -1) = 0 Aug 9 12:06:05 postfix logger: geteuid32() = 51 Aug 9 12:06:05 postfix logger: getegid32() = 51 Aug 9 12:06:05 postfix logger: geteuid32() = 51 Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0 Aug 9 12:06:05 postfix logger: setresgid32(-1, 100, -1) = 0 Aug 9 12:06:05 postfix logger: setgroups32(1, [100]) = 0 Aug 9 12:06:05 postfix logger: setresuid32(-1, 1193, -1) = 0 Aug 9 12:06:05 postfix logger: open("/home/wre/.forward", O_RDONLY) = 4304 Aug 9 12:06:05 postfix logger: geteuid32() = 1193 Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0 Aug 9 12:06:05 postfix logger: setresgid32(-1, 51, -1) = 0 Aug 9 12:06:05 postfix logger: setgroups32(1, [51]) = 0 Aug 9 12:06:05 postfix logger: setresuid32(-1, 51, -1) = 0 Aug 9 12:06:05 postfix logger: fcntl64(4304, F_GETFD) = 0 Aug 9 12:06:05 postfix logger: fcntl64(4304, F_SETFD, FD_CLOEXEC) = 0 Aug 9 12:06:05 postfix logger: read(4304, "\\wre\r\n", 4096) = 6 Aug 9 12:06:05 postfix logger: time(NULL) = 1281348364 Aug 9 12:06:05 postfix logger: time(NULL) = 1281348364 Aug 9 12:06:05 postfix logger: geteuid32() = 51 Aug 9 12:06:05 postfix logger: getegid32() = 51 Aug 9 12:06:05 postfix logger: geteuid32() = 51 Aug 9 12:06:05 postfix logger: setresuid32(-1, 0, -1) = 0 Aug 9 12:06:05 postfix logger: setresgid32(-1, 100, -1) = 0 Aug 9 12:06:05 postfix logger: setgroups32(1, [100]) = 0 Aug 9 12:06:05 postfix logger: setresuid32(-1, 1193, -1) = 0 ============================================================================ For now, after deleting all the "quasi empty" forward-files seems to solve the problem, but I fear running into the same thing when the number of recipients will increase. Dominik Am 05.08.2010 12:58, schrieb Andrzej KukuĊa: > On Wed, Aug 4, 2010 at 10:39, Dominik Storck <dominik(a)storck.net> wrote: >> >> This has been working perfectly for years. Now the number of recipients >> for some of >> these lists have increased to more than 200. >> >> When a mail is sent to these recipients mail delivery starts as expected >> but stops >> short before end of list. The exact count changes, probably due to different >> state of of concurrent mail queue entries. >> >> The error message is an "unknown mail transport error", the mail stays >> in the queue and >> delivery starts over again from the beginnig until I remove the mail >> from the queue. >> >> I believe there is some limit to 200 recipients, queue entries or whatever. > > I'd speculate it's low open file limit in operating system. I had this > once when my 'everyone' alias exceeded several hundred users. See > ulimit -n > Increase it in your postfix startup script to, say, 100000, and > observe the difference. > > Regards, > Andrzej
From: Wietse Venema on 9 Aug 2010 16:17
In case you wonder, I wrote Postfix. Perhaps you can follow instructions in http://www.postfix.org/.DEBUG_README.html#logging TURN OFF -v logging before you do this. Wietse |