Mail Delivery Subsystem messages --> /dev/null [Sendmail]


From: Per Hedeland on 4 Feb 2007 19:05

In article <12s9j0qd9eov817(a)corp.supernews.com> "Alex Moen"
<alexm(a)ndtel.com> writes:
>
>Running /var/mqueue/worldq/l13FbuHW026989 (sequence 5 of 3751)
><boothltc.com(a)tabii.net>... Connecting to tabii.net. via esmtp...
>
>[root(a)ndtc3500 worldq]# nslookup -type=mx tabii.net
>Server: ns2.stellarnet.com
>Address: 66.163.128.15
>
>Authoritative answers can be found from:
>tabii.net
> origin = dns1.name-services.com
> mail addr = info.name-services.com
> serial = 2002050701
> refresh = 10001 (2h46m41s)
> retry = 1801 (30m1s)
> expire = 604801 (1w1s)
> minimum ttl = 181 (3m1s)
>
>I have never seen this type of answer before. Where is the
>non-authoritative answer?

It just means that there is no MX record, but there are others (e.g. A).
Generally nslookup is not a good tool for debugging DNS: it is easily
confused, and confusing, for anything out of the ordinary. 'dig' is
far better, but it does require a bit more understanding of DNS from the
user.

$ dig mx tabii.net

; <<>> DiG 9.3.0 <<>> mx tabii.net
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65310
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
....

So it shows a) no error and b) 0 answers (and there is of course no MX
record in the rest of the output). Trying again with A (which sendmail
will eventually do, though it may try AAAA first):

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27013
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 5, ADDITIONAL: 0
....
;; ANSWER SECTION:
tabii.net. 1647 IN A 69.25.142.4
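
The way the two headers above are read can be written down as a small
Python sketch. It only looks at the "status:" and "ANSWER:" fields of
dig's header lines; the function name and the wording of the returned
strings are made up for illustration and are not part of dig or sendmail.

```python
import re

def classify_dig_header(output):
    """Summarize a dig response from its status and ANSWER count."""
    status = re.search(r"status: (\w+)", output)
    answers = re.search(r"ANSWER: (\d+)", output)
    if not status or not answers:
        return "no response parsed"
    code = status.group(1)
    count = int(answers.group(1))
    if code == "NOERROR" and count == 0:
        # The server answered, but holds no record of the requested type.
        return "name exists, no record of this type"
    if code == "NOERROR":
        return "records found"
    if code == "NXDOMAIN":
        return "name does not exist"
    if code == "SERVFAIL":
        return "lookup failed (temporary)"
    return code

# Header lines copied from the two dig runs in this message.
MX_HEADER = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65310
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
"""
A_HEADER = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27013
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 5, ADDITIONAL: 0
"""

print(classify_dig_header(MX_HEADER))
print(classify_dig_header(A_HEADER))
```

The MX query lands in the "no record of this type" branch and the A
query in "records found", which is the distinction drawn above.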

> Like you said, Per, a non-responding name server?

No, that problem would be if you got no response *at all* - not even one
containing 0 answers.:-) Generally this will show up as SERVFAIL "after
a while" when you use e.g. 'dig', since your local server will time out
and return such a response. But since this can be due to the original
query having got lost, the resolver has to retry - and it will do this
some number of times, for each server listed in your resolv.conf.

>If so, what is the fix? Is there a timeout for this type of situation,
>where if the DNS query is not returned within, say, 1 minute, to declare
>that the message is undeliverable and dump it?

There are lots of timeouts and max retries for DNS, you quoted them in
an earlier message - but a DNS failure will *never* cause a message to
be considered undeliverable per se, it will be retried until the queue
timeout. What you can control is the amount of time sendmail will spend
on each delivery attempt.

>In another case, the lookup was successful, but the process still sat there.
>A telnet to port 25 of the looked-up mailserver produced:
>[root(a)ndtc3500 mail]# telnet mail.drawaid.com 25
>Trying 65.106.66.254...
>telnet: Unable to connect to remote host: Connection timed out

So how long did it take for the telnet connection to time out? On my OS
(FreeBSD) it's 75 seconds, which is the "traditional" value - but pretty
excessive, since it's highly unlikely that the connection will succeed at
all if it hasn't in the first 15-20 seconds or so. On some Linuxen, the
default is on the order of 15 *minutes* or more. (On Linux this can be
controlled system-wide via /proc/sys/net/ipv4/tcp_syn_retries - which
also allows for a ridiculously high setting.)
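
To get a feel for how tcp_syn_retries translates into waiting time,
here is a rough Python model. It assumes the kernel doubles the
retransmission timeout after every unanswered SYN and caps it at some
maximum; the initial timeout (1 second here) and the 120 second cap are
assumptions that differ between kernel versions, so treat the numbers as
an estimate and not as what any particular system will do.

```python
def syn_connect_timeout(syn_retries, initial_rto=1.0, rto_max=120.0):
    """Estimate seconds until connect() gives up on an unanswered SYN.

    syn_retries: value of /proc/sys/net/ipv4/tcp_syn_retries
    initial_rto: assumed first retransmission timeout, in seconds
    rto_max:     assumed upper bound on the backed-off timeout
    """
    total = 0.0
    rto = initial_rto
    # One wait after the initial SYN, plus one after each retry.
    for _ in range(syn_retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total

print(syn_connect_timeout(6))                   # 1s initial timeout
print(syn_connect_timeout(5, initial_rto=3.0))  # 3s initial timeout
```

Because of the doubling, every extra retry roughly doubles the total
wait until the cap is reached, which is why a high setting gets out of
hand so quickly.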

>Is there a timeout value for this instance as well? It seemed to take an
>awfully long time for the smtp process to timeout (I waited over 30 minutes
>for one of them before dumping the q and d files).

Yes, you quoted those too in an earlier message:

>>>#O Timeout.connect=5m
>>>O Timeout.connect=20s
>>>#O Timeout.aconnect=0s
>>>O Timeout.aconnect=20s
>>>O Timeout.iconnect=5s

- they're all described in doc/op/op.* in the distribution. The weird
thing was that even with the above settings in a "test" sendmail.cf, you
saw sendmail sit for an *hour* waiting for the connection to complete -
however looking back on that, I see that you weren't actually using that
sendmail.cf:

>>># /usr/lib/sendmail -oQ/var/mqueue/worldq -v -q -cf /etc/mail/sendmail.alex

The option to specify a different config file is -C, not -cf. -c is
some ancient boolean option that isn't even documented anymore, and -f
sets the envelope sender, i.e. the above is an attempt to set the envelope
sender address to /etc/mail/sendmail.alex - which gets promptly ignored
due to the -q option:

# sendmail -v -q -cf /foo/bar
WARNING: Ignoring submission mode -f option (not in submission mode)
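
For a queue run driven from a script, the corrected invocation could be
assembled as in the Python sketch below. The helper name is made up for
illustration; the paths are the ones quoted in this thread, and the only
change from the original command line is -C in place of -cf. The sketch
only builds and prints the command, it does not run sendmail.

```python
import shlex

def queue_run_command(config_file, queue_dir, binary="/usr/lib/sendmail"):
    """Build a verbose queue-run command that uses an alternate config file."""
    return [
        binary,
        f"-C{config_file}",  # -C selects the configuration file
        f"-oQ{queue_dir}",   # queue directory to process
        "-v",                # verbose output
        "-q",                # run the queue once
    ]

cmd = queue_run_command("/etc/mail/sendmail.alex", "/var/mqueue/worldq")
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run() would start the queue
run without going through a shell.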

> I would think that if
>the receiving mail server does not answer within a minute or two, it should
>be considered either unreachable or nonexistent, and the mail deferred, or
>dropped if the queue timeout has been reached.

Yes, this is the norm.

> (An aside to this one: a
>visit to www.drawaid.com produced this: We are sorry to see you go. Please
>enter your email address below. Your address will be promptly removed from
>our mailing within 72 hours!... Hmm...)

Well I think you already knew that most all of your queued mail is spam
bounces... - and were already told that the real fix is to not accept
the spam for non-existent recipients in the first place.

>I am not sure what the best way would be to do a tcpdump on this, as it is
>so busy dealing with normal traffic. I thought about tcpdump -vv -s 600
>port smtp, but how do I follow the specific connection?

As already mentioned, giving just the remote host and no port is better
- however that may not give the whole truth either, e.g. if the problem
is with DNS. Sendmail debug flags may work better then. But if the
problem really is with the TCP connection, you should see SYNs go out
repeatedly while sendmail is waiting - and the above timeout settings
should help if you actually use them.
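
One way to double-check which timeout settings a given config file
really activates is to list its uncommented "O Timeout." lines. The
Python sketch below does that for the lines quoted earlier; the function
names are made up for illustration, and it only understands intervals
with explicit s/m/h/d/w units, rejecting anything else instead of
guessing a default unit.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def to_seconds(value):
    """Convert an interval such as '20s', '5m' or '1h30m' to seconds."""
    parts = re.findall(r"(\d+)([smhdw])", value)
    # Reject values that are not made up entirely of number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported interval: {value!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)

def active_timeouts(cf_text):
    """Return {name: seconds} for uncommented 'O Timeout.name=value' lines."""
    result = {}
    for line in cf_text.splitlines():
        # Lines starting with '#' are comments and do not match.
        m = re.match(r"O\s+Timeout\.(\w+)\s*=\s*(\S+)", line)
        if m:
            result[m.group(1)] = to_seconds(m.group(2))
    return result

# The settings quoted earlier in this message.
SAMPLE = """\
#O Timeout.connect=5m
O Timeout.connect=20s
#O Timeout.aconnect=0s
O Timeout.aconnect=20s
O Timeout.iconnect=5s
"""

print(active_timeouts(SAMPLE))
```

This only shows what the file says; it takes effect only when sendmail
is actually started with that file.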

--Per Hedeland
per(a)hedeland.org

From: Alex Moen on 5 Feb 2007 09:39

"Per Hedeland" <per(a)hedeland.org> wrote in message
news:eq5sd2$14ij$1(a)hedeland.org...
> In article <12s9j0qd9eov817(a)corp.supernews.com> "Alex Moen"
> <alexm(a)ndtel.com> writes:
>>
>>Running /var/mqueue/worldq/l13FbuHW026989 (sequence 5 of 3751)
>><boothltc.com(a)tabii.net>... Connecting to tabii.net. via esmtp...
>>
>>[root(a)ndtc3500 worldq]# nslookup -type=mx tabii.net
>>Server: ns2.stellarnet.com
>>Address: 66.163.128.15
>>
>>Authoritative answers can be found from:
>>tabii.net
>> origin = dns1.name-services.com
>> mail addr = info.name-services.com
>> serial = 2002050701
>> refresh = 10001 (2h46m41s)
>> retry = 1801 (30m1s)
>> expire = 604801 (1w1s)
>> minimum ttl = 181 (3m1s)
>>
>>I have never seen this type of answer before. Where is the
>>non-authoritative answer?
>
> It just means that there is no MX record, but there are others (e.g. A).
> Generally nslookup is not a good tool for debugging DNS, it gets easily
> confused and/or confusing for anything out of the ordinary - 'dig' is
> far better, but it does require a bit more understanding of DNS from the
> user.

nslookup keeps telling me that too. :) Old habits die hard.

> $ dig mx tabii.net
>
> ; <<>> DiG 9.3.0 <<>> mx tabii.net
> ;; global options: printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65310
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
> ...
>
> So it shows a) no error and b) 0 answers (and there is of course no MX
> record in the rest of the output). Trying again with A (which sendmail
> will eventually do, though it may try AAAA first):
>
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27013
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 5, ADDITIONAL: 0
> ...
> ;; ANSWER SECTION:
> tabii.net. 1647 IN A 69.25.142.4
>
>> Like you said, Per, a non-responding name server?
>
> No, that problem would be if you got no response *at all* - not even one
> containing 0 answers.:-) Generally this will show up as SERVFAIL "after
> a while" when you use e.g. 'dig', since your local server will time out
> and return such a response. But since this can be due to the original
> query having got lost, the resolver has to retry - and it will do this
> some number of times, for each server listed in your resolv.conf.

OK...

>>If so, what is the fix? Is there a timeout for this type of situation,
>>where if the DNS query is not returned within, say, 1 minute, to declare
>>that the message is undeliverable and dump it?
>
> There are lots of timeouts and max retries for DNS, you quoted them in
> an earlier message - but a DNS failure will *never* cause a message to
> be considered undeliverable per se, it will be retried until the queue
> timeout. What you can control is the amount of time sendmail will spend
> on each delivery attempt.
>
>>In another case, the lookup was successful, but the process still sat there.
>>A telnet to port 25 of the looked-up mailserver produced:
>>[root(a)ndtc3500 mail]# telnet mail.drawaid.com 25
>>Trying 65.106.66.254...
>>telnet: Unable to connect to remote host: Connection timed out
>
> So how long did it take for the telnet connection to time out? On my OS
> (FreeBSD) it's 75 seconds, which is the "traditional" value - but pretty
> excessive, since it's highly unlikely that the connection will succeed at
> all if it hasn't in the first 15-20 seconds or so. On some Linuxen, the
> default is on the order of 15 *minutes* or more. (On Linux this can be
> controlled system-wide via /proc/sys/net/ipv4/tcp_syn_retries - which
> also allows for a ridiculously high setting.)
>
>>Is there a timeout value for this instance as well? It seemed to take an
>>awfully long time for the smtp process to timeout (I waited over 30 minutes
>>for one of them before dumping the q and d files).
>
> Yes, you quoted those too in an earlier message:
>
>>>>#O Timeout.connect=5m
>>>>O Timeout.connect=20s
>>>>#O Timeout.aconnect=0s
>>>>O Timeout.aconnect=20s
>>>>O Timeout.iconnect=5s
>
> - they're all described in doc/op/op.* in the distribution. The weird
> thing was that even with the above settings in a "test" sendmail.cf, you
> saw sendmail sit for an *hour* waiting for the connection to complete -
> however looking back on that, I see that you weren't actually using that
> sendmail.cf:
>
>>>># /usr/lib/sendmail -oQ/var/mqueue/worldq -v -q -cf
>>>>/etc/mail/sendmail.alex
>
> The option to specify a different config file is -C, not -cf. -c is
> some ancient boolean option that isn't even documented anymore, and -f
> sets the envelope sender, i.e. the above is an attempt to set the envelope
> sender address to /etc/mail/sendmail.alex - which gets promptly ignored
> due to the -q option:
>

OH MY GOD. I knew that too. Another "DUH" for me, please, put it on my
tab. I think this was my problem the entire time, I cannot believe that I
did that.

> # sendmail -v -q -cf /foo/bar
> WARNING: Ignoring submission mode -f option (not in submission mode)
>
>> I would think that if
>>the receiving mail server does not answer within a minute or two, it should
>>be considered either unreachable or nonexistent, and the mail deferred, or
>>dropped if the queue timeout has been reached.
>
> Yes, this is the norm.
>
>> (An aside to this one: a
>>visit to www.drawaid.com produced this: We are sorry to see you go. Please
>>enter your email address below. Your address will be promptly removed from
>>our mailing within 72 hours!... Hmm...)
>
> Well I think you already knew that most all of your queued mail is spam
> bounces... - and were already told that the real fix is to not accept
> the spam for non-existent recipients in the first place.
>
>>I am not sure what the best way would be to do a tcpdump on this, as it is
>>so busy dealing with normal traffic. I thought about tcpdump -vv -s 600
>>port smtp, but how do I follow the specific connection?
>
> As already mentioned, giving just the remote host and no port is better
> - however that may not give the whole truth either, e.g. if the problem
> is with DNS. Sendmail debug flags may work better then. But if the
> problem really is with the TCP connection, you should see SYNs go out
> repeatedly while sendmail is waiting - and the above timeout settings
> should help if you actually use them.
>
> --Per Hedeland
> per(a)hedeland.org

OK, so, changing my command line to a WORKING one referencing my test config
with the shorter timeouts, it seems to be running through the queue nicely.

I'm gonna run like this a couple of days to see how it works out. We have
been running the original timeouts for many years, and it has just started
to be a problem.

Thank you guys so much for the help. I was at my wits' end here.

From: Per Hedeland on 5 Feb 2007 17:20

In article <12segdkgcfdhjec(a)corp.supernews.com> "Alex Moen"
<alexm(a)ndtel.com> writes:
>
>"Per Hedeland" <per(a)hedeland.org> wrote in message
>news:eq5sd2$14ij$1(a)hedeland.org...
>>
>>>>># /usr/lib/sendmail -oQ/var/mqueue/worldq -v -q -cf
>>>>>/etc/mail/sendmail.alex
>>
>> The option to specify a different config file is -C, not -cf. -c is
>> some ancient boolean option that isn't even documented anymore, and -f
>> sets the envelope sender, i.e. the above is an attempt to set the envelope
>> sender address to /etc/mail/sendmail.alex - which gets promptly ignored
>> due to the -q option:
>>
>
>OH MY GOD. I knew that too. Another "DUH" for me, please, put it on my
>tab. I think this was my problem the entire time, I cannot believe that I
>did that.

On the upside, you were wise enough to cut and paste verbatim what you
did and the results, instead of having us playing guessing games.:-)

--Per Hedeland
per(a)hedeland.org
