Mail Delivery Subsystem messages --> /dev/null [Sendmail]

From: jmaimon on 1 Feb 2007 19:36

On Feb 1, 7:46 am, "Alex Moen" <a...(a)ndtel.com> wrote:
> OK... Here goes:
>
> <jmai...(a)ttec.com> wrote in message
>
> news:1170291035.950160.293210(a)s48g2000cws.googlegroups.com...
>
> I have tried milters in the past and they created more problems than they
> solved. For some reason, we had issues using milter on Solaris 8, where the
> software would create 3 files for each incoming message, and only delete 2,

callahead-milter does not create any files per mail. Odds are neither
do the others.

Sendmail? Perhaps you should look at your SuperSafe option.

What version of sendmail, what milters and versions were you trying?

Perhaps you should try using a separate milter host running on modern
hardware and modern software. I expect you would have equally good
results with Debian GNU/Linux 4.0 or any current BSD.

> filling up the /var filesystem in a matter of days and destroying our Labor
> Day holiday. Milter is no longer, and never will be, an option in this
> office

Sounds like cutting off your nose to spite your face. Milter is the
only real way to access the full potential of sendmail in a modern
mail system.

From: Per Hedeland on 2 Feb 2007 20:02

In article <12s4sq1qvgnti59(a)corp.supernews.com> "Alex Moen"
<alexm(a)ndtel.com> writes:
>
>"Grant Taylor" <gtaylor(a)riverviewtech.net> wrote in message
>news:mailman.142.1170370768.28999.comp.mail.sendmail(a)maillists.riverviewtech.net...
>> Alex Moen wrote:
>>> Running /var/mqueue/worldq/l11LGHEN020602 (sequence 6 of 775)
>>> <1234567bwyw(a)email.lu>... Connecting to mail.email.lu. via esmtp...
>>> 220 free.email.lu ESMTP Unity v1.0 TestPhase
>>> >>> EHLO ndtc3500.stellarnet.com
>>> 250-free.email.lu
>>> 250-PIPELINING
>>> 250-SIZE 20971520
>>> 250-VRFY
>>> 250-ETRN
>>> 250-STARTTLS
>>> 250-AUTH LOGIN PLAIN
>>> 250-AUTH=LOGIN PLAIN
>>> 250 8BITMIME
>>> >>> MAIL From:<> SIZE=3074 BODY=8BITMIME
>>> 250 Ok
>>> >>> RCPT To:<1234567bwyw(a)email.lu>
>>> >>> DATA
>>> 450 <1234567bwyw(a)email.lu>: Recipient address rejected: User unknown in
>>> local recipient table
>>
>> Um, why is the server Temp Failing for an unknown / invalid recipient?

Good question - I think I saw in another thread reasoning to the effect
that "our mail system is always broken, so we always tempfail since it
gives us the opportunity to apply a band-aid for each incorrectly
non-accepted message, and accept it at a later try without the sender
noticing the brokenness (much)". Well, maybe those weren't the exact
words...

>> Furthermore why is it doing so after the DATA phase, not after the RCPT
>> phase?

It's not after the DATA phase (then there would have been a
line-with-a-dot-alone in the transcript), but after the DATA *command*.
This is due to PIPELINING.

>>> <1234567bwyw(a)email.lu>... Deferred: 450 <1234567bwyw(a)email.lu>: Recipient
>>> address rejected: User unknown in local recipient table
>>> 554 Error: no valid recipients
>>> >>> RSET
>>> 250 Ok
>>> <1234567bwyw(a)email.lu>... Connecting to plop.gms.lu. via esmtp...
>>
>> This is as I would expect.
>>
>>> Why, after sendmail got the 554 error, did it retry connecting

The 554 is an inevitable result of PIPELINING when all recipients were
temp- *or* perm-rejected - it's not relevant to the fate of the message,
only the per-recipient rejections are.
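
A minimal sketch of how the replies line up under PIPELINING, assuming
single-line replies and a server that answers in the order the commands
were sent. The function name pair_replies is made up for illustration and
the recipient address is a placeholder; the reply texts are the ones from
the transcript above.

```python
def pair_replies(sent, received):
    """Pair each pipelined command with the reply in the same position.

    Assumes one single-line reply per command, returned in sending order.
    """
    if len(sent) != len(received):
        raise ValueError("expected exactly one reply per command")
    return list(zip(sent, received))


# RCPT and DATA went out back to back, before either reply was read.
sent = ["RCPT To:<user@example.invalid>", "DATA"]
received = [
    "450 Recipient address rejected: User unknown in local recipient table",
    "554 Error: no valid recipients",
]

for command, reply in pair_replies(sent, received):
    print(f"{command} -> {reply[:3]}")
```

Read this way, the 450 belongs to RCPT and decides what happens to that
recipient, while the 554 is only the server's answer to a DATA command
that had no accepted recipient left.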

>> With the fact that email.lu and plop.gms.lu are returning Temp Fails on
>> unknown recipients, Sendmail will keep retrying to send messages until
>> they expire, I think. This in and of itself may be why messages never
>> left the mail queue.

Well, as you say, they should only be retried "until they expire" (2
days in this setup I believe), so it doesn't explain "never".

>> Per Hedeland: Do you have anything to add to this / correct me?

Huh? Why me?:-)

>So, if Sendmail will keep trying until they expire, that would be reflected
>in the setting I have for one of the queuereturns, which I have set for 2
>days, right?

It should, provided that your sendmail is doing its periodic queue runs
(normally initiated by the MTA daemon based on the -q<time> commandline
argument). But if your queue runs get stuck on many messages like the
one you mentioned earlier, they may not be all that effective. You
should try to investigate why that happens, i.e. what sendmail is
actually waiting for in those cases - e.g. no normal OS TCP/IP stack is
prepared to wait for an hour for the TCP connection to succeed, but
(e.g.) having lots of non-responding name servers (or lots of responding
name servers that try to talk to the same non-responding one) configured
could make a DNS lookup attempt take almost forever. Packet sniffing
using e.g. tcpdump may be a useful first step, or running sendmail with
more debugging on - -d8.8 will show DNS lookups, -d1-99.9 will show
everything but still not at the max level of detail.

> So, can I change the behavior to dump immediately in the case
>of unknown recipients, and have it drop them immediately on the unknown?

An MTA is not allowed to base its actions on the *text* of the responses,
only the return code. All 4xx codes mean "didn't work now, try later",
450 specifically means

450 Requested mail action not taken: mailbox unavailable
(e.g., mailbox busy)

which is assumed to be a transient condition, as opposed to

550 Requested action not taken: mailbox unavailable
(e.g., mailbox not found, no access, or command rejected
for policy reasons)

- it would be broken to bounce a message on the first receipt of 450
("drop" or "dump" would be even more broken). But of course the source
is available for modification...
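
As a rough illustration of "only the return code counts", here is a
sketch that decides what to do from the first digit of the reply and
never looks at the text. The function name and the returned labels are
invented for this example; this is not how sendmail itself is written.

```python
def action_for_reply(reply_line: str) -> str:
    """Choose an action from the three-digit reply code alone."""
    code = int(reply_line[:3])  # raises ValueError if not numeric
    if 200 <= code < 300:
        return "ok"        # command succeeded
    if 300 <= code < 400:
        return "continue"  # intermediate reply, e.g. 354 after DATA
    if 400 <= code < 500:
        return "defer"     # transient failure: keep queued, retry later
    if 500 <= code < 600:
        return "bounce"    # permanent failure: return to sender
    raise ValueError(f"not an SMTP reply code: {code}")


# The text says "User unknown", but the code says "try again later".
print(action_for_reply(
    "450 Recipient address rejected: User unknown in local recipient table"
))
```

With the reply from the transcript this yields "defer", even though the
text reads like a permanent rejection; only a 5xx code would lead to a
bounce.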

--Per Hedeland
per(a)hedeland.org

From: Grant Taylor on 2 Feb 2007 23:20

On 02/02/07 19:02, Per Hedeland wrote:
> Good question - I think I saw in another thread reasoning to the effect
> that "our mail system is always broken, so we always tempfail since it
> gives us the opportunity to apply a band-aid for each incorrectly
> non-accepted message, and accept it at a later try without the sender
> noticing the brokenness (much)". Well, maybe those weren't the exact
> words...

Oh, wow! I have not had enough alcohol for that answer.

> It's not after the DATA phase (then there would have been a
> line-with-a-dot-alone in the transcript), but after the DATA *command*.
> This is due to PIPELINING.

*nod* I never use PIPELINING when I'm testing things manually, so I
would not know.

> The 554 is an inevitable result of PIPELINING when all recipients were
> temp- *or* perm-rejected - it's not relevant to the fate of the message,
> only the per-recipient rejections are.

*nod*

> Well, as you say, they should only be retried "until they expire" (2
> days in this setup I believe), so it doesn't explain "never".

That is what I thought.

> Huh? Why me?:-)

B/c I like picking on someone that has convinced me that they know more
about a subject than I believe I know about it. IMHO, you are "Guilty
as charged.".

> - it would be broken to bounce a message on the first receipt of 450
> ("drop" or "dump" would be even more broken). But of course the source
> is available for modification...

We all know and some of us have to work with MTAs, or at least things
that claim to be MTAs, that are even more broken than that.

Grant. . . .

From: Alex Moen on 3 Feb 2007 12:53

"Per Hedeland" <per(a)hedeland.org> wrote in message
news:eq0mvd$2l6d$3(a)hedeland.org...
> In article <12s4sq1qvgnti59(a)corp.supernews.com> "Alex Moen"
> <alexm(a)ndtel.com> writes:
>>
>>"Grant Taylor" <gtaylor(a)riverviewtech.net> wrote in message
>>news:mailman.142.1170370768.28999.comp.mail.sendmail(a)maillists.riverviewtech.net...
>>> Alex Moen wrote:
>>>> Running /var/mqueue/worldq/l11LGHEN020602 (sequence 6 of 775)
>>>> <1234567bwyw(a)email.lu>... Connecting to mail.email.lu. via esmtp...
>>>> 220 free.email.lu ESMTP Unity v1.0 TestPhase
>>>> >>> EHLO ndtc3500.stellarnet.com
>>>> 250-free.email.lu
>>>> 250-PIPELINING
>>>> 250-SIZE 20971520
>>>> 250-VRFY
>>>> 250-ETRN
>>>> 250-STARTTLS
>>>> 250-AUTH LOGIN PLAIN
>>>> 250-AUTH=LOGIN PLAIN
>>>> 250 8BITMIME
>>>> >>> MAIL From:<> SIZE=3074 BODY=8BITMIME
>>>> 250 Ok
>>>> >>> RCPT To:<1234567bwyw(a)email.lu>
>>>> >>> DATA
>>>> 450 <1234567bwyw(a)email.lu>: Recipient address rejected: User unknown in
>>>> local recipient table
>>>
>>> Um, why is the server Temp Failing for an unknown / invalid recipient?
>
> Good question - I think I saw in another thread reasoning to the effect
> that "our mail system is always broken, so we always tempfail since it
> gives us the opportunity to apply a band-aid for each incorrectly
> non-accepted message, and accept it at a later try without the sender
> noticing the brokenness (much)". Well, maybe those weren't the exact
> words...

Very eloquent! :)

>>> Furthermore why is it doing so after the DATA phase, not after the RCPT
>>> phase?
>
> It's not after the DATA phase (then there would have been a
> line-with-a-dot-alone in the transcript), but after the DATA *command*.
> This is due to PIPELINING.
>
>>>> <1234567bwyw(a)email.lu>... Deferred: 450 <1234567bwyw(a)email.lu>:
>>>> Recipient
>>>> address rejected: User unknown in local recipient table
>>>> 554 Error: no valid recipients
>>>> >>> RSET
>>>> 250 Ok
>>>> <1234567bwyw(a)email.lu>... Connecting to plop.gms.lu. via esmtp...
>>>
>>> This is as I would expect.
>>>
>>>> Why, after sendmail got the 554 error, did it retry connecting
>
> The 554 is an inevitable result of PIPELINING when all recipients were
> temp- *or* perm-rejected - it's not relevant to the fate of the message,
> only the per-recipient rejections are.
>
>>> With the fact that email.lu and plop.gms.lu are returning Temp Fails on
>>> unknown recipients, Sendmail will keep retrying to send messages until
>>> they expire, I think. This in and of itself may be why messages never
>>> left the mail queue.
>
> Well, as you say, they should only be retried "until they expire" (2
> days in this setup I believe), so it doesn't explain "never".

I think the "never" comes in due to the number of queued messages, and the
fact that the queue run doesn't make it to the old messages before
breaking... See my addition below.

>>> Per Hedeland: Do you have anything to add to this / correct me?
>
> Huh? Why me?:-)

Who better???

>>So, if Sendmail will keep trying until they expire, that would be
>>reflected
>>in the setting I have for one of the queuereturns, which I have set for 2
>>days, right?
>
> It should, provided that your sendmail is doing its periodic queue runs
> (normally initiated by the MTA daemon based on the -q<time> commandline
> argument). But if your queue runs get stuck on many messages like the
> one you mentioned earlier, they may not be all that effective. You
> should try to investigate why that happens, i.e. what sendmail is
> actually waiting for in those cases - e.g. no normal OS TCP/IP stack is
> prepared to wait for an hour for the TCP connection to succeed, but
> (e.g.) having lots of non-responding name servers (or lots of responding
> name servers that try to talk to the same non-responding one) configured
> could make a DNS lookup attempt take almost forever. Packet sniffing
> using e.g. tcpdump may be a useful first step, or running sendmail with
> more debugging on - -d8.8 will show DNS lookups, -d1-99.9 will show
> everything but still not at the max level of detail.

OK, some new information about this case... The previous paragraph got me
thinking about what was happening on quite a few of the queue files. So, I
manually ran the queue, found one that was hanging up, and started checking
things out. I did this on 3 different instances, and found that the lookup
for an MX record is returning a very strange answer. For instance:

Running /var/mqueue/worldq/l13FbuHW026989 (sequence 5 of 3751)
<boothltc.com(a)tabii.net>... Connecting to tabii.net. via esmtp...

[root(a)ndtc3500 worldq]# nslookup -type=mx tabii.net
Server: ns2.stellarnet.com
Address: 66.163.128.15

Authoritative answers can be found from:
tabii.net
origin = dns1.name-services.com
mail addr = info.name-services.com
serial = 2002050701
refresh = 10001 (2h46m41s)
retry = 1801 (30m1s)
expire = 604801 (1w1s)
minimum ttl = 181 (3m1s)

I have never seen this type of answer before. Where is the
non-authoritative answer? Like you said, Per, a non-responding name server?
If so, what is the fix? Is there a timeout for this type of situation,
where if the DNS query is not returned within, say, 1 minute, to declare
that the message is undeliverable and dump it?

In another case, the lookup was successful, but the process still sat there.
A telnet to port 25 of the looked-up mailserver produced:
[root(a)ndtc3500 mail]# telnet mail.drawaid.com 25
Trying 65.106.66.254...
telnet: Unable to connect to remote host: Connection timed out

Is there a timeout value for this instance as well? It seemed to take an
awfully long time for the smtp process to time out (I waited over 30 minutes
for one of them before dumping the q and d files). I would think that if
the receiving mail server does not answer within a minute or two, it should
be considered either unreachable or nonexistent, and the mail deferred, or
dropped if the queue timeout has been reached. (An aside to this one: a
visit to www.drawaid.com produced this: "We are sorry to see you go. Please
enter your email address below. Your address will be promptly removed from
our mailing within 72 hours!"... Hmm...) Being able to shorten this timeout
would allow the queue to run faster, and maybe get it to clear out the older
ones more quickly and keep the number of queued messages to a manageable
number... which is, at this writing, around 3750 (and almost 7800 files).
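
To put a number on how long a dead host really holds things up, outside
of sendmail, something like the following sketch could be used. The
function name probe_smtp and the 60-second default are assumptions for
illustration. Note that the timeout only bounds each TCP connection
attempt; the name lookup that happens first is not covered by it. Inside
sendmail the corresponding limits are configuration options (Timeout.connect
and Timeout.iconnect, as far as I recall), not anything this script touches.

```python
import socket
import time


def probe_smtp(host, port=25, timeout=60.0):
    """Try one TCP connection and report the outcome and elapsed seconds.

    The timeout applies to each connection attempt, not to the DNS lookup.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except socket.timeout:
        outcome = "timed out"
    except OSError as exc:  # refused, unreachable, lookup failure, ...
        outcome = f"failed: {exc}"
    return outcome, time.monotonic() - start
```

Calling probe_smtp("mail.drawaid.com", timeout=60.0) against a host that
silently drops packets should come back after roughly the timeout given,
which makes it easy to compare against how long the queue runner sits on
the same destination.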

I am not sure what the best way would be to do a tcpdump on this, as it is
so busy dealing with normal traffic. I thought about tcpdump -vv -s 600
port smtp, but how do I follow the specific connection?

>> So, can I change the behavior to dump immediately in the case
>>of unknown recipients, and have it drop them immediately on the unknown?
>
> An MTA is not allowed to base its actions on the *text* of the responses,
> only the return code. All 4xx codes mean "didn't work now, try later",
> 450 specifically means
>
> 450 Requested mail action not taken: mailbox unavailable
> (e.g., mailbox busy)
>
> which is assumed to be a transient condition, as opposed to
>
> 550 Requested action not taken: mailbox unavailable
> (e.g., mailbox not found, no access, or command rejected
> for policy reasons)
>
> - it would be broken to bounce a message on the first receipt of 450
> ("drop" or "dump" would be even more broken). But of course the source
> is available for modification...

OK, I understand that.

Thanks for the help on this... Hopefully we can fix this. Let me know if
there's anything else I can supply to help troubleshoot.

From: Erik Warmelink on 3 Feb 2007 20:44

In article <12s9j0qd9eov817(a)corp.supernews.com>,
"Alex Moen" <alexm(a)ndtel.com> writes:

> I am not sure what the best way would be to do a tcpdump on this, as it is
> so busy dealing with normal traffic. I thought about tcpdump -vv -s 600
> port smtp, but how do I follow the specific connection?

tcpdump ip host [IP address|host name]

Not just smtp: ICMP, auth, netbios-ns &c. might help too.
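
If the capture is going to be repeated for several stuck destinations, a
tiny helper can assemble the command line. This is only a sketch: the
function name tcpdump_command is invented, the snaplen of 600 is the value
from the question, and -n (skip name resolution in the output) is an
addition of mine. Filtering on the host rather than on port smtp keeps
ICMP and the other traffic to that peer in the capture.

```python
import shlex


def tcpdump_command(peer, snaplen=600):
    """Build a tcpdump argv that captures all IP traffic to/from one peer."""
    return ["tcpdump", "-n", "-vv", "-s", str(snaplen), "ip", "host", peer]


# Address taken from the telnet attempt earlier in the thread.
print(shlex.join(tcpdump_command("65.106.66.254")))
```

The printed line can be pasted into a root shell on the mail server while
the queue is run manually in another window.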

--
erik(a)selwerd.nl
