Unix Technical Forum

Odd sendmail problem



  #1 (permalink)  
Old 02-15-2008, 11:49 AM
Stuart J. Browne
 

(SCO:Unix::5.0.4Eb rs.Unix504.0.1.a oss601a.Unix504 SCO:SendMail::8.8.5c)

Howdy.

An old system which has been working fine (up until recently, no hardware
changes, and only unrelated software changes (non-os, just custom app)) has
had its sendmail daemon starting to act funny.

The only telltale signs are that the load average skyrockets,
/usr/spool/mqueue/ (/var/opt/K/SCO/SendMail/8.8.5c/spool/mqueue/) becomes
inaccessible to 'ls' etc., and thus mail flow stops.

A reboot of the machine returns it to normal usage again for some time, but
it does it again 2-3 days later.

I thought that maybe some garbage message was getting into the queue, but
have no way of verifying this.

Two things I'd like to know:

Is there any way to find out what is under that directory structure
without relying upon the 'ls' and 'find' tools? Find shows the first 2
entries, but then stops.

Beyond upgrading everything (not an option at this stage), any other
ideas about how to combat this?

The system is in a protected network, so it is highly doubtful that someone
is attacking this from the outside (it is not in any name servers, and has
to go through a firewall to get there).

Firewall logs don't show anything out of normal.

For the moment, I've tried ruling out an issue with the symlink, and just
made a bare directory, hopefully leaving the old contents intact (after a
reboot).

bkx


  #2 (permalink)  
Old 02-15-2008, 11:50 AM
Bill Vermillion
 

In article <3f89de9f@dnews.tpgi.com.au>,
Stuart J. Browne <stuart@promed.com.au> wrote:
>(SCO:Unix::5.0.4Eb rs.Unix504.0.1.a oss601a.Unix504 SCO:SendMail::8.8.5c)
>
>Howdy.


>An old system which has been working fine (up until recently, no
>hardware changes, and only unrelated software changes (non-os,
>just custom app)) has had its sendmail daemon starting to act
>funny.


Are you 100% sure that the non-os custom app isn't contributing to
the problem? Some applications have been known to send mail - you
know, like LP does when a job is canceled. Sending mail to
an unknown user on the system can cause a lot of 'undelivered
messages'.

>The only telltale signs are that the load
>average skyrockets, /usr/spool/mqueue/
>(/var/opt/K/SCO/SendMail/8.8.5c/spool/mqueue/) becomes
>inaccessible to 'ls' etc., and thus mail flow stops.


>A reboot of the machine returns it to normal usage again for
>some time, but it does it again 2-3 days later.


Next time don't reboot the machine but kill sendmail. Then go
look at the messages in the queue. What does the output of
'mailq' tell you? Look at the headers and see if you can spot the
problem message.

>I thought that maybe some garbage message was getting into the
>queue, but have no way of verifying this.


If a restart cures it every time and you don't find the problem before
you restart, you are going to keep having this until you determine
what is causing it.

>Two things I'd like to know:


> Is there any way to find out what is under that directory structure
>without relying upon the 'ls' and 'find' tools. Find shows the first 2
>entries, but then stops.


So what does 'ls' give you? If it's too slow you probably have
thousands in the queue and sorting them is going to take time, so
just type echo *. Output will be a mass all on one line, but
pipe it through wc and you'll get an idea.
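
A minimal sketch of that count, for anyone following along. The function name count_entries is made up for illustration, and the default path is simply the queue directory mentioned in this thread. It relies on the shell's own globbing, so 'ls' is never run. It assumes the file names contain no spaces (true of normal qf/df/xf queue files), and an empty directory will report 1 because the unmatched * is echoed literally.

```shell
# count_entries is a hypothetical helper name, not a standard tool.
# $1: directory to inspect (defaults to the queue path from this thread).
count_entries() {
    # Run in a subshell so the caller's working directory is untouched.
    # echo * lets the shell expand the names itself; wc -w counts them.
    ( cd "${1:-/usr/spool/mqueue}" && echo * | wc -w | tr -d ' ' )
}
```

Calling count_entries /usr/spool/mqueue prints a single number. If even that hangs, the problem is probably not the sheer number of files.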

> Beyond upgrading everything (not an option at this stage),
>any other ideas about how to combat this?


Why do you think upgrading will cure this if you haven't found
the problem? I'm not being facetious. You say the only change
was a non-os application. If everything was fine until then it
surely points a finger at that application. Of course if many
people have root access anything could be wrong. And an upgrade
instead of a complete reinstall could carry the problem forward.

>The system is in a protected network, so it is highly doubtful
>that someone is attacking this from the outside (it is not
>in any name servers, and has to go through a firewall to get
>there).


I also doubt that is a problem.

>Firewall logs don't show anything out of normal.


You are looking at the wrong logs. Look at the sendmail logs.
Your clue should be there.

>For the moment, I've tried ruling out an issue with the symlink,
>and just made a bare directory, hopefully leaving the old
>contents intact (after a reboot).


"hopefully?"

And what 'bare directory' did you make?

Unix systems DO NOT need to be rebooted in my experience unless
it's something that has to do with a kernel related problem. That's
based on 20 years with *n*x systems.

I've used sendmail as the MTA for two ISPs and it just doesn't have
problems like the ones you have described. I really bet it is NOT sendmail
acting funny but something else causing it to accumulate tonnes of
undeliverable messages.

Bill
--
Bill Vermillion - bv @ wjv . com
  #3 (permalink)  
Old 02-15-2008, 11:50 AM
Jean-Pierre Radley
 

Bill Vermillion typed (on Tue, Oct 14, 2003 at 07:25:01PM +0000):
| In article <3f89de9f@dnews.tpgi.com.au>,
| Stuart J. Browne <stuart@promed.com.au> wrote:
| >(SCO:Unix::5.0.4Eb rs.Unix504.0.1.a oss601a.Unix504 SCO:SendMail::8.8.5c)
|
| >Two things I'd like to know:
|
| > Is there any way to find out what is under that directory structure
| >without relying upon the 'ls' and 'find' tools. Find shows the first 2
| >entries, but then stops.

I suspect you're describing what happens when you run a command like

find /some/path | xargs grep "some string"

and it turns out that there is a pipe-file under /some/path. It is
the 'grep' that is apparently stopping -- but it isn't stopping, it'll
eternally read such a FIFO looking for "some string", until you lose
patience and kill the command.

If that's what's happening, then run:

find /some/path ! -type p | xargs grep "some string"

or better, to avoid the error you get when you try to grep a directory:

find /some/path -type f | xargs grep "some string"
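
If a pipe is the suspect, it can help to list the FIFOs first, before grepping anything. A small sketch: list_fifos is an illustrative name, not an existing command, and the directory passed to it stands in for the same /some/path placeholder used above. It only uses find's -type p test, so no pipe is opened or read.

```shell
# list_fifos is a hypothetical helper name.
# $1: directory tree to search for named pipes (FIFOs).
list_fifos() {
    # -type p matches FIFOs only; find checks the file type and never opens the pipe.
    find "$1" -type p -print
}
```

An empty result from list_fifos /some/path would suggest the stall comes from somewhere else.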

--
JP
  #4 (permalink)  
Old 02-15-2008, 11:51 AM
Stuart J. Browne
 


"Jean-Pierre Radley" <jpr@jpr.com> wrote in message
news:20031014195619.GE1531@jpradley.jpr.com...
> Bill Vermillion typed (on Tue, Oct 14, 2003 at 07:25:01PM +0000):
> | In article <3f89de9f@dnews.tpgi.com.au>,
> | Stuart J. Browne <stuart@promed.com.au> wrote:
> | >(SCO:Unix::5.0.4Eb rs.Unix504.0.1.a oss601a.Unix504 SCO:SendMail::8.8.5c)
> |
> | >Two things I'd like to know:
> |
> | > Is there any way to find out what is under that directory structure
> | >without relying upon the 'ls' and 'find' tools. Find shows the first 2
> | >entries, but then stops.
>
> I suspect you're describing what happens when you run a command like
>
> find /some/path | xargs grep "some string"
>
> and it turns out that there is a pipe-file under /some/path. It is
> the 'grep' that is apparently stopping -- but it isn't stopping, it'll
> eternally read such a FIFO looking for "some string", until you lose
> patience and kill the command.
>
> If that's what's happening, then run:
>
> find /some/path ! -type p | xargs grep "some string"
>
> or better, to avoid the error you get when you try to grep a directory:
>
> find /some/path -type f | xargs grep "some string"


Actually, all I did was:

cd /usr/spool/mqueue
find . -print

It showed two entries, then halted.


  #5 (permalink)  
Old 02-15-2008, 11:51 AM
Stuart J. Browne
 

> Stuart J. Browne <stuart@promed.com.au> wrote:
> >(SCO:Unix::5.0.4Eb rs.Unix504.0.1.a oss601a.Unix504 SCO:SendMail::8.8.5c)
> >
> >Howdy.

>
> >An old system which has been working fine (up until recently, no
> >hardware changes, and only unrelated software changes (non-os,
> >just custom app)) has had its sendmail daemon starting to act
> >funny.

>
> Are you 100% sure that the non-os custom app isn't contributing to
> the problem? Some applications have been known to send mail - you
> know, like LP does when a job is canceled. Sending mail to
> an unknown user on the system can cause a lot of 'undelivered
> messages'.


Our application does launch "/usr/lib/sendmail -t", but I can vouch for the
data that goes to it. At worst, it should be forwarded straight through
the queue (DS entry in /usr/lib/sendmail.cf), to a real mail server.

> >The only telltale signs are that the load
> >average skyrockets, /usr/spool/mqueue/
> >(/var/opt/K/SCO/SendMail/8.8.5c/spool/mqueue/) becomes
> >inaccessible to 'ls' etc., and thus mail flow stops.

>
> >A reboot of the machine returns it to normal usage again for
> >some time, but it does it again 2-3 days later.

>
> Next time don't reboot the machine but kill sendmail. Then go
> look at the messages in the queue. What does the output of
> 'mailq' tell you? Look at the headers and see if you can spot the
> problem message.


This is what I've attempted each time it's happened. The sendmail
processes don't die. Mailq doesn't complete. I thought maybe it was
lots-of-files, but after leaving it for over half an hour and still no
response, no. I've seen lots-of-files before. Besides, using an unsorted
'find' would get past that, which it did not.

>
> >I thought that maybe some garbage message was getting into the
> >queue, but have no way of verifying this.

>
> If a restart cures it every time and you don't find the problem before
> you restart, you are going to keep having this until you determine
> what is causing it.
>
> >Two things I'd like to know:

>
> > Is there any way to find out what is under that directory structure
> >without relying upon the 'ls' and 'find' tools. Find shows the first 2
> >entries, but then stops.

>
> So what does 'ls' give you? If it's too slow you probably have
> thousands in the queue and sorting them is going to take time so
> just type echo *. Output will be a mass all on one line, but
> pipe it through wc and you'll get an idea.


Forgot about 'echo *', will try that next time.

> > Beyond upgrading everything (not an option at this stage),
> >any other ideas about how to combat this?

>
> Why do you think upgrading will cure this if you haven't found
> the problem? I'm not being facetious. You say the only change
> was a non-os application. If everything was fine until then it
> surely points a finger at that application. Of course if many
> people have root access anything could be wrong. And an upgrade
> instead of a complete reinstall could carry the problem forward.


I'd agree with you if we hadn't updated that application on over 100 other
machines, some of which are identical. I'm more inclined to believe that
it has to do with external changes on the LAN (which I know is a pipe
dream).

The only thing the custom app does with regards to sendmail is issue
'/usr/lib/sendmail -t < tmp.file'. The code which does that hasn't changed
within the application. At no other time does it go anywhere near sendmail
or the mail subsystems.

> >The system is in a protected network, so it is highly doubtful
> >that someone is attacking this from the outside (it is not
> >in any name servers, and has to go through a firewall to get
> >there).

>
> I also doubt that is a problem.
>
> >Firewall logs don't show anything out of normal.

>
> You are looking at the wrong logs. Look at the sendmail logs.
> Your clue should be there.


Unfortunately not. They are clean. Shows normal operation, and then just
no operation.

> >For the moment, I've tried ruling out an issue with the symlink,
> >and just made a bare directory, hopefully leaving the old
> >contents intact (after a reboot).

>
> "hopefully?"


There are no qf* files in there, just an assortment of df* and xf*, so no
header details. Nothing is obscenely large, just a mix and match of normal
mail traffic, about 30 partial messages in all.

'hopefully' meaning that the nasty-killing of processes due to reboot
leaves the partial files intact, and not cleaned off by a fsck during boot
(I don't have console access, and walking the users out there through
single-user boot-up, bypassing fsck's isn't really on the cards).

> And what 'bare directory' did you make?


Moved symlink '/usr/spool/mqueue' aside, created '/usr/spool/mqueue' as a
directory.
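
For anyone repeating this step, it amounts to roughly the following sketch. The function name swap_queue_dir and the .old suffix are assumptions for illustration; the thread does not say what the symlink was renamed to, nor what owner and mode the new directory was given, so those are left to match whatever your sendmail expects. sendmail should be stopped before doing this.

```shell
# swap_queue_dir is a hypothetical helper; ".old" is an assumed suffix.
# $1: path of the queue symlink (e.g. /usr/spool/mqueue, no trailing slash).
swap_queue_dir() {
    q=$1
    # Only act when the path really is a symlink.
    [ -L "$q" ] || { echo "$q is not a symlink" >&2; return 1; }
    # mv renames the link itself; the directory it points to is untouched.
    mv "$q" "$q.old" || return 1
    # Create an empty real directory where the link used to be.
    mkdir "$q"
}
```

The old contents stay where the link pointed, so they can still be examined later through the renamed link.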

> Unix systems DO NOT need to be rebooted in my experience unless
> it's something that has to do with a kernel related problem. That's
> based on 20 years with *n*x systems.


I'm of a similar mind (much to my boss's frustration), but when the load
average keeps creeping, and processes don't die, choices are very limited,
especially on production machines where people are trying to work.

I'll take 10 min abuse for dropping everybody off, rather than 8hrs of
"It's slow! it's slow!" in this sort of situation.

> I've used sendmail as the MTA for two ISPs and it just doesn't have
> problems like the ones you have described. I really bet it is NOT sendmail
> acting funny but something else causing it to accumulate tonnes of
> undeliverable messages.


I use it for 1 ISP, as well as all in-house, plus all of our clients (in
excess of 100). I trust sendmail. I don't have that many running this
particular version on OSR504 however, so I was wondering if there was a
specific issue that I was unaware of.

> Bill
> --
> Bill Vermillion - bv @ wjv . com


Thanks, Bill. Your insight is much appreciated.

bkx


  #6 (permalink)  
Old 02-15-2008, 11:51 AM
Bill Vermillion
 

In article <3f8c880b$1@dnews.tpgi.com.au>,
Stuart J. Browne <stuart@promed.com.au> wrote:
>> Stuart J. Browne <stuart@promed.com.au> wrote:
>> >(SCO:Unix::5.0.4Eb rs.Unix504.0.1.a oss601a.Unix504 SCO:SendMail::8.8.5c)


>> >Howdy.


>> >An old system which has been working fine (up until recently, no
>> >hardware changes, and only unrelated software changes (non-os,
>> >just custom app)) has had its sendmail daemon starting to act
>> >funny.


>> Are you 100% sure that the non-os custom app isn't contributing to
>> the problem? Some applications have been known to send mail - you
>> know, like LP does when a job is canceled. Sending mail to
>> an unknown user on the system can cause a lot of 'undelivered
>> messages'.


>Our application does launch "/usr/lib/sendmail -t", but
>I can vouch for the data that goes to it. At worst, it
>should be forwarded straight through the queue (DS entry in
>/usr/lib/sendmail.cf), to a real mail server.


Is the DS defined to be another machine? If so, what is the
possibility that that machine is rejecting the mail and the bounce is
going back to a non-existent user?

I had a situation [BSD web server] where the ownership of the web
server was changed to www - so a break-in wouldn't give root access.
When the account was created it was with a directory of
'nonexistent' and shell of '/bin/nologin'. One day I got a
'file system full' message and found that 'www' had many MBs of
messages, and with no shell or any other access it went unnoticed
until things got full.

Another time [a long time ago] I screwed up on something in
sendmail - back in the 4.? days when it wasn't as robust and when I
started it up one process was generating error messages and filling
up the space at a rapid pace. [Irix on an SGI].

I tried killing a process I'd see in the ps output but I would get 'no
such process'. sendmail had run amok and was spawning processes and
finishing them faster than I could find them.

I issued 'killall /usr/sbin/sendmail' and it took about 20 minutes
for the system to become stable as it was killing processes only
slightly faster than sendmail was generating them. This was in a
headless server environment where you don't really want to have to
reboot unless absolutely necessary.

>> >The only telltale signs are that the load
>> >average skyrockets, /usr/spool/mqueue/
>> >(/var/opt/K/SCO/SendMail/8.8.5c/spool/mqueue/) becomes
>> >inaccessible to 'ls' etc., and thus mail flow stops.


You can bet the load average went through the roof in the above
scenario also.

>> >A reboot of the machine returns it to normal usage again for
>> >some time, but it does it again 2-3 days later.


>> Next time don't reboot the machine but kill sendmail. Then go
>> look at the messages in the queue. What does the output of
>> 'mailq' tell you? Look at the headers and see if you can spot the
>> problem message.


>This is what I've attempted each time it's happened. The
>sendmail processes don't die. Mailq doesn't complete. I thought
>maybe it was lots-of-files, but after leaving it for over half
>an hour and still no response, no. I've seen lots-of-files
>before. Besides, using an unsorted 'find' would get past that,
>which it did not.


Are you trying to kill sendmail by the process ID? As noted above
I found that was fruitless.

Since all the sendmail queue file names end in digits, when I look at
a listing of the queue and want to find out what is causing what
without running mailq, I'll often just list the directory and note the
last digits of the last file. Then I can 'less' *NN and get
the q and d files. You might try this after using echo *,
and then perhaps try echo *NN until you see a number or two that
you can try cat *NN on [if nothing else works] to at least get a clue
about what has gone astray.
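
Roughly, the *NN trick looks like the sketch below. The helper name queue_files_ending is made up, and the 12 in the usage comment is just an example suffix; the real digits come from whatever echo * shows on your system.

```shell
# queue_files_ending is a hypothetical helper name.
# $1: queue directory, $2: trailing digits to match (the NN above).
queue_files_ending() {
    # The shell expands *NN itself, so neither ls nor find is involved.
    ( cd "$1" && echo *"$2" )
}

# Example use (not run here): show every file for queue IDs ending in 12.
# queue_files_ending /usr/spool/mqueue 12
# cd /usr/spool/mqueue && cat *12
```

If nothing matches, the pattern itself is echoed back unchanged, which is a quick way to tell that no file ends in those digits.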

>> >I thought that maybe some garbage message was getting into the
>> >queue, but have no way of verifying this.


>> If a restart cures it every time and you don't find the problem before
>> you restart, you are going to keep having this until you determine
>> what is causing it.


>> >Two things I'd like to know:


>> > Is there any way to find out what is under that directory
>> >structure without relying upon the 'ls' and 'find' tools.
>> >Find shows the first 2 entries, but then stops.


>> So what does 'ls' give you? If it's too slow you probably have
>> thousands in the queue and sorting them is going to take time so
>> just type echo *. Output will be a mass all on one line, but
>> pipe it through wc and you'll get an idea.


>Forgot about 'echo *', will try that next time.


It's easy to forget and I cuss myself when I need it and take a
while to remember it. I used to use it when trying to fix old
Xenix systems where no one had a clue and I'd boot with one of my
disks and use echo * to find out what was there.


>> > Beyond upgrading everything (not an option at this stage),
>> >any other ideas about how to combat this?


>> Why do you think upgrading will cure this if you haven't found
>> the problem? I'm not being facetious. You say the only change
>> was a non-os application. If everything was fine until then it
>> surely points a finger at that application. Of course if many
>> people have root access anything could be wrong. And an upgrade
>> instead of a complete reinstall could carry the problem forward.


>I'd agree with you if we hadn't updated that application on
>over 100 other machines, some of which are identical. I'm more
>inclined to believe that it has to do with external changes on
>the LAN (which I know is a pipe dream).


It could be related to that. Particularly if the DS variable in
your sendmail.cf is not empty and is pointing somewhere else.

As to other machines which are identical, I think one of Murphy's
laws of computing is 'Identical systems never are'.

>The only thing the custom app does with regards to sendmail is
>issue '/usr/lib/sendmail -t < tmp.file'. The code which does
>that hasn't changed within the application. At no other time
>does it go anywhere near sendmail or the mail subsystems.


[When I referenced /usr/sbin/sendmail above I had forgotten that
it was under /usr/lib on SCO].

On your tmp.file you read in with the -t: I have no idea what is in
it, but try it with ONLY root at the localhost and see if it goes
where it should, then try adding the other names one at a time.
You might also want to temporarily make the DS a blank if it is
not at this moment, and then restart sendmail. The .cf and
the sendmail.cw [or local-host-names depending on versions], and
the relay file are the only ones I've found that need a restart
to be recognized. I've not found this in the documentation [it is
too large] so that is just observation.
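
A sketch of such a root-only test file. The helper name, file name, subject and body are all placeholders. The only things it relies on are that -t makes sendmail take its recipients from the To:, Cc: and Bcc: headers, and that a blank line separates the headers from the body.

```shell
# make_test_message is a hypothetical helper; the subject and body are placeholders.
# $1: file to write, $2: recipient (defaults to root@localhost).
make_test_message() {
    # Headers first, then one blank line, then the body.
    printf 'To: %s\nSubject: sendmail -t test\n\nTest message, please ignore.\n' \
        "${2:-root@localhost}" > "$1"
}

# Example use (not run here), with the SCO path from this thread:
# make_test_message /tmp/test.msg
# /usr/lib/sendmail -t < /tmp/test.msg
```

Once the root-only message arrives, pass a different address as the second argument to bring the other recipients back one at a time.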

>> >The system is in a protected network, so it is highly doubtful
>> >that someone is attacking this from the outside (it is not
>> >in any name servers, and has to go through a firewall to get
>> >there).


>> I also doubt that is a problem.


>> >Firewall logs don't show anything out of normal.


>> You are looking at the wrong logs. Look at the sendmail logs.
>> Your clue should be there.


>Unfortunately not. They are clean. Shows normal operation, and
>then just no operation.


Do you know when the app is supposed to be sending mail? If so, can
you trigger it while performing a tail -f on the logfile? [That
presupposes your system is not so busy that the entries scroll off the
screen faster than you can see them, like one of my servers].

>> >For the moment, I've tried ruling out an issue with the symlink,
>> >and just made a bare directory, hopefully leaving the old
>> >contents intact (after a reboot).


>> "hopefully?"


>There are no qf* files in there, just an assortment of df* and
>xf*, so no header details. Nothing is obscenely large, just a
>mix and match of normal mail traffic, about 30 partial messages
>in all.


The only time I've seen just a df* and nothing else, it appeared to be
from some spammer somewhere, as the files were huge and mostly
binary. Could something like Swen be on a local PC and deluging
you with something like that? That's just a wild thought as I type
this.

>'hopefully' meaning that the nasty-killing of processes due to
>reboot leaves the partial files intact, and not cleaned off by a
>fsck during boot (I don't have console access, and walking the
>users out there through single-user boot-up, bypassing fsck's
>isn't really on the cards).


fsck should not touch a thing if you did a reboot and not a
poweroff. If your system is swamped it may take quite a while for
the shutdown to complete. If it never does and you have to take
drastic measures you might try sync;sync;sync;reboot all on
one line. If that works it might get most things flushed so an
fsck won't occur - but don't bet on it.

>> And what 'bare directory' did you make?


>Moved symlink '/usr/spool/mqueue' aside, created
>'/usr/spool/mqueue' as a directory.


Understood.

>> Unix systems DO NOT need to be rebooted in my experience unless
>> it's something that has to do with a kernel related problem. That's
>> based on 20 years with *n*x systems.


>I'm of a similar mind (much to my boss's frustration), but
>when the load average keeps creeping, and processes don't die,
>choices are very limited, especially on production machines
>where people are trying to work.


Do you have a feel for how long it is before it starts getting
sluggish? If so go for about 1/2 that time and just stop sendmail,
and take a look to see if things are building up. Then fire it
back up and see if it goes down at the normal time, or if it now
takes from the restart time to the normal slowness time to get
slow. [I hope I explained that properly].

If that is the case - the time for slowness is based upon sendmail
start time - you might make a cron entry to restart sendmail
nightly. If sendmail takes a while to shut down you might want to
leave it off for 10 or more minutes.

If you have a secondary MX machine that should be no problem as
any mail destined for that machine will be held there.
[I'm a firm believer in secondary MX but I'm constantly amazed at
how people have them].

>I'll take 10 min abuse for dropping everybody off, rather than 8hrs of
>"It's slow! it's slow!" in this sort of situation.


That's just good business sense.

>> I've used sendmail as the MTA for two ISPs and it just doesn't have
>> problems like the ones you have described. I really bet it is NOT sendmail
>> acting funny but something else causing it to accumulate tonnes of
>> undeliverable messages.


>I use it for 1 ISP, as well as all in-house, plus all of our
>clients (in excess of 100). I trust sendmail. I don't have that
>many running this particular version on OSR504 however, so I was
>wondering if there was a specific issue that I was unaware of.


I run a small niche market ISP - but in the past 10 days I've seen
my daily maillog grow 25% larger as the spam just keeps increasing.

There were 245,808 lines in yesterday's maillog and that's only
a user population of about 150. [One web site comes up #1 in Google
and it has never been advertised - only in the last two months has
any development work been done on it in two years. I just checked
and I tossed away over 49,000 messages destined for that one
yesterday. So that's 20 percent of the total. For that I don't
send rejects or anything. I just route them to /dev/null. Not
polite but the only effective way to handle those.]

Being on a tier 1 backbone makes life interesting - and sometimes
hectic.

I hope some of these ideas may help.

Bill


--
Bill Vermillion - bv @ wjv . com