LASE dies - LawsonGuru.com - LawsonGuru.com Forums

Brian K

Advanced Member

Posts: 20

8/18/2009 3:07 PM

We went live on LSF9.0.0.5 in February of 2009. Since then we have had issues with LASE dying every once in a while. It occurs (on average) once every other week. We have been working with Lawson on resolving this issue. We can restart the lase process and everything is OK, but we are really trying to figure out how to overcome the issue.

We have applied about 4 PTs, none of which corrected the issue. Is anyone else experiencing this, or has anyone experienced it in the past? Did you resolve it?

Thanks!
Brian

Jimmy Chiu

Veteran Member

Posts: 641

8/18/2009 7:11 PM

Check your lase.log under LAWDIR/system. Is it filling up with "Got exception while reading from connection" errors? I have to clear the log every other week because it fills up so fast.
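
A minimal sketch of that check in Python, assuming lase.log is plain text and contains the message text from the post above verbatim. The names `count_connection_errors` and `count_in_file` are made up for illustration.

```python
# Message text as given in the post; adjust it if your log words it differently.
MESSAGE = "Got exception while reading from connection"


def count_connection_errors(lines, message=MESSAGE):
    """Count how many of the given log lines contain the message."""
    return sum(1 for line in lines if message in line)


def count_in_file(path, message=MESSAGE):
    """Same count, reading the log from disk and tolerating odd bytes."""
    with open(path, errors="replace") as handle:
        return count_connection_errors(handle, message)
```

If LAWDIR is set in your environment, something like `count_in_file(os.path.join(os.environ["LAWDIR"], "system", "lase.log"))` gives the count; running it daily shows how fast the errors accumulate.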

Brian K

Advanced Member

Posts: 20

8/18/2009 8:51 PM

Actually, we archive the log files every day, but the following (with the last error repeated) is all that we're getting in lase.log:

Tue Aug 18 07:00:08 2009: Security Environment terminated with an exit status of: 0
Tue Aug 18 07:00:08 2009: Security Environment Version 9.0.0.5.602 2009-05-27 04:00:00 (200805) Stopped.

Tue Aug 18 07:00:09 2009: Security server 'default' failed to stop, killed.

08/18/2009 07:00:25 getuserenv: Pid 241918
241918: Could not get SecCtx
241918: Error number = 111
241918: Error message =
241918: Error occured: Error (Result=1)
241918: UserName = lawson

Alex Tsekhansky

Veteran Member

Posts: 92

8/19/2009 1:48 AM

Brian - that message is dated 7am. Was that when LASE died, or was that when you brought the system up (or down) before/after the backup?

I would be curious to see the messages from around the time LASE died.

Thanks.

Alex.

Brian K

Advanced Member

Posts: 20

8/19/2009 12:36 PM

Hi Alex,
Those are the messages around lase dying. I think that it died at 7:00:09, and then until we brought it back up, every time a user tried to access something, we got the following entries:

08/18/2009 07:00:25 getuserenv: Pid 241918
241918: Could not get SecCtx
241918: Error number = 111
241918: Error message =
241918: Error occured: Error (Result=1)
241918: UserName = lawson

With different UserNames.

Then at 7:18, we restarted lase and got this in the log file:

Tue Aug 18 07:18:39 2009: Timeout value is adjusted to 30 Secs

Tue Aug 18 07:18:39 2009: Security Environment Version 9.0.0.5.602 2009-05-27 04:00:00 (200805) starting.

Tue Aug 18 07:18:39 2009: Security Environment Version 9.0.0.5.602 2009-05-27 04:00:00 (200805) started.

Security Server: Initializing default...
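
For anyone who wants to measure how long LASE was down from entries like these, here is a rough sketch in Python. It assumes the timestamp format shown in the log lines above and treats a line ending in "Stopped." or "killed." as the start of an outage and the next line ending in "started." as its end. That pairing rule and the function names are assumptions for illustration, not Lawson-documented behavior.

```python
from datetime import datetime

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue Aug 18 07:00:08 2009"


def parse_stamp(line):
    """Return the leading timestamp of a lase.log line, or None if absent."""
    try:
        return datetime.strptime(line[:24], STAMP_FORMAT)
    except ValueError:
        return None


def outage_windows(lines):
    """Return (down, up, seconds) tuples for each stop/start pair found."""
    windows = []
    down_since = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # e.g. the getuserenv lines, which use another format
        text = line.rstrip()
        if down_since is None and text.endswith(("Stopped.", "killed.")):
            down_since = stamp
        elif down_since is not None and text.endswith("started."):
            seconds = (stamp - down_since).total_seconds()
            windows.append((down_since, stamp, seconds))
            down_since = None
    return windows
```

Feeding it the lines of an archived lase.log gives one tuple per outage, which makes it easier to line the outages up against the LDAP, DB2, and network logs.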

Brian K

Advanced Member

Posts: 20

8/19/2009 1:24 PM

Also, to clarify, at 7:18/7:17, we restarted lase.

Lonnie

New Member

Posts: 1

8/19/2009 2:48 PM

Our LASE issues have always been related to the LDAP server going down. At least that has been our experience.

John Henley

Senior Member

Posts: 3348

8/19/2009 2:53 PM

Brian, are you on Windows, Unix, or iSeries? Which LDAP are you using: TDS or ADAM?

John Henley

Senior Member

Posts: 3348

8/19/2009 3:12 PM

If you're running on TDS, one issue is that DB2 requires some "care and feeding"; if it is not properly attended to, it causes TDS to fail, in turn bringing LASE down.

Jimmy Chiu

Veteran Member

Posts: 641

8/19/2009 5:36 PM

Is there an error entry in your LADB log file that corresponds to the LASE log file at the time of failure?

Brian K

Advanced Member

Posts: 20

8/19/2009 6:22 PM

How do we "care" for TDS? We asked Lawson this too and were only given one thing to do to clean up TDS.

We run Unix (AIX) and TDS with DB2, and have an Oracle database.

Brian K

Advanced Member

Posts: 20

8/19/2009 6:24 PM

In ladb, we only get errors after the fact:

Tue Aug 18 07:00:25 2009: [PO30] GetDbAuthInfo() Cannot access PROD with a null RMId

Tue Aug 18 07:01:32 2009: [PO20] GetDbAuthInfo() Cannot access PROD with a null RMId

To me, these make sense: LASE is dead and doesn't know what the RMId is, so it calls it null.

John Henley

Senior Member

Posts: 3348

8/19/2009 6:57 PM

The question was whether or not there are errors in the LDAP log, not the LADB log.

Brian K

Advanced Member

Posts: 20

8/19/2009 8:17 PM

I couldn't find anything in LDAP logs around the time that LASE died.

I am very inexperienced with TDS, but the log files that I have checked in the past are:
/home/idsldap/sqllib/db2dump
/usr/lsfprod/idsslapd-idsldap/logs

Are there any others that I should be interested in?

Bart Conger

Advanced Member

Posts: 18

8/19/2009 8:34 PM

Validate your sso/ldap resource data is good.

Run:
ssoconfig -c
Option 5 (manage)
Option 6 (export service and identity info)
Option 1 (export ALL)
Enter ALL (export ALL identities)
Save the file.

Search for null:
grep -i null <exported file> | lashow (or use > to redirect to a file)

See if there is an actual record with null for a value. If so, you should then be able to delete the record.
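
If grep output is hard to read, the same case-insensitive search can be sketched in Python with line numbers attached. This assumes only that the export is a text file; `find_null_lines` is a made-up helper name, and nothing here depends on the export's actual layout.

```python
def find_null_lines(lines):
    """Yield (line number, text) for each line containing 'null', any case."""
    for number, line in enumerate(lines, start=1):
        if "null" in line.lower():
            yield number, line.rstrip("\n")


def find_null_in_file(path):
    """Run the same search over a saved export file."""
    with open(path, errors="replace") as handle:
        return list(find_null_lines(handle))
```

The line numbers make it easier to open the export and look at the whole record around each hit before deciding whether it is a real null value.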

Mike Gauthier

New Member

Posts: 2

8/20/2009 2:28 PM

I can't offer Unix specific help, but this may be something to consider.

We run iSeries with TDS, and we recently had an issue where a pending constraint on an LDAP object caused LASE to fail. We had to find and change the status of the pending constraint.

One thing that led us to the error was the job log for IBM Directory Server (QDIRSRV). The job log indicated a SQL error occurring when we were trying to launch LASE. IBM documentation indicates that when a constraint is in "check pending" status, reads are not allowed.

Brian K

Advanced Member

Posts: 20

8/20/2009 5:20 PM

Hi Bart,
I did what you asked just to make sure, but there are no null entries (except in the SSOP entry for HTTPS) for users.

The reason I am getting NULL in the error file is that lase died. If you log into portal, then stoplase, and then in portal try to hit a form, you will get the same entries in your lase.log. At least that's how it is in our test and prod environments.

Thanks for the suggestion.

Brian

Jwiff

Basic Member

Posts: 12

8/20/2009 10:14 PM

Brian,
We had something very similar at the beginning of the month. Is your LDAP on a different server than your application? (Ours is.)
Bottom line: we had an intermittent problem with the DNS server between the two. We only found it because we noticed a pattern: it would occur at :46 and :52 past each hour. It turned out that some other app in the system had a job that did a mess of DNS lookups and slowed down our system enough at those times to cause the problem. We found it by running ping from the app server to the LDAP server: it would stop pinging, the SecCtx error would occur, and an RMId-type error would show up in either the ladb or latm log, depending on what folks were doing. The network folks said we were doing a 'reverse lookup' and the primary DNS server had problems (it turned out to be a bad fan). Lookups bounced to the secondary, but because it was a 'reverse lookup', they bounced between the two long enough to time out, causing the 'LDAP' error.
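
A rough sketch of how the forward and reverse lookups described above could be timed from the app server. This is an illustration in Python, not the procedure used in this post (which relied on ping). `timed_call`, `check_dns`, and the 1-second `slow_after` threshold are hypothetical names and values.

```python
import socket
import time


def timed_call(func, arg):
    """Call func(arg); return (result or None, elapsed seconds, error or None)."""
    start = time.monotonic()
    try:
        result, error = func(arg), None
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        result, error = None, str(exc)
    return result, time.monotonic() - start, error


def check_dns(hostname, slow_after=1.0,
              forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Time a forward lookup of hostname, then a reverse lookup of its address."""
    address, fwd_secs, fwd_err = timed_call(forward, hostname)
    report = {"forward_seconds": fwd_secs, "forward_error": fwd_err,
              "reverse_seconds": None, "reverse_error": None}
    if address is not None:
        _, rev_secs, rev_err = timed_call(reverse, address)
        report["reverse_seconds"] = rev_secs
        report["reverse_error"] = rev_err
    timings = (report["forward_seconds"], report["reverse_seconds"])
    slow = any(t is not None and t > slow_after for t in timings)
    report["ok"] = fwd_err is None and report["reverse_error"] is None and not slow
    return report
```

Calling `check_dns` with the LDAP server's hostname once a minute and logging the result with a timestamp should expose a recurring pattern like the :46 and :52 one. The `forward` and `reverse` parameters exist only so the resolvers can be swapped out when trying the function without a network.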

On a different note (not related to SecCtx errors), when we first installed LSF9006, lase would not stop until we stopped it twice. We are not sure if we found the solution, but (we use LID) when we deleted users, we were not removing them from LDAP (using the Lawson Security Administrator). We found that order counted too. Now: delusers (from LID); remove from LSA; remove user from UNIX (smit). There is a thread on this on the Lawson Community.

Good luck.

Jimmy Chiu

Veteran Member

Posts: 641

8/23/2009 2:32 AM

Brian,

Can you check your LDAP server log for network connection errors around the time lase went down? I am starting to think your LDAP server had a network outage or was dropped from the domain briefly. Something external may be causing this.

Brian K

Advanced Member

Posts: 20

9/8/2009 4:31 PM

All of the log files and network admins have confirmed that there were no network "hiccups" at that time. We have tried to simulate other potential issues, such as reverse DNS lookup problems, and we can't make it fail on our test servers.

We have continued to have this issue, but are hoping to upgrade to a new version of the environment to see if that resolves it.

Thanks for all of your input.