ESP 6 Performance (close_wait) hung threads

mark.cook

Veteran Member

Posts: 444

10/12/2010 11:55 AM

I have heard of a couple of Lawson clients having performance issues on ESP 6, with a close_wait issue and hung threads in WebSphere. Is this being seen by others?

EricS

Veteran Member

Posts: 80

10/12/2010 1:15 PM

We're actually on LSF 9.0.0.4 and have experienced the same issue. After a while, with GSC help, we traced it to Addins. It turned out that Addins was causing the JVM in WAS to corrupt. There is a parameter delivered by default in LSF 9.0.1.4 that needed to be added to my $LAWDIR/system/iosconfig.xml (see below). In addition, users need to update any MS Addins to the version posted in the middle of August, since there is a bug in Addins that will ignore the max parameter on a very large table. Note that you cannot rely on users checking the Addins version themselves; there's also an Addin bug that reports the incorrect version number. According to GSC, the two updates combined should allow Addins to page back records from a large query and not crash the JVM.

Parameter:

Not sure if that's your problem, but I would be interested to know if you end up tracing it to an Addin query.

Jeff White

Veteran Member

Posts: 83

10/12/2010 9:40 PM

The parameter didn't show up in the post. We're on 9014, and our Unix admin has asked me about some high memory java processes on occasion. It doesn't happen all of the time, but it would be nice if we could narrow this down to Addins. We're moving to 9016 at the end of October, and I hope we don't have an issue with this.

Jeff

John Cunningham

Advanced Member

Posts: 31

10/13/2010 3:38 PM

EricS,
I would be interested to hear how GSC was able to determine if it was addins, as well as see that parameter (it did not show in the post). We are on 9.0.1.5, WAS 7 and AIX 6.1 and are having weekly problems with our App Servers. They do not come back until a reboot.

Jimmy Chiu

Veteran Member

Posts: 641

10/13/2010 3:53 PM

Maybe it's this JT

JT-180105: looping stack trace in WebSphere 7. Apply the JT and add these two lines to your WebSphere app server JVM classpath:

WAS_HOME\Plugins\javax.j2ee.servlet.jar
WAS_HOME\Plugins\javax.j2ee.jsp.jar

Then redeploy your lawsec, bpm, ios, etc.

EricS

Veteran Member

Posts: 80

10/13/2010 4:31 PM

Parameter is:
parameter name="com.lawson.ios.dig.db.maxRecsQuery" value="10000"
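For anyone who wants to confirm whether the setting is already in their config, here is a minimal sketch in Python. It only searches the file text for the name/value pair exactly as quoted above; it assumes nothing about the rest of the iosconfig.xml layout, and the function name `find_max_recs_query` is made up for this example.

```python
import re

# Parameter name as quoted in the post above.
PARAM_NAME = "com.lawson.ios.dig.db.maxRecsQuery"


def find_max_recs_query(config_text: str):
    """Return the configured value as an int, or None if the parameter is absent."""
    pattern = re.compile(
        r'name\s*=\s*"' + re.escape(PARAM_NAME) + r'"\s+value\s*=\s*"(\d+)"'
    )
    match = pattern.search(config_text)
    return int(match.group(1)) if match else None


# The line from the post, used here as sample input.
sample = 'parameter name="com.lawson.ios.dig.db.maxRecsQuery" value="10000"'
print(find_max_recs_query(sample))
```

To run it against a real system, read the contents of $LAWDIR/system/iosconfig.xml into a string and pass that in; a result of None would suggest the parameter still needs to be added.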

EricS

Veteran Member

Posts: 80

10/13/2010 4:56 PM

As far as figuring out that Addins was the problem: what would happen is the application server would die. As in the original post, threads would end up in a close wait state, and when you tried to bring the app server down, the stopServer.sh command would hang. The only error I could find in any of the logs was in the dmgr log, which would have DCSV1115W. The app server logs would have nothing out of the ordinary. GSC said this behavior indicated that someone was running a query on a large table (or trying to join multiple large tables) with Addins. It's actually something I've been chasing for a while; I've caught users doing some really, really dumb things with Addins through Oracle monitoring, like trying to download all 17 million rows in MMDIST ordered by several fields. It all fit together.
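One way to watch for the close wait symptom described above is to count sockets in the CLOSE_WAIT state over time. The following is a rough Python sketch that tallies TCP states from `netstat -an` style text. It assumes the state name is the last column of each TCP row, and the sample rows are made up for illustration, not taken from a real server.

```python
from collections import Counter


def count_tcp_states(netstat_text: str) -> Counter:
    """Tally the state column of TCP lines in `netstat -an` style output."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows start with tcp, tcp4 or tcp6 and end with the state name.
        if len(fields) >= 2 and fields[0].lower().startswith("tcp"):
            states[fields[-1].upper()] += 1
    return states


def close_wait_count(netstat_text: str) -> int:
    """Number of TCP sockets currently reported as CLOSE_WAIT."""
    return count_tcp_states(netstat_text)["CLOSE_WAIT"]


# Made-up sample rows, not output from a real server.
SAMPLE = """\
tcp4       0      0  10.1.1.5.9080     10.1.1.9.51234    CLOSE_WAIT
tcp4       0      0  10.1.1.5.9080     10.1.1.9.51235    CLOSE_WAIT
tcp4       0      0  10.1.1.5.9080     10.1.1.9.51236    ESTABLISHED
tcp4       0      0  *.22              *.*               LISTEN
udp4       0      0  *.123             *.*
"""

print(close_wait_count(SAMPLE))  # prints 2
```

Feeding it the captured output of `netstat -an` at regular intervals would show whether the CLOSE_WAIT count keeps climbing before an app server stops responding.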

mark.cook

Veteran Member

Posts: 444

10/14/2010 11:19 AM

Jimmy, you are on track with your assessment. You may have also seen the critical announcement from Lawson yesterday that impacts this issue.

Two KB articles were released that speak to changes in the lawsec.jar file. These are related to ESP 5 or 6 and WAS 7. The KB articles also reference JT180105. It was recommended that we install that JT and follow the steps in the KB articles, which included adding classpaths. We have just completed that piece and will be running some testing today.

Another issue they are working on with IBM is performance problems with ESP 5 or 6 on AIX 6.1 and WAS 7. This combination is causing issues with no root cause determined yet, but it is a hot issue with Lawson.

MattD

Veteran Member

Posts: 94

10/14/2010 1:37 PM

For those of you experiencing the problem I would be interested to know if you recycle your Lawson application including WAS daily. We are on 9.0.1.5, WAS 7.0, and AIX 6.1 and have not experienced the problem but we do recycle all at midnight each day.

I am going to more actively monitor the threads and see if I notice a problem but so far I haven't seen an issue.

Cheers.
Matt

John Cunningham

Advanced Member

Posts: 31

10/14/2010 2:02 PM

We do not recycle Lawson daily; we do it weekly, but the issue has occurred within 24 hours of a recycle. Lawson told me that all clients affected by this have MSCM as well. Our support guy is saying that it is an AIX 6.1 problem and that IBM could have a fix out for testing today.

MattD

Veteran Member

Posts: 94

10/14/2010 2:21 PM

Also, how many JVMs do you have? We have three.

John Cunningham

Advanced Member

Posts: 31

10/14/2010 2:31 PM

We started with 2 but now have 4, so that as JVMs hung during the week we could fail over to the ones that were not hung without having to reboot. This has kept us from rebooting multiple times during the week. We are able to recover applications (like MSCM) by just restarting the app.

Jimmy Chiu

Veteran Member

Posts: 641

10/14/2010 3:41 PM

Lawson has issued a critical notification for 9.0.1.5/9.0.1.6 running on AIX 6.1 with WAS 7.0. Here is the cut and paste from the notification:

CN-LSFAIX-10142010
Release Date: October 14, 2010
Status: Required
Title: Lawson System Foundation – 9.0.1.5 and 9.0.1.6 running on AIX 6.1 and WAS 7.0
Description:
Lawson and IBM have been made aware of a system stability issue with the release configuration noted above. This stability issue manifests as large amounts of WebSphere threads begin to hang. Eventually, the application server node (a JVM) will hang which will have an end result of users being unable to log onto the system when trying to connect to the affected application server node. If a stop is issued to the application server node, it will not stop completely. This scenario may require a reboot of the AIX server.
Lawson and IBM have dedicated resources working to find a resolution to this issue. Lawson will provide an update to this critical notification once a workaround or resolution is identified. At this time we have not identified a timeline for the next update.
Impact
If you are running the configuration above and are experiencing large numbers of hung threads (100+) and your application server node(s) are locking up, please report your issue to Lawson Global Support. This applies to only those running on AIX 6.1 and WAS 7.0 on LSF 9.0.1.5 or 9.0.1.6.

I guess you will have to wait for a fix if this applies to you.
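To get a rough idea of whether an app server is near the "100+ hung threads" mark in the notification, the server log can be scanned for WebSphere's hung-thread warnings. The sketch below is in Python and rests on some assumptions: that the hung-thread detector logs WSVR0605W when a thread may be hung and WSVR0606W when such a thread later completes, and that the thread ID appears in parentheses after the quoted thread name. The sample lines are made up, and the function name is hypothetical.

```python
import re

# Assumed message shapes; check them against your own SystemOut.log.
HUNG = re.compile(r'WSVR0605W: Thread "([^"]+)" \(([0-9a-fA-F]+)\)')
CLEARED = re.compile(r'WSVR0606W: Thread "([^"]+)" \(([0-9a-fA-F]+)\)')


def currently_hung_threads(log_lines):
    """Return the IDs of threads reported hung and not later reported complete."""
    hung = set()
    for line in log_lines:
        match = HUNG.search(line)
        if match:
            hung.add(match.group(2))
            continue
        match = CLEARED.search(line)
        if match:
            hung.discard(match.group(2))
    return hung


# Made-up sample lines for illustration only.
sample_log = [
    'ThreadMonitor W   WSVR0605W: Thread "WebContainer : 1" (00000031) has been active for 612000 milliseconds and may be hung.',
    'ThreadMonitor W   WSVR0605W: Thread "WebContainer : 2" (00000032) has been active for 640000 milliseconds and may be hung.',
    'ThreadMonitor W   WSVR0606W: Thread "WebContainer : 1" (00000031) was previously reported to be hung but has completed.',
]

print(len(currently_hung_threads(sample_log)))  # prints 1
```

Passing in the lines of an app server's SystemOut.log gives a count to compare against the threshold before opening a case with Lawson Global Support.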

mark.cook

Veteran Member

Posts: 444

10/14/2010 4:46 PM

Thanks Chip. It is good to hear IBM may have a fix out today. I will stay in touch with you as this progresses.