ESP 6 Performance (close_wait) hung threads

mark.cook
Veteran Member
Posts: 444

    I have heard of a couple of Lawson clients having performance issues on ESP 6, with threads hung in a close_wait state in WebSphere. Is anyone else seeing this?

    EricS
    Veteran Member
    Posts: 80
      We're actually on LSF 9.0.0.4 and have experienced the same issue. After a while, with GSC's help, we traced it to Addins: Addins was causing the JVM in WAS to become corrupted. There is a parameter, delivered by default in LSF 9.0.1.4, that needed to be added to my $LAWDIR/system/iosconfig.xml (see below). In addition, users need to update any MS Addins install to the version posted in the middle of August, because Addins has a bug that ignores the max parameter on a very large table. Note that you cannot rely on users checking the Addins version themselves; there is also an Addins bug that reports the wrong version number. According to GSC, the two updates combined should allow Addins to page back records from a large query without crashing the JVM.

      Parameter:



      Not sure if that's your problem, but I would be interested to know if you end up tracing it to an Addins query.
      Jeff White
      Veteran Member
      Posts: 83
        The parameter didn't show up in the post. We're on 9014, and our Unix admin has asked me about some high memory java processes on occasion. It doesn't happen all of the time, but it would be nice if we could narrow this down to Addins. We're moving to 9016 at the end of October, and I hope we don't have an issue with this.

        Jeff
        John Cunningham
        Advanced Member
        Posts: 31
          EricS,
          I would be interested to hear how GSC was able to determine if it was addins, as well as see that parameter (it did not show in the post).  We are on 9.0.1.5, WAS 7 and AIX 6.1 and are having weekly problems with our App Servers.  They do not come back until a reboot.
          Jimmy Chiu
          Veteran Member
          Posts: 641
            Maybe it's this JT:

            JT-180105, looping stack trace in WebSphere 7. Apply the JT and add these two lines to your WebSphere app server JVM classpath:

            WAS_HOME\Plugins\javax.j2ee.servlet.jar
            WAS_HOME\Plugins\javax.j2ee.jsp.jar

            Then redeploy your lawsec, bpm, ios, etc.
            EricS
            Veteran Member
            Posts: 80
              Parameter is:
              parameter name="com.lawson.ios.dig.db.maxRecsQuery" value="10000"
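
For anyone who wants to confirm whether the setting is present, here is a minimal Python sketch. It assumes the entry in iosconfig.xml is an XML element carrying `name` and `value` attributes, which is only a guess based on the line above. The `<config>` wrapper and `<parameter>` tag in the sample are hypothetical, and `find_max_recs` is an illustrative helper, not anything Lawson ships.

```python
import xml.etree.ElementTree as ET

# Parameter name as quoted in the post above.
PARAM_NAME = "com.lawson.ios.dig.db.maxRecsQuery"


def find_max_recs(xml_text):
    """Return the value attribute of the first element whose name
    attribute equals PARAM_NAME, or None if no such element exists."""
    root = ET.fromstring(xml_text)
    # iter() walks the root and every descendant, so the element's
    # position in the file does not matter for this check.
    for elem in root.iter():
        if elem.get("name") == PARAM_NAME:
            return elem.get("value")
    return None


# Hypothetical sample; the real iosconfig.xml layout may differ.
SAMPLE = (
    '<config>'
    '<parameter name="com.lawson.ios.dig.db.maxRecsQuery" value="10000"/>'
    '</config>'
)

print(find_max_recs(SAMPLE))
```

To check a real system, read the contents of $LAWDIR/system/iosconfig.xml and pass them to `find_max_recs`; a result of `None` would suggest the parameter has not been added.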
              EricS
              Veteran Member
              Posts: 80
                As far as figuring out that Addins was the problem: what would happen is that the application server would die. As in the original post, threads would end up in a close_wait state, and when you tried to bring the app server down, the stopServer.sh command would hang. The only error I could find in any of the logs was DCSV1115W in the dmgr log; the app server logs showed nothing out of the ordinary. GSC said this behavior indicated that someone was running a query on a large table (or trying to join multiple large tables) with Addins. It's actually something I've been chasing for a while. Through Oracle monitoring I've caught users doing some really, really dumb things with Addins, like trying to download all 17 million rows in MMDIST ordered by several fields. It all fit together.
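
One way to watch for the close_wait symptom described above is to tally TCP connection states from `netstat -an` output. The following is a rough sketch, not a GSC-supplied tool. It assumes the common six-column netstat layout with the state in the last column, and the addresses in the sample are made up.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Tally TCP connection states from the text of `netstat -an`.

    Assumes each TCP line has at least six whitespace-separated fields
    and that the connection state is the last field.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].lower().startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


# Hypothetical sample output; addresses and ports are invented.
SAMPLE = """\
Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
tcp4       0      0  10.0.0.5.9080      10.0.0.21.51544    CLOSE_WAIT
tcp4       0      0  10.0.0.5.9080      10.0.0.22.51760    CLOSE_WAIT
tcp4       0      0  10.0.0.5.22        10.0.0.30.40112    ESTABLISHED
tcp4       0      0  *.9080             *.*                LISTEN
"""

states = count_tcp_states(SAMPLE)
print("CLOSE_WAIT connections:", states["CLOSE_WAIT"])
```

Run on a schedule against real `netstat -an` output, a CLOSE_WAIT count that keeps climbing on the app server ports would be worth comparing against the times users run large Addins queries.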
                mark.cook
                Veteran Member
                Posts: 444
                  Jimmy, you are on track with your assessment. You may have also seen the critical announcement from Lawson yesterday that impacts this issue.

                  Two KB articles were released that speak to changes in the lawsec.jar file. These are related to ESP 5 or 6 and WAS 7. The KB articles also reference JT180105. It was recommended that we install that JT and follow the steps in the KB articles, which included adding classpaths. We have just completed that piece and will be running some testing today.

                  Lawson is also working with IBM on another issue: performance problems with ESP 5 or 6 on AIX 6.1 with WAS 7. No root cause has been determined yet for this combination, but it is a hot issue with Lawson.
                  MattD
                  Veteran Member
                  Posts: 94
                    For those of you experiencing the problem I would be interested to know if you recycle your Lawson application including WAS daily. We are on 9.0.1.5, WAS 7.0, and AIX 6.1 and have not experienced the problem but we do recycle all at midnight each day.

                    I am going to more actively monitor the threads and see if I notice a problem but so far I haven't seen an issue.

                    Cheers.
                    Matt
                    John Cunningham
                    Advanced Member
                    Posts: 31
                      We do not recycle Lawson daily; we do it weekly, but the issue has occurred within 24 hours of a recycle. Lawson told me that all clients affected by this have MSCM as well. Our support guy says it is an AIX 6.1 problem and that IBM could have a fix out for testing today.
                      MattD
                      Veteran Member
                      Posts: 94
                        Also, how many JVMs do you have? We have three.
                        John Cunningham
                        Advanced Member
                        Posts: 31
                          We started with 2 but now have 4, so that when JVMs hang during the week we can fail over to the ones that are not hung without having to reboot. This has kept us from rebooting multiple times during the week. We are able to recover applications (like MSCM) by just restarting the app.
                          Jimmy Chiu
                          Veteran Member
                          Posts: 641
                            Lawson has issued a critical notification for 9.0.1.5/9.0.1.6 running on AIX 6.1 with WAS 7.0. Here is the cut and paste from the notification:

                            CN-LSFAIX-10142010
                            Release Date: October 14, 2010
                            Status: Required
                            Title: Lawson System Foundation – 9.0.1.5 and 9.0.1.6 running on AIX 6.1 and WAS 7.0
                            Description:
                            Lawson and IBM have been made aware of a system stability issue with the release configuration noted above. This stability issue manifests as large amounts of WebSphere threads begin to hang. Eventually, the application server node (a JVM) will hang which will have an end result of users being unable to log onto the system when trying to connect to the affected application server node. If a stop is issued to the application server node, it will not stop completely. This scenario may require a reboot of the AIX server.
                            Lawson and IBM have dedicated resources working to find a resolution to this issue. Lawson will provide an update to this critical notification once a workaround or resolution is identified. At this time we have not identified a timeline for the next update.
                            Impact
                            If you are running the configuration above and are experiencing large numbers of hung threads (100+) and your application server node(s) are locking up, please report your issue to Lawson Global Support. This applies to only those running on AIX 6.1 and WAS 7.0 on LSF 9.0.1.5 or 9.0.1.6.

                            I guess you will have to wait for a fix if this applies to you.
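
To judge whether a server is in the "100+ hung threads" range the notification mentions, one option is to scan the app server's SystemOut.log for WebSphere's hung-thread warning, WSVR0605W. The sketch below is only an illustration. The message wording in the regex and in the sample lines is written from memory of typical WAS logs and may differ by version, and the helper names are hypothetical.

```python
import re

# WSVR0605W is WebSphere's "thread may be hung" warning. The trailing
# "There is/are N thread(s) in total" wording is an assumption.
HUNG_RE = re.compile(r"WSVR0605W:.*There is/are (\d+) thread\(s\) in total")


def latest_hung_total(log_text):
    """Return the hung-thread total from the last WSVR0605W line,
    or None if the log contains no such line."""
    total = None
    for line in log_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            total = int(match.group(1))
    return total


def exceeds_threshold(log_text, threshold=100):
    """True if the most recent reported total is at or above threshold."""
    total = latest_hung_total(log_text)
    return total is not None and total >= threshold


# Hypothetical log excerpt for illustration only.
SAMPLE = '''\
[10/14/10 9:05:02:123 CDT] 0000001c ThreadMonitor W   WSVR0605W: Thread "WebContainer : 3" (0000002a) has been active for 612,000 milliseconds and may be hung.  There is/are 1 thread(s) in total in the server that may be hung.
[10/14/10 9:45:10:456 CDT] 0000001c ThreadMonitor W   WSVR0605W: Thread "WebContainer : 57" (0000009f) has been active for 605,000 milliseconds and may be hung.  There is/are 101 thread(s) in total in the server that may be hung.
'''

print(latest_hung_total(SAMPLE), exceeds_threshold(SAMPLE))
```

The default threshold of 100 comes straight from the notification text; adjust it if Lawson Global Support gives different guidance.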
                            mark.cook
                            Veteran Member
                            Posts: 444
                              Thanks, Chip. It is good to hear IBM may have a fix out today. I will stay in touch with you as this progresses.