High volume transactions causing LSF9 problems

Dr House

New Member

Posts: 3

9/14/2009 7:19 PM

We are currently running LSF9, MSP5 with patches, on AIX in a four-server environment (web/LDAP/Oracle/app). We can reproduce a problem in which the Lawson environment stops processing transactions properly. After pushing through a lot of transactions (potentially 50k+) in a short time frame (24 hours), our system begins to behave in an extraordinary fashion, but not in a good way.

Basically, we have a script that feeds lajs batch jobs (jqsubmit -w), but the batch job submission hangs, and it only returns to the command line once a command like `jqstatus -a` (or any flag you wish, including -h for help!) is run. This is an indefinite hang: until another command is run against the job scheduler, the jqsubmit stays hung, which can mean 5 minutes or as long as 2-3 days. Note that we are using the -w parameter when submitting the job.

Has anyone ever heard of this before?

Dr House

New Member

Posts: 3

9/14/2009 7:22 PM

Let me correct my previous statement: we are using wtsubmit (jqsubmit -w doesn't exist). We also use `jobload -c filename`; both show the same behavior.

Jimmy Chiu

Veteran Member

Posts: 641

9/15/2009 3:26 PM

I would increase the heap size under laconfig to see if that helps.

You may also have reached the limit of your server hardware. What are the specs on your app server?

Dr House

New Member

Posts: 3

9/15/2009 3:43 PM

I will try that out. We've tried many different things to avoid this problem (adding memory, Lawson patches for both app and env, Oracle patches, WebSphere patches, Tivoli patches), but none has reduced the frequency of the problem or avoided it.

Our partitions run on a p595: the app server has 16 processors and 65 GB of RAM, and the DB server has 8 processors and 32 GB of RAM. We currently run WebSphere but don't use the portal; our users still come in through LID.

Thanks for the help, I will try upping the heap size and try to break the server again. I'll report back what I find (positive or negative).

Jimmy Chiu

Veteran Member

Posts: 641

9/15/2009 3:52 PM

You also mentioned that you have a script to feed lajs batch jobs. Maybe slow down the script and put some delay between each submit?

Also, I would assume that with the number of transactions you are processing, your app server and DB server are on multiple fiber connections? Reducing network latency may help the situation as well.
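
As a rough illustration of the "slow down the script" idea, here is a minimal sketch of a throttled submit loop. It assumes the jobs to submit are listed one per line in a text file, and it takes the actual submit command as arguments, since the exact wtsubmit or jobload invocation depends on your site. The function name `submit_throttled` and the list-file layout are made up for this example.

```shell
#!/bin/sh
# Throttled submission sketch (hypothetical helper, not a Lawson utility).
# Usage: submit_throttled DELAY_SECONDS LISTFILE CMD [ARGS...]
# Runs "CMD ARGS <line>" for each non-empty line of LISTFILE and sleeps
# DELAY_SECONDS after every submit so the scheduler is not flooded.
submit_throttled() {
    delay=$1
    listfile=$2
    shift 2
    while IFS= read -r job || [ -n "$job" ]; do
        # Skip blank lines in the list file.
        [ -z "$job" ] && continue
        # stdin is redirected so the command cannot swallow the job list.
        "$@" "$job" < /dev/null || echo "submit failed: $job" >&2
        sleep "$delay"
    done < "$listfile"
}
```

You would call it with whatever submit command you already use, for example `submit_throttled 5 jobs.txt <your submit command and its fixed arguments>`, and tune the delay until the hangs stop or it becomes clear that pacing makes no difference.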

Norm

Veteran Member

Posts: 40

9/15/2009 4:41 PM

Sounds like your real problem is with LAJS, since it appears you're using wtsubmit to submit batch jobs. We do something similar, but on a much smaller scale. Ours probably only submits a couple of thousand jobs in a 24-hour period.

We're not experiencing that particular problem, but we do have a problem where some of our jobs simply abort after zero or one second of elapsed time. It's almost as if lajs loses track of the process and then kills it. Very weird.

I'm not a big fan of LAJS; I don't think it was built to handle volumes of 'automatically' submitted jobs. But that doesn't really help you, does it? You say that running something like a jqstatus command frees up the block. Can you use cron (or a similar process) to submit a jqstatus command every minute or so to see if that helps LAJS stay unfrozen?
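
A minimal sketch of that cron workaround might look like the following. The wrapper name `poke_scheduler`, the script path in the crontab comment, and the idea of loading the Lawson environment first are assumptions; cron runs with a very small environment, so whatever profile your Lawson users normally load would probably have to be sourced before jqstatus can run.

```shell
#!/bin/sh
# Keep-alive sketch (hypothetical helper): run a harmless scheduler query so
# that a stuck submit gets a chance to return. Output is discarded; only
# failures are reported, with a timestamp, on stderr.
poke_scheduler() {
    # "$@" is the query command, e.g. jqstatus -a
    if "$@" > /dev/null 2>&1; then
        return 0
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') scheduler query failed: $*" >&2
    return 1
}

# In the cron-invoked script, after loading the Lawson environment:
#   poke_scheduler jqstatus -a
#
# Example crontab entry, once per minute (script path is hypothetical):
#   * * * * * /usr/local/bin/lajs_poke.sh
```

This only masks the hang rather than fixing it, but if the submits stop stalling with it in place, that at least confirms the scheduler is waiting on some kind of wake-up.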

John Henley

Senior Member

Posts: 3348

10/3/2009 10:07 PM

Since this is all running in batch/LID, etc., WebSphere shouldn't really play a part. It really sounds like you are experiencing a resource/memory leak.

Have you looked at basic environment settings, such as ladb.cfg?
It could be something really simple (trust me, I've seen crazier stuff), like someone set INSERTBUFSIZE really high in the ORACLE file and forgot to set it back down.