GLTRANS - (Incremental update to DW) Identifying newly inserted or updated records

debika_sharma
Basic Member
Posts: 13
    I am a data warehouse/BI architect developing a data warehouse that must extract records from GLTRANS once nightly. I don't want to do a full refresh of the GLTRANS table (that would mean pulling over several hundred million rows). Instead I would like to find a way to identify the new records and the records that have been updated since my last extract.

    So for example, let's say I have in my data warehouse all records from GLTRANS through 2/22/07. When my ETL job runs the next day, I want it to extract only those records that are new or have changed since 2/22/07.

    Any suggestions would be extremely helpful.
    surajbang
    Basic Member
    Posts: 4

      You can use the "Update_Date" field, which holds the date when the record was last inserted or updated in GLTRANS.

      Note: Please check whether all the programs in your system update the "Update_Date" field in GLTRANS when a record is modified.

      Whenever the ETL job runs, it needs to store the current date (Last_Run_Date) in a table.
      The next time the ETL job runs, it should pick up the most recent stored date, i.e. the highest Last_Run_Date, from that table.

      A filter can then be added to the ETL:
      Select * from GLTRANS where Update_Date > Last_Run_Date
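
The stored-run-date approach described above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not how any particular ETL tool does it. The table `etl_run_log`, the column `row_id`, and the function names are hypothetical. Dates are assumed to be stored as ISO `YYYY-MM-DD` strings so that string comparison matches date order.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for GLTRANS plus a run-log table (both hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gltrans (row_id INTEGER PRIMARY KEY, update_date TEXT);
        CREATE TABLE etl_run_log (last_run_date TEXT);
        INSERT INTO gltrans VALUES (1, '2007-02-20'), (2, '2007-02-22'), (3, '2007-02-23');
        INSERT INTO etl_run_log VALUES ('2007-02-21'), ('2007-02-22');
    """)
    return conn


def extract_changed_rows(conn, run_date):
    """Return rows changed since the highest stored run date, then record run_date."""
    cur = conn.cursor()
    # Highest stored run date; on a first run fall back to a date before all data.
    cur.execute("SELECT COALESCE(MAX(last_run_date), '0001-01-01') FROM etl_run_log")
    (last_run_date,) = cur.fetchone()
    # Same filter as in the post: Update_Date > Last_Run_Date.
    cur.execute(
        "SELECT row_id, update_date FROM gltrans WHERE update_date > ? ORDER BY row_id",
        (last_run_date,),
    )
    rows = cur.fetchall()
    # Store this run's date so the next run starts from here.
    cur.execute("INSERT INTO etl_run_log (last_run_date) VALUES (?)", (run_date,))
    conn.commit()
    return rows
```

One thing to watch: if Update_Date has no time component, a strict `>` would skip rows changed later on the same day the job ran. Using `>=` together with an upsert on the warehouse side is one way to guard against that.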

      Hope this helps...

      John Henley
      Senior Member
      Posts: 3348
        Debika,
        What product are you using to do the ETL? I've had some success with Informatica doing incremental updates for some AR tables based on existence of primary keys for INSERTs and field changes for UPDATEs. That was necessary for the AR tables because they--like most of the Lawson tables--don't have update dates/times (or don't update them consistently).

        I looked at GL190 and some of the GL programs, and it looks like update_date--as Suraj suggested--should work OK for GLTRANS.
        Thanks for using the LawsonGuru.com forums!
        John
        debika_sharma
        Basic Member
        Posts: 13
          Thanks for the feedback so far. However, I am still in need of a solution because of a conflicting scenario.

          I thought I could use the update_date field, but I have a scenario that invalidates the logic. In our system, a record can be inserted into GLTRANS on 2/28/07 and not yet posted (so not in r-status 8 or 9). In this case the posted_date column is blank but the update_date column stores 2/28/07. Then, let's say this record actually posts on 8/31/07, at which point the posted_date column would say 8/31/07 and the update_date column would remain unchanged and still say 2/28/07. In this scenario, I would not be able to identify this record as changed.

          Any feedback would be appreciated.

          (To answer the other question - we are using Informatica for our ETL.)
          John Henley
          Senior Member
          Posts: 3348
            Debika,
            1. In your scenario, GL190 will change UPDATE_DATE from 2/28/07 to whatever the system date is when GL190 runs and the status changes to 9.
            2. Have you looked at using Informatica's incremental updates? It does field-by-field comparisons and it's pretty fast, although with hundreds of millions of rows that may not be the case. I'm judging based on millions, not hundreds of millions. However, you could do some things to speed that up, like only looking at the current and future fiscal years, etc.
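
The idea in point 2 (primary-key existence for INSERTs, field-by-field comparison for UPDATEs) can be sketched generically as below. This is not Informatica's implementation, only an illustration of the logic. The function name and the field names (`company`, `row_id`, `status`, `posting_date`) are hypothetical.

```python
def classify_rows(source_rows, warehouse_rows, key_fields, compare_fields):
    """Split source rows into inserts (key not in warehouse) and updates (a field differs).

    source_rows:    list of dicts read from the source table
    warehouse_rows: dict mapping key tuple -> row dict already in the warehouse
    """
    inserts, updates = [], []
    for row in source_rows:
        key = tuple(row[f] for f in key_fields)
        existing = warehouse_rows.get(key)
        if existing is None:
            # Key not yet in the warehouse: treat as a new record.
            inserts.append(row)
        elif any(row[f] != existing[f] for f in compare_fields):
            # Key exists but at least one compared field changed.
            updates.append(row)
        # Otherwise the row is unchanged and is skipped.
    return inserts, updates


source = [
    {"company": 1, "row_id": 10, "status": 9, "posting_date": "2007-08-31"},
    {"company": 1, "row_id": 11, "status": 1, "posting_date": ""},
    {"company": 1, "row_id": 12, "status": 9, "posting_date": "2007-02-22"},
]
warehouse = {
    (1, 10): {"company": 1, "row_id": 10, "status": 1, "posting_date": ""},
    (1, 12): {"company": 1, "row_id": 12, "status": 9, "posting_date": "2007-02-22"},
}
inserts, updates = classify_rows(
    source, warehouse, ["company", "row_id"], ["status", "posting_date"]
)
```

Here row 11 would be classified as an insert and row 10 as an update. Restricting `source_rows` to the current and later fiscal years keeps the comparison set smaller.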
            Thanks for using the LawsonGuru.com forums!
            John
            surajbang
            Basic Member
            Posts: 4
              I agree with what John suggested. GL190 should take care of your problem.
              If you are still facing that problem, you can modify your ETL transformation to check whether either the Update_Date field or the Posting_Date field is greater than your Last_Run_Date. The logical "OR" will return the row if it was posted, modified, or inserted since the last run. I came across a similar situation and this worked, but we were using OWB (Oracle Warehouse Builder) as the ETL tool.
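
A minimal sketch of that OR filter, again using Python's built-in sqlite3 module with a hypothetical `row_id` column and ISO-formatted date strings. The column name `posting_date` follows the reply above; check it against the actual GLTRANS schema.

```python
import sqlite3

# Return a row if either date is later than the last run date.
CHANGED_SINCE_SQL = """
    SELECT row_id FROM gltrans
    WHERE update_date > :last_run OR posting_date > :last_run
    ORDER BY row_id
"""


def changed_since(conn, last_run_date):
    """List the ids of rows inserted, updated, or posted after last_run_date."""
    return [r[0] for r in conn.execute(CHANGED_SINCE_SQL, {"last_run": last_run_date})]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gltrans (row_id INTEGER PRIMARY KEY, update_date TEXT, posting_date TEXT);
    -- Row 1: entered 2/28/07, still unposted.
    INSERT INTO gltrans VALUES (1, '2007-02-28', NULL);
    -- Row 2: entered 2/28/07, posted 8/31/07, update_date left unchanged.
    INSERT INTO gltrans VALUES (2, '2007-02-28', '2007-08-31');
    -- Row 3: entered 8/31/07, not yet posted (blank posting date).
    INSERT INTO gltrans VALUES (3, '2007-08-31', '');
""")
changed = changed_since(conn, "2007-08-30")
```

With this sample data, rows 2 and 3 are picked up for a last run date of 2007-08-30. Row 2 is the case from the earlier post, where only the posting date moved.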