CSV file with embedded commas

 5 Replies
 0 Subscribed to this topic
 52 Subscribed to this forum
Sort:
Author
Messages
Terry P
Veteran Member Send Private Message
Posts: 234
Veteran Member
I have a CSV file being delivered by an outside agency. It is a CSV file in that commas separate the fields, but they are also putting the ALPHA fields wrapped in double quotes. A typical record might look like this:

12345,"Terry","Pickering","this, that and others"

I'm using standard data iterator activity to parse out each field. I also can remove the "quote" characters. The problem is when I encounter a record where they have embedded the comma in a field surrounded by quotes. The parser picks up the comma in the middle of the field.

Any suggestions on how to get around this?
Bob Canham
Veteran Member Send Private Message
Posts: 217
Veteran Member
I haven't found a solution to that (and would look forward to hearing from anyone who has). What we were able to do was get the outside agency to send it as pipe | delimited file rather than comma delimited. That took care of the problem for us.
John Henley
Send Private Message
Posts: 3351
The best solution I've used is regex (regular expression) in JavaScript. It's very powerful for parsing any type of string data. Not easiest thing to learn/grasp but it works well for what you're trying to do (trust me, I've been down this road
Thanks for using the LawsonGuru.com forums!
John
jamesraceson
Veteran Member Send Private Message
Posts: 52
Veteran Member
Terry,

If you are familar with regular expressions and Javascript, you could write one to look for the pattern [space comma] [comma space] and [space comma space]. You could put the code inside of an assign node after each pass for the record. Because most CSV files do not have a leading or trailing space for each field, if you found a match to your expression this would most likely be a comma inserted in somewhere in the data (in between the quotes). You would then replace the matches with just one space. I would run a check to make sure before committing any changes, but that should help get rid of the unwanted commas.
Terry P
Veteran Member Send Private Message
Posts: 234
Veteran Member
Good suggestions. I like the last one. I could do a replace on the entire record and do a replace ", " with " ". This is then before I parse the individual fields. I'll post the solution if it works.
Terry P
Veteran Member Send Private Message
Posts: 234
Veteran Member
That worked!

I added an Assign between the two data iterator activities to a new variable with this code:

CSVFile_outputData.replace(/, /g," ")

Then I replaced the input for the second one to use this new variable.