0

I'm at a total loss as to how to do this.

My Question: I want to take this:

"A, two words with comma","B","C word without comma","D"
"E, two words with comma","F","G more stuff","H no commas here!"
... (continue)

To this:

"A, two words with comma",B,C word without comma,D
"E, two words with comma",F,G more stuff,H no commas here!
... (continue)

I used software that created 1,900 records in a text file. I think it was supposed to be a CSV, but whoever wrote the software doesn't seem to know how CSV files work, because a cell only needs quotes if it contains a comma (right?). At least I know that Excel puts everything in the first cell...

I would prefer this to be solvable using some sort of command line tool like perl or python (I'm on a Mac). I don't want to make a whole project in Java or anything to take care of this.

Any help is greatly appreciated!

4 Answers

3

Shot in the dark here, but I think that Excel is putting everything in the first column because it doesn't know it's being given comma-separated data.

Excel has a "text-to-columns" feature, where you'll be able to split a column by a delimiter (make sure you choose the comma).

There's more info here:

http://support.microsoft.com/kb/214261

edit

You might also try renaming the file from *.txt to *.csv. That will change the way Excel reads the file, so it better understands how to parse whatever it finds inside.

answered 2013-02-13T03:04:39.830
0

If a plain shell one-liner is an option, you can try this in a terminal:

sed 's/"\([^,"]*\)"/\1/g' file.csv > new-file.csv
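
The same substitution can be sketched in Python, which the question also mentions as an option. This is only an illustration: the helper name `strip_unneeded_quotes` is made up here, and the pattern is slightly stricter than the sed one in that it also refuses to match across a quote character. It assumes fields never contain escaped quotes (`""`) or embedded newlines; a regex like this will mangle such data.

```python
import re

# Matches a quoted field whose contents include neither a comma nor a quote,
# and captures the contents so the surrounding quotes can be dropped.
UNNEEDED_QUOTES = re.compile(r'"([^,"]*)"')

def strip_unneeded_quotes(line):
    """Remove the quotes around fields that contain no comma (hypothetical helper)."""
    return UNNEEDED_QUOTES.sub(r'\1', line)

sample = '"A, two words with comma","B","C word without comma","D"'
print(strip_unneeded_quotes(sample))
```

Fields that contain a comma keep their quotes because the character class cannot cross the comma, so the pattern never matches them.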
answered 2013-02-13T03:04:46.470
0

That technically should be fine. The text is delimited with " and separated by , so I don't see anything wrong with the first format at all: any field may be quoted, only some require it. More than likely the writer of the code didn't want to overcomplicate the logic and simply quoted everything.
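
A quick way to check this claim is to parse both forms with Python's standard `csv` module and compare the results. This is a minimal sketch using the first sample line from the question; the variable names are only illustrative.

```python
import csv

# The same record, once with every field quoted and once with minimal quoting.
fully_quoted = ['"A, two words with comma","B","C word without comma","D"']
minimally_quoted = ['"A, two words with comma",B,C word without comma,D']

rows_full = list(csv.reader(fully_quoted))
rows_minimal = list(csv.reader(minimally_quoted))

# Both inputs parse to the same list of fields.
print(rows_full == rows_minimal)
print(rows_full[0])
```

If the two parses are equal, a CSV-aware reader treats the files as equivalent, and the extra quotes are cosmetic.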

answered 2013-02-13T03:04:58.777
0

One way to clean it up is to feed the data to csv and dump it back.

import csv
import io

bad_data = """\
"A, two words with comma","B","C word without comma","D"
"E, two words with comma","F","G more stuff","H no commas here!"
"""

# Parse the over-quoted lines, then write them back with the default quoting.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(csv.reader(bad_data.splitlines()))

print(buffer.getvalue())

Python's csv.writer defaults to the "excel" dialect, which uses minimal quoting, so it will not write the quotes when they are not necessary.
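
For the actual file of 1,900 records, the same round trip can be done file to file. The sketch below assumes Python 3; the function name `requote_csv` and the file names in the usage comment are placeholders, not anything from the original software.

```python
import csv

def requote_csv(src_path, dst_path):
    """Rewrite a CSV file so only fields that need quotes keep them (hypothetical helper)."""
    # newline='' lets the csv module handle line endings itself.
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)  # "excel" dialect: minimal quoting by default
        writer.writerows(csv.reader(src))

# Example call with placeholder file names:
# requote_csv('file.csv', 'new-file.csv')
```

Note that the "excel" dialect ends each row with `\r\n`; passing `lineterminator='\n'` to `csv.writer` is one way to get Unix line endings instead.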

answered 2013-02-13T03:10:54.250