Feed on
Posts
Comments

argh, i wasted about an hour the other day trying to parse an excel file in perl. i thought i could just save the file as a “tab-delimited” file and load the .txt file in perl. things didn’t work out however, as perl (and the unix command line) thought the .txt file was just one line long.

staring at the file using the “cat” or “more” command, i saw a bunch of ^M (carat M’s) wherever a newline should’ve been. it took forever for me to figure out those ^M’s were actually carriage returns.

once i realized what was causing the problem, the solution came pretty quickly. just replace each carriage return with a newline character. here’s a little script to do just that:

#!/bin/sh
# script to remove carriage returns from files (useful for processing
# excel tab-delimited files). use:
# removeCarriageReturn.sh FILENAME

tr “\r” “\n” $1.ncr

(the file with the newlines now has .ncr appended to the end of the filename.) oh, and if you’re trying to do this replacement with sed on a mac, forget about it. it seems that the implementation of sed for “os x” doesn’t recognize the carriage return character (“\r”). bah, wasted a good 10 minutes trying to figure that one out.


Bookmark and Share

if that was helpful ...

check out the other tips and tricks i've compiled on these pages. you might learn something else interesting!

7 Responses to “read an excel file in unix or perl”

  1. on 30 Oct 2006 at 6:57 pm SC

    cat tp.sh | tr “\r” “\n” > tp.sh.ncr

    This worked on tp.sh, which was returning “bad interpreter” due to carriage return problems (copied from Firefox, pasted to and saved in TextEdit).

    original bash script from
    http://www.cutup.org/anize/?file=tp

  2. on 02 May 2007 at 12:52 pm Andrew Louie

    This is a topic that has been beaten to death on the cygwin mailing list ever since cygwin became more “POSIX” strict.

    Windows uses DOS style line endings that typically end in \r\n (carriage return, line feed) a throw back to the old lineprinter days, when you had to explicity tell the printer head to return to position zero. (cr) and scroll the paper (lf). the POSIX spec says lines should only end in \n (LF)

    So any file saved in windows automatically gets a \r\n line ending on all lines. cygwin comes with a small util called “d2u” that strips the \r off of it. You can use that on all scripts that are saved in dos format. the best solution is to not use windows for any scripting or text editing.

    btw I never generate data in excel, All of my data is always generated as a CSV, and manipulations are done in awk =P

  3. on 13 Jul 2007 at 10:35 pm Joe McMahon

    The easy way on OS X is to accept that Excel writes the file using \r, and tell Perl to use that character as end of line:

    $/ = “\r”;

    Now Perl will use that as the newline character, and your files will parse.
    Obviously before you print anything, you’re gonna want to set it back to “\n” again, so probably your best bet is

    {
    local $/ = “\r”;
    # read your tab-delimited file here and manipulate the line
    }
    # $/ auto-resets to the system default (\n) when the ‘local’ goes out of scope.

    (Here’s hoping the formatting doesn’t get mangled…)

  4. on 18 Jun 2008 at 5:13 am Ash

    Hi,

    I have some data in Unix , need to transferred to excel from unix…. can i do using any scripting language??? can u plsss help me on this.

    Thanks,
    Ash

  5. on 18 Jun 2008 at 5:26 am Ash

    Hi,

    I have some data in unix and i need to ransfer those data to excel through some script ….Can u plsss suggest me wch scripting language should be used….
    Also plss suggest me some links …

    Thanks,
    Ash

  6. on 19 Aug 2008 at 9:48 pm Lawrence David

    I know this is way late, but for future reference, excel should be able to import most data using the “Get External Data” feature of Excel.

  7. on 27 Oct 2008 at 7:58 pm Sean

    I got a huge, one-line .csv file generated by Excel 2007. Unix command could not handle it.

    So use your method to transforming “\r” into “\n”, and it works great!

    Thank you very much.

    Best regards,

    Sean

Did I get this wrong? Let me know!

Trackback URI | Comments RSS

More blogs about http://desk.stinkpot.org:8080/tricks.