Technitribe

interesting problems (and a few solutions, too)

Technitribe
  • About the Authors
  • Log In
  • Log Out
  • Lost Password
  • Register
  • Reset Password
    • 24 Sep 2013

      Mac OS X, Sed, and strange document encoding

      Written by Tim Bielawa

      The Problem

      You’re on Mac OS X (somewhere around 10.7.5) and you’re using the sed command to replace characters from the latin1 or Windows-1252 character encoding with their utf8 equivalents. Unfortunately you get an error like the following:

      sed: 1: "s/#/’/g
      ": RE error: illegal byte sequence

      Luckily you’re not alone!

      • vim_dev
      • homebre-deps
      • HamDecks
      • stackoverflow

      This happened to me while working on HamDecks, a small project that creates Mnemosyne decks to help you study for the Amateur Radio Operator exams using questions from the official ARRL Question pools. The source question pool files (Technician, General, Extra) though have some problems… There’s a lot of characters with strange/exotic encoding in the ARRL pool files that could not be imported into Mnemosyne. That’s how I got myself into this whole mess in the first place.

      Options

      The stackoverflow link above makes two suggestions:

      1. Use the iconv utility
      2. Use a PERL one-liner

      Your Mileage May Vary, but neither of those suggestions worked for me. So what did work then?

      Potential Solution

      Once again, we will visit our system locale settings.

      Here’s what worked for the HamDecks project:

      Instead of just prefixing the sed command with LANG=C, we prefix it with LANG=C LANG_ALL=C. I’m not saying this is a silver bullet, just that it worked for me and might work for you too.

      • Tags »
      • LANG latin1 locale Mac OS X

    Leave a Reply Cancel reply

    Your email address will not be published. Required fields are marked *

    • The Authors
    • Virtual Disk Guide

      Interested in virtualization? Do QCOWs rule your filesystem? Are you a libvirt or KVM+QEMU wizard? I wrote a book about virtual disk management. Check out the The Linux Sysadmin's Guide to Virtual Disks online for free at ScribesGuides.com.


      Consider supporting the author by purchasing a hard copy of the first edition for just $10.00 on Lulu.com.

    • bitmath

      bitmath is a Python library for dealing with file size units (GiB's, kB's, etc) in a sane way. bitmath supports arithmetic, rich comparison, conversion, automatic best human-readable representation, and many other utility functions. Read some examples on the docs site or check out the source on GitHub.

    • latest posts

      • Querying block device sizes in Python on Linux and Mac OS X February 4, 2023
      • Using jq to filter an array of objects from JSON September 9, 2019
      • Two Year Break — And we’re back! November 16, 2018
    • tags

      bitmath blog conference css dblatex DNS DocBook eclipse Emacs Erlang Fedora fedora 22 filter GNU Screen Haiku Introduction java jboss LCSEE Linux locale locales fix slicehost ubuntu Macports module nist nXML-Mode opengl open source OS X package packaging pki prefix units presentation project pypi Python scholarship si summit Tutorial ubuntu xcode XML XMPP
    • h4ck teh world

      tbielawatbielawa
      • Issue Comment
        bitmath
        February 6, 2023 - 12:55 am UTC
      • Issue Comment
        bitmath
        February 6, 2023 - 12:54 am UTC
      • Push
        bitmath
        February 6, 2023 - 12:51 am UTC
      • Issue Comment
        bitmath
        February 6, 2023 - 12:36 am UTC
      • Push
        bitmath
        February 6, 2023 - 12:30 am UTC

Creative Commons License
Technitribe by Tim Bielawa is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.