Mac OS X, Sed, and strange document encoding

- 24 Sep 2013
  
  Written by Tim Bielawa
  
  The Problem
  
  You’re on Mac OS X (somewhere around 10.7.5) and you’re using the sed command to replace characters in the Latin-1 or Windows-1252 encoding with their UTF-8 equivalents. Unfortunately, you get an error like the following:
  sed: 1: "s/#/’/g ": RE error: illegal byte sequence
  Luckily you’re not alone!
  This happened to me while working on HamDecks, a small project that creates Mnemosyne decks to help you study for the Amateur Radio Operator exams using questions from the official ARRL question pools. The source question pool files (Technician, General, Extra) have some problems, though: they contain a lot of characters in strange or exotic encodings that could not be imported into Mnemosyne. That’s how I got myself into this whole mess in the first place.
  
  Options
  
  The Stack Overflow link above makes two suggestions:
  1. Use the iconv utility
  2. Use a Perl one-liner
  Your mileage may vary, but neither of those suggestions worked for me. So what did work?
  
  Potential Solution
  
  Once again, we will visit our system locale settings.
  
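  Here’s what worked for the HamDecks project: instead of just prefixing the sed command with LANG=C, we prefix it with LANG=C LC_ALL=C. LC_ALL overrides every other locale variable, so sed treats its input as plain bytes instead of trying to decode it as UTF-8. I’m not saying this is a silver bullet, just that it worked for me and might work for you too.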
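As a minimal sketch of that prefix in use (this is not the actual HamDecks command; the sample text and variable names are assumptions for illustration), the following replaces the Windows-1252 right single quote, byte 0x92, with its UTF-8 equivalent:

```shell
#!/bin/sh
# Build the two byte sequences with printf octal escapes so the
# script itself stays pure ASCII.
cp1252_quote=$(printf '\222')          # byte 0x92 in Windows-1252
utf8_quote=$(printf '\342\200\231')    # U+2019 encoded as UTF-8

# Hypothetical sample input containing the stray 0x92 byte.
# LANG=C LC_ALL=C makes sed work byte by byte for this one command.
fixed=$(printf 'It\222s a test\n' |
  LANG=C LC_ALL=C sed "s/${cp1252_quote}/${utf8_quote}/g")

printf '%s\n' "$fixed"
```

Putting the variables on the same line as sed scopes them to that single invocation, so the rest of your shell session keeps its normal locale.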

Technitribe