Technitribe

interesting problems (and a few solutions, too)

    • 24 Sep 2013

      Mac OS X, Sed, and strange document encoding

      Written by Tim Bielawa

      The Problem

      You’re on Mac OS X (somewhere around 10.7.5) and you’re using the sed command to replace characters from the Latin-1 or Windows-1252 character encoding with their UTF-8 equivalents. Unfortunately, you get an error like the following:

      sed: 1: "s/#/’/g
      ": RE error: illegal byte sequence
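      The error generally means that sed, running under a UTF-8 locale, met a byte that is not valid UTF-8. Here is a minimal sketch for confirming that a file contains such bytes, assuming iconv is installed. The file name and its contents are made up for illustration:

```shell
# Hypothetical sample: byte 0x92 is a right single quote in Windows-1252,
# but it is not a valid UTF-8 sequence on its own.
workdir=$(mktemp -d)
printf 'don\222t panic\n' > "$workdir/pool.txt"

# Round-tripping the file through iconv as UTF-8 fails on the first invalid byte.
if iconv -f UTF-8 -t UTF-8 "$workdir/pool.txt" > /dev/null 2>&1; then
    encoding_status="valid UTF-8"
else
    encoding_status="not valid UTF-8"
fi
echo "$encoding_status"
```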

      Luckily you’re not alone!

      • vim_dev
      • homebre-deps
      • HamDecks
      • stackoverflow

      This happened to me while working on HamDecks, a small project that creates Mnemosyne decks to help you study for the Amateur Radio Operator exams using questions from the official ARRL question pools. The source question pool files (Technician, General, Extra) have some problems, though: they contain a lot of characters with strange or exotic encodings that could not be imported into Mnemosyne. That’s how I got myself into this whole mess in the first place.

      Options

      The stackoverflow link above makes two suggestions:

      1. Use the iconv utility
      2. Use a Perl one-liner

      Your Mileage May Vary, but neither of those suggestions worked for me. So what did work then?

      Potential Solution

      Once again, we will visit our system locale settings.

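      Here’s what worked for the HamDecks project: instead of just prefixing the sed command with LANG=C, we prefix it with LANG=C LC_ALL=C. LC_ALL overrides every other locale variable, whereas LANG is only the fallback default, which is likely why LANG=C alone was not enough. I’m not saying this is a silver bullet, just that it worked for me and might work for you too.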
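      A minimal sketch of that prefix in action. This is not the actual HamDecks command. The sample file, the variable names, and the byte being replaced are hypothetical:

```shell
# Hypothetical sample file containing byte 0x92, the Windows-1252
# right single quotation mark.
fixdir=$(mktemp -d)
printf 'don\222t panic\n' > "$fixdir/pool.txt"

# Build the pattern and replacement with printf so this script stays ASCII.
bad_byte=$(printf '\222')             # the stray Windows-1252 byte
utf8_quote=$(printf '\342\200\231')   # U+2019 encoded as UTF-8

# Force the C locale for this one command so sed works on raw bytes.
LANG=C LC_ALL=C sed "s/$bad_byte/$utf8_quote/g" "$fixdir/pool.txt" > "$fixdir/pool-utf8.txt"

# Dump the result byte by byte to confirm the replacement.
od -c "$fixdir/pool-utf8.txt"
```

      Setting the variables on the command line like this only affects that single sed invocation, so the rest of your shell session keeps its usual locale.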


Creative Commons License
Technitribe by Tim Bielawa is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.