Technitribe

interesting problems (and a few solutions, too)

    • 24 Sep 2013

      Mac OS X, Sed, and strange document encoding

      Written by Tim Bielawa

      The Problem

      You’re on Mac OS X (somewhere around 10.7.5) and you’re using the sed command to replace characters in the Latin-1 or Windows-1252 character encoding with their UTF-8 equivalents. Unfortunately, you get an error like the following:

      sed: 1: "s/#/’/g
      ": RE error: illegal byte sequence

      Luckily you’re not alone!

      • vim_dev
      • homebre-deps
      • HamDecks
      • stackoverflow

      This happened to me while working on HamDecks, a small project that creates Mnemosyne decks to help you study for the Amateur Radio Operator exams using questions from the official ARRL question pools. The source question pool files (Technician, General, Extra) have some problems, though: many characters in the ARRL pool files use strange or exotic encodings, and Mnemosyne could not import them. That’s how I got myself into this whole mess in the first place.

      Options

      The stackoverflow link above makes two suggestions:

      1. Use the iconv utility
      2. Use a Perl one-liner

      Your Mileage May Vary, but neither of those suggestions worked for me. So what did work then?

      Potential Solution

      Once again, we will visit our system locale settings.

      Here’s what worked for the HamDecks project:

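      Instead of just prefixing the sed command with LANG=C, we prefix it with LANG=C LC_ALL=C. LC_ALL overrides every other locale setting, so sed treats its input as raw bytes instead of trying to decode it as UTF-8. I’m not saying this is a silver bullet, just that it worked for me and might work for you too.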
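
      As a rough illustration of the locale prefix in action, here is a minimal sketch. It is not the actual HamDecks command: the sample text and the choice of byte are assumptions made up for this example. It feeds sed a Windows-1252 curly apostrophe (byte 0x92, written in octal as \222) and replaces it with the three-byte UTF-8 encoding of ’. Note that LC_ALL is the standard variable that overrides all other locale settings.

```shell
# Sample input is hypothetical: "don?t panic" where ? is the single
# Windows-1252 byte 0x92 (octal 222), which is not valid UTF-8 on its own.
# printf builds the sed script so the raw bytes never have to be typed.
# LANG=C LC_ALL=C applies only to the sed process in this pipeline.
fixed=$(printf 'don\222t panic\n' |
    LANG=C LC_ALL=C sed "$(printf 's/\222/\342\200\231/g')")

# The result now contains the UTF-8 bytes E2 80 99 in place of 0x92.
printf '%s\n' "$fixed"
```

      If you drop the LANG=C LC_ALL=C prefix on a system with a UTF-8 locale, this is the kind of command that can fail with the "illegal byte sequence" error shown above.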

    • 13 Aug 2009

      Fixing my missing locales

      Written by Tim Bielawa

      Background: I run this server through Slicehost, and I enjoy their service immensely. When you set up your first server, or rebuild an existing server, you get a very minimal GNU/Linux system installed. For obvious reasons, I like this a lot too.

      The problem: Both the first time I built this server, and most recently when I rebuilt it to Jaunty Jackalope, the system locales weren’t configured. I understand why this is done, and it doesn’t bother me that it happens. What frustrated me a little was how hard it was to find out how to set my locale properly.

      How do you know if your locales aren’t correctly defined? On my Jaunty Jackalope system I see messages like this:

      locale: Cannot set LC_MESSAGES to default locale: No such file or directory
      locale: Cannot set LC_ALL to default
      locale: No such file or directory

      I tried running dpkg-reconfigure locales, but that had no effect. Searching the Internet for the messages above turned up a couple of possible solutions, but none of them looked like anything I was interested in. I’m a firm believer that if the Internet tells me to run a command with more than a couple of options, it may work, but there is probably an easier, less cryptic solution. For example:

      localedef -v -c -i en_US -f UTF-8 en_US.UTF-8

      No way I’m running that. I instead searched for “slicehost locale” and found this article: Ubuntu Hardy setup. I enjoy this much more:

      locale-gen en_US.UTF-8
      
      update-locale LANG=en_US.UTF-8

      It turns out that update-locale is a Debian/Ubuntu-specific command. It updates your system’s default locale settings file. I had checked for that file before running the command and found that none existed yet on my system. After running the two commands above, I found one had been created with “LANG=en_US.UTF-8” in it. It’s possible that running update-locale could have been all I needed to do to begin with.

      I hope this helps someone else who’s had this problem, whether before or for the first time.
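
      If you want to check the state of things before or after running those two commands, something like the sketch below might help. The has_locale helper is hypothetical (it is not part of any package); it assumes the locale command is available and that /etc/default/locale is the file update-locale writes on Debian/Ubuntu systems.

```shell
# Hypothetical helper: succeeds if the named locale appears in `locale -a`.
# glibc lists names like "en_US.utf8", so lowercase both sides and drop
# the hyphen before comparing whole lines.
has_locale() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-' | grep -qxF -- "$want"
}

if has_locale en_US.UTF-8; then
    echo "en_US.UTF-8 is already generated"
else
    echo "en_US.UTF-8 is missing; locale-gen en_US.UTF-8 should create it"
fi

# Assumed location of the default locale file on Debian/Ubuntu.
if [ -f /etc/default/locale ]; then
    cat /etc/default/locale
else
    echo "no /etc/default/locale yet; update-locale should create it"
fi
```

      Neither check changes anything on the system, so it is safe to run them as a normal user.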

       

      Update: 2013-05-25: This post has reached more parts of the Internet than I ever expected when I wrote it 4 years ago. Thanks to everyone who linked back instead of just copying and pasting the solution directly.

      These days I’m running Fedora on Linode. And all is well.


Creative Commons License
Technitribe by Tim Bielawa is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.