Tuesday, September 03, 2013

Multithreading and Multiprocessing in Python

Recently someone asked me about multithreading in Python. The last time I wrote any multithreaded Python code was at least 5 years ago, and even then it was some very basic stuff using the thread module. While brushing up on the topic, I came across some very useful articles, and I am using this blog post to link to them for future reference.

Multithreading in CPython is quite limited because of the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time. Multiprocessing is more promising.

Because of the GIL, multithreading in Python offers no speed-up for CPU-bound tasks such as mathematical computations: only one thread can run Python code at a time, so the threads simply take turns. Multithreading is best suited to I/O-bound programs, where threads spend most of their time waiting on the disk (listing files and directories, reading/writing files, etc.) or on the network (communicating with websites). A thread blocked on I/O releases the GIL, so other threads can run in the meantime. The relevant modules are thread and threading. thread (renamed _thread in Python 3) is the low-level API and has limited capabilities, while threading is the higher-level module and provides more functionality. This tutorial gives a quick overview of both modules via simple examples.
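To make the I/O-bound case concrete, here is a minimal sketch using the threading module. It assumes time.sleep() as a stand-in for real I/O (a download or a file read); like a blocking socket or file call, sleeping releases the GIL, so the waits overlap. The names fake_download and run_threaded are hypothetical and only for illustration.

```python
import threading
import time


def fake_download(name, delay, results):
    """Hypothetical stand-in for an I/O-bound task such as fetching a URL.

    time.sleep() releases the GIL, just as a blocking read does,
    so other threads can run while this one waits.
    """
    time.sleep(delay)
    results[name] = f"{name}: done after {delay}s"


def run_threaded(jobs):
    """Start one thread per (name, delay) job and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=fake_download, args=(name, delay, results))
        for name, delay in jobs
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the results
    return results


if __name__ == "__main__":
    jobs = [("a", 0.5), ("b", 0.5), ("c", 0.5)]
    start = time.perf_counter()
    results = run_threaded(jobs)
    elapsed = time.perf_counter() - start
    for line in sorted(results.values()):
        print(line)
    # The three waits overlap, so this should be close to 0.5 s, not 1.5 s.
    print(f"elapsed: {elapsed:.2f}s")
```

If fake_download did pure computation instead of sleeping, the threads would take turns holding the GIL and the total time would be roughly the sum of the individual times.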

An important decision when writing multithreaded programs is whether the parent/main thread should wait for its child threads to finish. This is required whenever you have to reconcile the output of all child threads to determine the next course of action for the parent thread. The waiting is done by calling the join() method that every thread object provides: join() blocks the calling thread until that thread terminates (or until an optional timeout expires). I found the textual representation in this Stack Exchange answer very intuitive and useful for understanding join() for different kinds of threads.
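A small sketch of both uses of join(), assuming the standard threading module: waiting for all children so their results can be combined, and waiting with a timeout, after which the child may still be running. The helper names (worker, main_thread_waits_for_children, join_with_timeout) are hypothetical.

```python
import threading
import time


def worker(n, out):
    time.sleep(0.1 * n)  # simulate work of varying length
    out.append(n * n)


def main_thread_waits_for_children():
    out = []
    children = [threading.Thread(target=worker, args=(n, out)) for n in range(1, 4)]
    for t in children:
        t.start()
    # join() blocks the calling (main) thread until that child terminates,
    # so after this loop all results are present and can be reconciled.
    for t in children:
        t.join()
    return sum(out)


def join_with_timeout():
    # daemon=True: the interpreter will not wait for this thread at exit.
    t = threading.Thread(target=time.sleep, args=(1.0,), daemon=True)
    t.start()
    t.join(timeout=0.1)  # stop waiting after 0.1 s
    return t.is_alive()  # the child is still sleeping


if __name__ == "__main__":
    print(main_thread_waits_for_children())  # 1 + 4 + 9 = 14
    print(join_with_timeout())               # True
```

join() itself has no return value, which is why the sketch checks is_alive() after a timed join to find out whether the child actually finished.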

To overcome the limitations of multithreading in Python, the multiprocessing module side-steps the GIL by using subprocesses instead of threads. Each process runs its own Python interpreter with its own GIL, which allows programmers to fully leverage multi-core or multi-processor machines. This article does a good job of explaining OS forking and the multiprocessing module in Python.
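As an illustration, here is a minimal sketch of CPU-bound work spread over processes with multiprocessing.Pool. The prime-counting function is a hypothetical example workload, chosen only because it is pure computation. The `if __name__ == "__main__":` guard matters: on platforms that start workers by spawning a fresh interpreter rather than forking, the main module is re-imported in each child.

```python
import multiprocessing


def count_primes(limit):
    """Hypothetical CPU-bound workload: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_parallel(limits):
    # Each call runs in a separate worker process with its own interpreter
    # and its own GIL, so several calls can execute on different cores.
    with multiprocessing.Pool() as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":  # required where worker processes are spawned
    print(count_primes_in_parallel([10_000, 20_000, 30_000]))
```

Pool() with no argument starts one worker per CPU reported by the OS; the function passed to pool.map must be defined at module level so it can be pickled and sent to the workers.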

Finally, most programmers are interested in multithreading and multiprocessing mainly to speed up the execution of their programs. In Python, it is important to understand the difference between these two concepts; otherwise you could end up slowing the program down instead of speeding it up. I found this article very helpful in showing the difference between the two, as it generates benchmarks using simple examples. Additionally, the following blog post explains how Python sees and uses the cores on a machine.
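A rough benchmark along those lines might look like the sketch below. It times the same pure-Python loop serially, on a thread pool, and on a process pool using concurrent.futures. The workload, the task count and the helper names are assumptions for illustration, and the process timing also includes some worker start-up cost. On a standard CPython build one would typically expect the thread pool to be no faster than the serial run, and the process pool to be faster on a multi-core machine; actual numbers depend on the hardware.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def busy_sum(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(label, run):
    start = time.perf_counter()
    result = run()
    print(f"{label:<10} {time.perf_counter() - start:6.2f}s")
    return result


def compare(n=2_000_000, tasks=4):
    args = [n] * tasks
    serial = timed("serial", lambda: [busy_sum(a) for a in args])
    with ThreadPoolExecutor(max_workers=tasks) as ex:
        threaded = timed("threads", lambda: list(ex.map(busy_sum, args)))
    with ProcessPoolExecutor(max_workers=tasks) as ex:
        processes = timed("processes", lambda: list(ex.map(busy_sum, args)))
    # All three strategies must produce the same results; only timings differ.
    assert serial == threaded == processes
    return serial


if __name__ == "__main__":
    print("cores reported by os.cpu_count():", os.cpu_count())
    compare()
```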

In summary, if your Python program does a lot of I/O, use multithreading, but if you are looking to speed up its CPU-bound work, use multiprocessing.

Saturday, August 10, 2013

Converting MP4 videos to audio CD (wav) on Ubuntu

My little one loves his nursery rhymes. At home he gets to listen to them on the iPad or on the phone, but this drains the battery unnecessarily. Plus, when we travel by car, I would rather use the car stereo than play them loudly on my phone. So I decided to convert the MP4 videos to an audio CD (unfortunately my car stereo does not support MP3 :-( ).

Here are the steps I followed:

1. After a bit of research, MPlayer seemed best suited to my needs. The latest version at the time of writing is 1.1.1. Since my Ubuntu release is quite old (I use 10.10), apt-get returned no matches for mplayer, so I downloaded the source from the MPlayer site and built and installed it myself. The process is rather simple:
  a. cd to the directory where you extracted the mplayer tar
  b. ./configure (add the --enable-gui switch to the command if you wish to use the player in GUI mode)
  c. make (this took forever)
  d. sudo make install

Note: during compilation, I discovered that the yasm assembler was missing. I had to install yasm 1.2.0 (the latest version) to resolve this.

2. The conversion itself is quite simple. This invocation does the trick (the WAV file to write goes in the -ao pcm:file= option, and the MP4 file to read is the final argument):
    >> mplayer -ao pcm:file=outputfile.wav inputfile.mp4

3. I had over 28 files to convert, so doing them one by one was not an option. A bash for loop took care of this:
>> for i in *.mp4
  > do mplayer -ao pcm:file="${i//.mp4/.wav}" "$i"
  > done
>>

line 1: we loop over every file in the current directory whose name ends in .mp4. The shell expands the *.mp4 pattern (a glob) into the list of matching file names. Unlike looping over the output of `ls` in backticks, this keeps file names that contain spaces intact and skips files that are not MP4s.

line 2: This is the mplayer command that does the conversion, with a bit of bash magic to name the converted file. Say the original file is called foo.mp4; I want the converted file to be called foo.wav. To make this happen, I use substring replacement, which replaces every occurrence of substring in string with replacement. The syntax is as follows:
 >> ${string//substring/replacement}
In my case, I am replacing the extension .mp4 with .wav. The double quotes around both file names keep a name containing spaces together as a single argument.

line 3: close the for loop

4. The conversion process is very CPU-intensive (top showed the CPU at about 75%). Unfortunately, my laptop is quite old and has a single CPU, which meant I couldn't do anything GUI-based while this was going on. Thankfully, the conversion is rather quick, only a few minutes per video.

In some cases, for unknown reasons, mplayer would hang even though it had already finished converting the video. I simply killed that mplayer process, and my loop continued with the next file.
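Killing a stuck conversion by hand could also be automated. The sketch below is not what this post used; it is a hypothetical Python variant of the bash loop that runs the same mplayer command per file through subprocess.run with a timeout. When the timeout expires, subprocess.run kills the child and raises subprocess.TimeoutExpired, and the loop moves on. The 600-second limit and the helper names are assumptions; pick a limit comfortably longer than a normal conversion.

```python
import subprocess
from pathlib import Path


def build_command(mp4_path):
    """Return the mplayer command that dumps one MP4's audio to a WAV file.

    Mirrors the bash loop: foo.mp4 -> foo.wav in the same directory.
    """
    wav_path = Path(mp4_path).with_suffix(".wav")
    return ["mplayer", "-ao", f"pcm:file={wav_path}", str(mp4_path)]


def convert_all(directory=".", timeout_s=600, dry_run=False):
    """Convert every *.mp4 in `directory`; return the names that timed out.

    timeout_s is a hypothetical per-file limit. With dry_run=True the
    commands are only printed, so nothing is executed.
    """
    timed_out = []
    for mp4 in sorted(Path(directory).glob("*.mp4")):
        cmd = build_command(mp4)
        if dry_run:
            print(" ".join(cmd))
            continue
        try:
            # stdin=DEVNULL keeps mplayer from reading the terminal.
            subprocess.run(cmd, timeout=timeout_s, stdin=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            # The child has been killed; its WAV may be complete, so check it.
            timed_out.append(mp4.name)
    return timed_out


if __name__ == "__main__":
    print("timed out:", convert_all(dry_run=True))
```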

With the major work done, now I just have to burn a CD and enjoy the music!!

Sunday, May 05, 2013

Change keyboard layout in Ubuntu 12.04 from command prompt

Instead of installing Ubuntu 12.04 from scratch, I decided to download a prepackaged VirtualBox image from http://virtualboxes.org/images. Unfortunately, its keyboard configuration did not match my Lenovo T430, causing all sorts of havoc. Plus, I did not have the X Window System installed, so the mouse was useless as well.

To change the keyboard configuration from the command prompt, run:
sudo dpkg-reconfigure keyboard-configuration
This lets you reconfigure your keyboard settings interactively, and the changes persist after a reboot.