Rizvi's Blog: How to change character encoding of a text file on Linux

Wednesday, July 20, 2016

How to change character encoding of a text file on Linux

http://ask.xmodulo.com/change-character-encoding-text-file-linux.html

How to get terminal's Character Encoding

http://stackoverflow.com/questions/5306153/how-to-get-terminals-character-encoding

For checking the current locale (the part after the dot names the charset)

[user@ip-192-191-181-181 java]$ echo $LANG

ja_JP.UTF-8

For checking current encoding

[user@ip-192-191-181-181 java]$ locale charmap

UTF-8

For current encoding:

locale charmap

For available locales:

locale -a

For available encodings:

locale -m
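
The locale commands above can be combined into a quick check of whether the current session already uses UTF-8. This is only a minimal sketch, assuming the locale utility is installed; the variable name charmap is made up for illustration.

```shell
# Ask the C library which character map the current locale uses.
charmap="$(locale charmap)"

# Report whether the session is already using UTF-8.
if [ "$charmap" = "UTF-8" ]; then
    echo "Current locale encoding is UTF-8"
else
    echo "Current locale encoding is $charmap, not UTF-8"
fi
```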

Questions:

How can I know which character encoding a certain text file is using?
How can I convert it to some other encoding of my choosing?

Step One

To find out the character encoding of a file, we will use a command-line tool called file. Since the file command is a standard UNIX program, we can expect to find it in nearly all modern Linux distros.

Run the following command:

$ file --mime-encoding filename
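
To try this without a real file at hand, one option is to create a small test file with a known encoding first. The sketch below assumes a hypothetical file name, sample_latin1.txt, and uses the -b (brief) option of file to leave the file name out of the output. Detection is heuristic, so the reported encoding is a best guess rather than a guarantee.

```shell
# Create a sample file containing "café" encoded as ISO-8859-1.
# \351 is the octal escape for byte 0xE9, which is "é" in that encoding.
# sample_latin1.txt is a hypothetical file name used only for this example.
printf 'caf\351\n' > sample_latin1.txt

# Print only the detected encoding, without the "filename:" prefix.
file -b --mime-encoding sample_latin1.txt
```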

Step Two

The next step is to check what kinds of text encodings are supported on your Linux system. For this, we will use a tool called iconv with the "-l" flag (lowercase L), which will list all the currently supported encodings.

$ iconv -l

The iconv utility is part of the GNU C library (glibc), so it is available out of the box on glibc-based Linux distributions.
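
Because the list printed by iconv is long, it can help to search it for the encoding you have in mind. The following is a rough sketch, assuming an iconv that supports the -l flag; target and supported are hypothetical variable names. Note that grep does a substring match here, so it also matches longer names that contain the target.

```shell
# target is a hypothetical variable holding the encoding we want to convert to.
target="UTF-8"

# iconv -l prints the known encoding names; grep -i ignores case,
# since the names are usually listed in upper case.
if iconv -l | grep -qi "$target"; then
    supported=yes
else
    supported=no
fi
echo "$target supported: $supported"
```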

Step Three

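Once we have selected a target encoding among those supported on our Linux system, let's run the following command to perform the conversion. Note that iconv writes the converted text to standard output, so redirect it to a file if you want to keep the result: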

$ iconv -f old_encoding -t new_encoding filename

For example, to convert iso-8859-1 to utf-8:

$ iconv -f iso-8859-1 -t utf-8 input.txt
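
As a sketch of the full round trip, the example below builds a tiny ISO-8859-1 file, converts it, saves the result, and looks at the resulting bytes. The name output.txt is an assumption made for this example; input.txt is generated here so the snippet can run on its own.

```shell
# Build a small ISO-8859-1 input file (byte 0xE9 is "é" in that encoding).
printf 'caf\351\n' > input.txt

# Convert to UTF-8 and save the result in a separate file.
# Do not redirect onto input.txt itself: the shell would truncate it
# before iconv gets a chance to read it.
iconv -f iso-8859-1 -t utf-8 input.txt > output.txt

# Dump the bytes in hex: in UTF-8, "é" is encoded as the two bytes c3 a9.
od -An -tx1 output.txt
```

Keeping the original file untouched also makes it easy to retry with a different source encoding if the first guess turns out to be wrong.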

Knowing how to use these tools together as we have demonstrated, you can, for example, fix a subtitle file whose characters display incorrectly.
