Wednesday, July 20, 2016

How to change character encoding of a text file on Linux

http://ask.xmodulo.com/change-character-encoding-text-file-linux.html


http://stackoverflow.com/questions/5306153/how-to-get-terminals-character-encoding
To check the current locale (charset):
[user@ip-192-191-181-181 java]$ echo $LANG
ja_JP.UTF-8
To check the current encoding:
[user@ip-192-191-181-181 java]$ locale charmap
UTF-8
To list available locales:
locale -a
To list available encodings:
locale -m
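For example, to see only the UTF-8 locales installed on the system, you can filter the list with grep (the exact locale names vary by distribution; the output line below is only illustrative):
$ locale -a | grep -i utf
ja_JP.utf8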

Questions:

  1. How can I know which character encoding a certain text file is using?
  2. How can I convert it to some other encoding of my choosing?

Step One

In order to find out the character encoding of a file, we will use a command-line tool called file. Since the file command is a standard UNIX program, we can expect to find it in all modern Linux distros.
Run the following command:
$ file --mime-encoding filename
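For instance, running it against a hypothetical file named input.txt prints the file name followed by the detected charset (the actual value depends on the file's contents):
$ file --mime-encoding input.txt
input.txt: iso-8859-1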


Step Two

The next step is to check what kinds of text encodings are supported on your Linux system. For this, we will use a tool called iconv with the "-l" flag (lowercase L), which will list all the currently supported encodings.
$ iconv -l
The iconv utility is part of the GNU C library (glibc), so it is available in all Linux distributions out of the box.
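Since the list is quite long, it is handy to pipe it through grep to check whether a particular encoding family is supported, for example ISO-8859:
$ iconv -l | grep -i 8859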

Step Three

Once we have selected a target encoding from among those supported on our Linux system, we can run the following command to perform the conversion:
$ iconv -f old_encoding -t new_encoding filename
For example, to convert iso-8859-1 to utf-8:

$ iconv -f iso-8859-1 -t utf-8 input.txt
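Note that iconv writes the converted text to standard output, so to save the result you would redirect it to a new file (output.txt here is just an example name):
$ iconv -f iso-8859-1 -t utf-8 input.txt > output.txt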
Knowing how to use these tools together as demonstrated above, you can, for example, fix a broken subtitle file:
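A minimal sketch, assuming a subtitle file named movie.srt whose text shows up garbled in your player (the file name and the detected encoding below are only illustrative, so check your own file with file --mime-encoding first):
$ file --mime-encoding movie.srt
movie.srt: iso-8859-1
$ iconv -f iso-8859-1 -t utf-8 movie.srt > movie.utf8.srt
After verifying that the converted file displays correctly, you can replace the original with it.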


