Sunday, November 19, 2017

How to encode a variable-length UTF-8 byte array in Java

Bluetooth name display issue:

If you check the documentation for BluetoothAdapter.setName(), you will find that:
Valid Bluetooth names are a maximum of 248 bytes using UTF-8 encoding, although many remote devices can only display the first 40 characters, and some may be limited to just 20.
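
Because the limit is counted in bytes and UTF-8 uses one to four bytes per character, a name can run past 248 bytes well before it reaches 248 characters. The following is a minimal sketch of trimming a name to a byte budget without cutting a multi-byte character in half. The class name BluetoothNameBytes and the method encodeWithinLimit are hypothetical names chosen for illustration; they are not part of the Android API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BluetoothNameBytes {
    // Byte limit quoted above for Bluetooth names.
    static final int MAX_NAME_BYTES = 248;

    /** Encodes name as UTF-8 and drops trailing characters until it fits in maxBytes. */
    static byte[] encodeWithinLimit(String name, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= maxBytes) {
            return utf8;
        }
        int end = maxBytes;
        // UTF-8 continuation bytes have the form 10xxxxxx. If the byte at the
        // cut position is one, the cut would split a character, so step back
        // until the cut position is the first byte of a character.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return Arrays.copyOf(utf8, end);
    }
}
```

A call such as BluetoothNameBytes.encodeWithinLimit(name, BluetoothNameBytes.MAX_NAME_BYTES) returns at most 248 bytes that still decode cleanly. The sketch only respects code point boundaries; it does not try to keep combining sequences together.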

Android Supported Versions:

If you check the link, you will find the list of supported Android versions.

Supported (Y) and unsupported (N) codeset conversions are given in the table:

             | DEC Korean | Korean EUC | ISO-2022-KR | KSC5601/cp949 | UCS-2/UTF-16 | UCS-4 | UTF-8 |
 DEC Korean  |      -     |      Y     |     N       |      Y        |        Y     |   Y   |   Y   |
 Korean EUC  |      Y     |      -     |     Y       |      N        |        N     |   N   |   N   |
 ISO-2022-KR |      N     |      Y     |     -       |      Y        |        N     |   N   |   N   |
KSC5601/cp949|      Y     |      N     |     Y       |      -        |        Y     |   Y   |   Y   |
 UCS-2/UTF-16|      Y     |      N     |     N       |      Y        |        -     |   Y   |   Y   |
    UCS-4    |      Y     |      N     |     N       |      Y        |        Y     |   -   |   Y   |
    UTF-8    |      Y     |      N     |     N       |      Y        |        Y     |   Y   |   -   |

For a solution, Michael has given a great example of the conversion:
When you call getBytes(), you are getting the raw bytes of the string encoded under your system's native character encoding (which may or may not be UTF-8). Then, you are treating those bytes as if they were encoded in UTF-8, which they might not be.
A more reliable approach would be to read the ko_KR-euc file into a Java String. Then, write out the Java String using UTF-8 encoding.
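InputStream in = ...
// "ko_KR-euc" is a placeholder; Java's charset name for Korean EUC is "EUC-KR"
Reader reader = new InputStreamReader(in, "ko_KR-euc");
StringBuilder sb = new StringBuilder();
int read;
// Read one decoded character at a time until the end of the stream
while ((read = reader.read()) != -1) {
    sb.append((char) read);
}
reader.close();

String string = sb.toString();

OutputStream out = ...
Writer writer = new OutputStreamWriter(out, "UTF-8");
writer.write(string); // the string is encoded as UTF-8 on the way out
writer.close();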
N.B.: You should, of course, use the correct encoding name.
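
When the data is already in memory as a byte array, the same decode-then-encode idea can be sketched without streams by naming both charsets explicitly instead of relying on the platform default. The class CharsetTranscoder and its methods are hypothetical names for illustration, and the sketch assumes the runtime ships the "EUC-KR" charset, which is not guaranteed on every Java installation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetTranscoder {
    /** Decodes input with the charset it was written in, then re-encodes it as UTF-8. */
    static byte[] toUtf8(byte[] input, Charset sourceCharset) {
        // Step 1: bytes -> String, using the source charset rather than the platform default.
        String text = new String(input, sourceCharset);
        // Step 2: String -> bytes, always UTF-8.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Looks up Korean EUC. Assumption: the runtime provides a charset named "EUC-KR". */
    static Charset koreanEuc() {
        return Charset.forName("EUC-KR");
    }
}
```

For example, CharsetTranscoder.toUtf8(eucBytes, CharsetTranscoder.koreanEuc()) would convert Korean EUC bytes to UTF-8, where eucBytes stands for whatever byte array you read from the source. Charset.isSupported("EUC-KR") can be used to check availability before the lookup.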
You can also use IOUtils from Apache Commons IO for the conversion. A very good example is given here:
// import org.apache.commons.io.IOUtils;
String resource;
// getClass().getResourceAsStream(resource) -> the InputStream to read from
// "UTF-8" -> the encoding to use, null means platform default
String text = IOUtils.toString(getClass().getResourceAsStream(resource), "UTF-8");

Resource Links:

  1. Korean Codesets and Codeset Conversion
  2. Korean Localization
  3. Changing the Default Locale
  4. Byte Encodings and Strings