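The essence of mojibake (garbled text) is that the character encoding used to decode a sequence of bytes differs from the encoding originally used to turn the characters into those bytes.
UTF-8 and GBK are two encodings with good support for Chinese, so text is often converted between them.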
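To see the mismatch concretely, the sketch below prints the bytes the same character produces under each charset. The class name ByteDump and its hex helper are illustrative, not part of the original examples. It assumes the runtime includes the GBK charset: Java only guarantees a handful of charsets (UTF-8 among them), though typical JDK builds also ship GBK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper class (hypothetical name), not from the original article
class ByteDump {
    // Formats bytes as space-separated uppercase hex, e.g. "E4 B8 AD"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "中";
        // Same character, two different byte sequences
        System.out.println("UTF-8: " + hex(s.getBytes(StandardCharsets.UTF_8))); // UTF-8: E4 B8 AD
        System.out.println("GBK:   " + hex(s.getBytes(Charset.forName("GBK")))); // GBK:   D6 D0
    }
}
```

Decoding either byte sequence with the other charset is exactly the mismatch described above.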
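1. Encode with UTF-8 and decode with GBK, then encode with GBK and decode with UTF-8.
import java.io.UnsupportedEncodingException;
public class EncodingTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String srcString = "我们是中国人";
        // Encode as UTF-8, then decode those bytes with the wrong charset (GBK)
        String utf2GbkString = new String(srcString.getBytes("UTF-8"), "GBK");
        System.out.println("UTF-8 to GBK: " + utf2GbkString);
        // Undo the mistake: encode back to GBK, then decode as UTF-8
        String utf2Gbk2UtfString = new String(utf2GbkString.getBytes("GBK"), "UTF-8");
        System.out.println("UTF-8 to GBK to UTF-8: " + utf2Gbk2UtfString);
    }
}
The output is:
UTF-8 to GBK: 鎴戜滑鏄腑鍥戒汉
UTF-8 to GBK to UTF-8: 我们是中国人
The original string is recovered because the 18 UTF-8 bytes of 我们是中国人 happen to split into byte pairs that GBK decodes and then re-encodes to exactly the same bytes, so nothing is lost along the way.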
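The sketch below wraps both steps of case 1 in a hypothetical helper, utf8ViaGbk (not from the original), and tries it on two other inputs. ASCII text survives, because ASCII bytes are also single-byte GBK characters. The lone character 中 (UTF-8 bytes E4 B8 AD) does not, because its last byte has no partner to pair with. It assumes the runtime includes the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripCheck {
    static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: the two steps of case 1 in one method.
    // Encode as UTF-8 and mis-decode as GBK, then encode as GBK and decode as UTF-8.
    static String utf8ViaGbk(String s) {
        String garbled = new String(s.getBytes(StandardCharsets.UTF_8), GBK);
        return new String(garbled.getBytes(GBK), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII bytes are also single-byte GBK characters, so nothing changes
        System.out.println(utf8ViaGbk("abc").equals("abc")); // true
        // E4 B8 AD: GBK decodes E4 B8 as one character, but the trailing AD
        // has no second byte, becomes a replacement character, and is lost
        System.out.println(utf8ViaGbk("中").equals("中")); // false
    }
}
```

Because Java's String constructors silently substitute replacement characters instead of throwing, the damage only shows up when the result is compared with the original.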
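2. Encode with GBK and decode with UTF-8, then encode with UTF-8 and decode with GBK.
import java.io.UnsupportedEncodingException;
public class EncodingTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String srcString = "我们是中国人";
        // Encode as GBK, then decode those bytes with the wrong charset (UTF-8)
        String gbk2UtfString = new String(srcString.getBytes("GBK"), "UTF-8");
        System.out.println("GBK to UTF-8: " + gbk2UtfString);
        // Try to undo it: encode back to UTF-8, then decode as GBK
        String gbk2Utf2GbkString = new String(gbk2UtfString.getBytes("UTF-8"), "GBK");
        System.out.println("GBK to UTF-8 to GBK: " + gbk2Utf2GbkString);
    }
}
This time the output is:
GBK to UTF-8: �������й���
GBK to UTF-8 to GBK: 锟斤拷锟斤拷锟斤拷锟叫癸拷锟斤拷
Here the round trip fails. Most of the GBK bytes are not valid UTF-8, so the UTF-8 decoder replaces each invalid sequence with U+FFFD (shown as �) and the original bytes are discarded at that step; only D0 B9, the second byte of 中 followed by the first byte of 国, happens to form a valid UTF-8 character, й. Encoding U+FFFD as UTF-8 always produces EF BF BD, and GBK reads the repeated EF BF BD bytes in pairs as 锟斤拷.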
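To isolate the lossy step, the sketch below (class name illustrative) repeats it with the single character 中. Its GBK bytes, D6 D0, decode as UTF-8 to nothing but U+FFFD replacement characters. Two U+FFFD characters encoded as UTF-8 (EF BF BD EF BF BD) then read back as GBK give 锟斤拷. It assumes the runtime includes the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    static final Charset GBK = Charset.forName("GBK");

    public static void main(String[] args) {
        // GBK bytes of 中 are D6 D0; D0 is not a UTF-8 continuation byte,
        // so the UTF-8 decoder cannot use them and substitutes U+FFFD
        String decoded = new String("中".getBytes(GBK), StandardCharsets.UTF_8);
        boolean allReplaced = !decoded.isEmpty()
                && decoded.chars().allMatch(c -> c == 0xFFFD);
        System.out.println(allReplaced); // true: D6 D0 can no longer be recovered

        // U+FFFD always encodes to EF BF BD in UTF-8; GBK pairs the six
        // bytes as EF BF, BD EF, BF BD
        byte[] utf8 = "\uFFFD\uFFFD".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, GBK)); // 锟斤拷
    }
}
```

Whether 锟斤拷 displays correctly also depends on the console's own encoding, which is one more place the same mismatch can occur.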