Java: characters converted to â€™ in JDBC

2019-12-08 15:15:53 Author: Internet

I am trying to read a UTF-8 string from a MySQL database that was created with the following command:

CREATE DATABASE april
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

I created the table of interest with:

DROP TABLE IF EXISTS `article`;
CREATE TABLE `article` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `text` longtext NOT NULL,
  `date_created` timestamp DEFAULT NOW(),
  PRIMARY KEY (`id`)
) CHARACTER SET utf8;

If I run SELECT * FROM article in the MySQL command-line utility, I get:

OIL sands output at Nexen’s Long Lake project dropped in February.

However, when I do this:

ResultSet rs = st.executeQuery(QUERY);

long id = -1;
String text = null;
Timestamp date = null;
while (rs.next()) {
    text = rs.getString("text");
    LOGGER.debug("text=" + text);
}

The output I get is:

text=OIL sands output at Nexenâ€™s Long Lake project dropped in February.

I obtain the connection as follows:

DriverManager.getConnection("jdbc:" + this.dbms + "://" + this.serverHost + ":" + this.serverPort + "/" + this.dbName + "?useUnicode&user=" + this.username + "&password=" + this.password);

Instead of the useUnicode parameter, I also tried:

characterEncoding=UTF-8
and
characterEncoding=utf8
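
For reference, the two parameters are normally given together in a MySQL Connector/J URL, each with an explicit value. The following is only a sketch: the class `JdbcUrlSketch`, the method `buildUrl`, and the host and port values are hypothetical placeholders, and nothing here establishes that this combination resolves the problem described.

```java
public class JdbcUrlSketch {

    // Builds a MySQL JDBC URL that sets both encoding-related properties.
    // Credentials are deliberately left out of the URL.
    static String buildUrl(String host, int port, String dbName) {
        return "jdbc:mysql://" + host + ":" + port + "/" + dbName
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Placeholder host and port, not taken from the question.
        String url = buildUrl("localhost", 3306, "april");
        System.out.println(url);
        // With a MySQL driver on the classpath, the connection would be opened with:
        // java.sql.DriverManager.getConnection(url, username, password);
    }
}
```

Passing the user name and password as separate arguments to `DriverManager.getConnection` keeps them out of the URL string, which also avoids having to escape special characters in the password.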

I also tried, instead of the line text = rs.getString("text"):

byte[] temp = rs.getBytes("text");
String[] encodings = new String[]{"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16", "Latin1"};
for (String encoding : encodings) {
    text = new String(temp, encoding);
    LOGGER.debug(encoding + ": " + text);
}
// Which outputted:
US-ASCII: OIL sands output at Nexen��������s Long Lake project dropped in February.
ISO-8859-1: OIL sands output at NexenÃ¢â¬â¢s Long Lake project dropped in February.
UTF-8: OIL sands output at Nexenâ€™s Long Lake project dropped in February.
UTF-16BE: 佉䰠獡湤猠潵瑰畴⁡琠乥硥滃ꋢ芬ꉳ⁌潮朠䱡步⁰牯橥捴⁤牯灰敤⁩渠䙥扲畡特�
UTF-16LE: 䥏⁌慳摮⁳畯灴瑵愠⁴敎數썮겂蓢玢䰠湯⁧慌敫瀠潲敪瑣搠潲灰摥椠⁮敆牢慵祲�
UTF-16: 佉䰠獡湤猠潵瑰畴⁡琠乥硥滃ꋢ芬ꉳ⁌潮朠䱡步⁰牯橥捴⁤牯灰敤⁩渠䙥扲畡特�
Latin1: OIL sands output at NexenÃ¢â¬â¢s Long Lake project dropped in February.

I loaded the strings into the DB using some predefined SQL from a file. The file is UTF-8 encoded.

mysql -u april -p -D april < insert_articles.sql

The file includes the following line:

 INSERT INTO article (text) value ("OIL sands output at Nexen’s Long Lake project dropped in February.");

When I print that file from my application using:

BufferedReader reader = new BufferedReader(new FileReader(new File("/home/path/to/file/sql_article_inserts.sql")));
 String str;
 while((str = reader.readLine()) != null) {
     LOGGER.debug("LINE: " + str);
 }

I get the correct, expected output:

LINE: INSERT INTO article (text) value ("OIL sands output at Nexen’s Long Lake project dropped in February.");

Any help would be greatly appreciated.

Some system details:
I am running on Linux (Ubuntu).

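Edits:
* Edited to specify the operating system.
* Edited to add detailed output from reading the SQL input file.
* Edited to give more information about how the data was inserted into the database.
* Edited to fix typos in the code and clarify the examples.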

Solution:

Is it possible that you are reading the log file with the wrong encoding? Windows-1252, I would guess.

UTF-8: OIL sands output at Nexenâ€™s Long Lake project dropped in February.

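If this is what appears in the log, take a hex dump of the log file. If the data is UTF-8, you would expect the sequence Nexen’s to appear as 4E 65 78 65 6E E2 80 99 73. If some other application reads those bytes using the native ANSI encoding, it will decode them as Nexenâ€™s.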
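
The effect can be reproduced without a database. The following is a minimal sketch, not part of the original answer: the class `MojibakeDemo` and its helpers `toHex` and `misread` are hypothetical names, and it assumes the other application decodes with windows-1252. It encodes Nexen’s as UTF-8, prints the bytes in hex, then decodes the same bytes with the wrong charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes UTF-8 bytes with windows-1252, as a misconfigured viewer might.
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String original = "Nexen\u2019s"; // \u2019 is the right single quotation mark
        System.out.println(toHex(original.getBytes(StandardCharsets.UTF_8)));
        // The three bytes E2 80 99 become three separate characters here.
        System.out.println(misread(original));
    }
}
```

The hex line should match the byte sequence quoted above. How the second line is displayed depends on the console's own encoding, which is the same kind of ambiguity the hex dump is meant to rule out.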

To confirm, you can also dump the individual characters of the returned value to see whether they are correct in UTF-16:

//untested
for(char ch : text.toCharArray()) {
   System.out.printf("%04x%n", (int) ch);
}

I am assuming all of the data is in the BMP, so you can simply look up the results in the Unicode charts.
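
As a sketch of what to look for in such a dump (the class `CodePointCheck`, the method `dump`, and both sample strings are hypothetical and made up for illustration): a correctly decoded apostrophe appears as the single value 2019, whereas a string that is already garbled in memory shows the three values 00e2 20ac 2122 in its place.

```java
public class CodePointCheck {

    // Returns the UTF-16 code units of the string as space-separated 4-digit hex.
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String intact = "n\u2019s";               // apostrophe decoded correctly
        String garbled = "n\u00e2\u20ac\u2122s";  // the same bytes misread as windows-1252
        System.out.println(dump(intact));   // 006e 2019 0073
        System.out.println(dump(garbled));  // 006e 00e2 20ac 2122 0073
    }
}
```

If the value returned by `rs.getString` dumps like the first string, the Java side is fine and the log viewer is the likely culprit. If it dumps like the second, the text was already wrong before it was logged.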

Tags: character-encoding, utf-8, jdbc, java, mysql
Source: https://codeday.me/bug/20191208/2092316.html