| To: | Justin Fister <jfister@xxxxxxxxx> |
|---|---|
| Subject: | Re: [cinjug-users] Java i18n Weirdness |
| From: | Troy Davis <troy@xxxxxxxxxxxxxxxxxx> |
| Date: | Fri, 3 Jun 2005 16:11:26 -0400 |
| Cc: | users@xxxxxxxxxx |
| Delivered-to: | mailing list users@cinjug.org |
| In-reply-to: | <bb97f32c05060308187b534726@mail.gmail.com> |
| Mailing-list: | contact users-help@cinjug.org; run by ezmlm |
| References: | <bb97f32c05060308187b534726@mail.gmail.com> |
Hi Justin,

I just went through the process of upgrading an existing Java app to handle Unicode text, and it was definitely a learning curve... (BTW, thank you to everyone who sent suggestions, most of them helped!) Since I completed the upgrade work, I've found myself copying and pasting text from just about anything, and much to my surprise it actually works. Even in MSIE, marvel of marvels.

One of the problems you'll find in trying to convert between Windows cp1252, latin1 and other older encodings is that there's no easy way to detect which character set any given string is in. Supposedly Microsoft invested a pretty significant amount of developer time for MSIE so that it could detect character sets based on heuristic analysis. But short of getting that code and porting it to Java, I'd recommend switching to UTF-8 instead.

In order to upgrade my company's app to be Unicode-safe, I had to address several different levels of concerns:

1. The database needed to be Unicode-safe. We use MySQL, but you have to use version 4.1.1+ to get that. Most hosting providers are still using 3.x or 4.0.x. One of our clients' sites is on a server that has 4.0.something, and it became a real roadblock. We wound up recompiling their jar file so that the DAO connection string pointed to our own database server. Slowed down the site a bit, but it works.

The keys to this bit of magic turned out to be four-fold:

- MySQL >= 4.1.1.
- Changing the connection string to look like
  jdbc:mysql://server.com/db_name?useUnicode=true&characterEncoding=utf8&autoReconnect=true
- Exporting and converting the data to utf8.
- Changing the create table clauses to include "ENGINE=MyISAM DEFAULT CHARSET=utf8;" at the end.

I also found myself typing "set names 'utf8';" at the command line quite a bit before uploading converted text.

2. The JDBC driver needed to be a recent version, so I had to upgrade Connector/J. Not a big deal for our own servers, but some clients are on other companies' servers, and that took some time and persuasion.

3. Page headers must specify the UTF-8 character set, so your first line in a JSP file might look like this:

<%@page language="java" contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>

4. If you're going to have page headers that say UTF-8, your HTML content-type meta tags should be consistent, and appear just after the <head> tag:

<meta http-equiv="content-type" content="text/html;charset=UTF-8">

5. Whatever processes your form data will require something like this, called before the first parameter is read from the request:

request.setCharacterEncoding("UTF-8");

6. To have #5 applied to every request automatically, you'll need the SetCharacterEncodingFilter.class in your WEB-INF/classes/filters directory (loose class files go under WEB-INF/classes; WEB-INF/lib is for jars). Look in the Tomcat examples for a copy of this. You'll need to have a web.xml file that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app
PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
"http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
<display-name>My App</display-name>
<description>Something about My App.</description>
<filter>
<filter-name>Set Character Encoding</filter-name>
<filter-class>filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>Set Character Encoding</filter-name>
<servlet-name>action</servlet-name>
</filter-mapping>
</web-app>

HTH,
Troy

__________________
Troy Davis
Technology Director
Metaphor Studio
538 Reading Road Loft 200
Cincinnati, Ohio 45202
Tel: 513-723-0290
Fax: 513-723-0670
http://metaphorstudio.com

On Jun 3, 2005, at 11:18 AM, Justin Fister wrote:

> I have a question for any Java gurus with i18n experience. I'm having a hard time understanding the way things work with a webapp -- actually why it doesn't work. Here's what's going on...
>
> I have a web-based admin that contains an HTML textarea field in which users enter in text. Often the text contains special Windows characters (such as curly quotes) and Latin-1 characters for words like "naiveté". In a servlet, I use the HttpServletRequest.getParameter() method to retrieve the text and dump it into a MySQL database that uses Latin1 as its default charset. That works fine -- no problems. The text can later be viewed fine through a web page as well as through Mysql Control Center.
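
The encoding ambiguity discussed in this thread (curly quotes from Windows, Latin-1 accents, and no reliable way to tell which encoding a given run of bytes is in) can be seen in a few lines of plain Java. This is only a rough sketch, not code from either poster: the class name EncodingAmbiguityDemo and the helpers decode and hex are made up for illustration, and it assumes a JDK that ships the windows-1252 charset (most do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingAmbiguityDemo {

    // Hypothetical helper: decode raw bytes using the named charset.
    // new String(byte[], Charset) substitutes U+FFFD for malformed input
    // instead of throwing, so a wrong guess fails silently.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Render a string as UTF-16 code units in hex so that control and
    // replacement characters are visible (all characters here are in the BMP).
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x93 is a left curly double quote in windows-1252 (cp1252).
        byte[] curly = { (byte) 0x93 };
        System.out.println(hex(decode(curly, "windows-1252"))); // U+201C, the curly quote
        System.out.println(hex(decode(curly, "ISO-8859-1")));   // U+0093, a control character
        System.out.println(hex(decode(curly, "UTF-8")));        // U+FFFD, the replacement character

        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] eAcute = "\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(decode(eAcute, "UTF-8")));        // U+00E9
        System.out.println(hex(decode(eAcute, "windows-1252"))); // U+00C3 U+00A9, two wrong characters
    }
}
```

None of these decodes raises an error, which is why guessing the charset after the fact is unreliable and why pinning every layer (database, driver, page, request) to UTF-8 is the simpler route.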