
Re: [cinjug-users] XSLT: unparsed entities

To: deshmol-lists@xxxxxxxxx
Subject: Re: [cinjug-users] XSLT: unparsed entities
From: "Eric Bardes" <ericbardes@xxxxxxxxx>
Date: Wed, 3 May 2006 11:30:22 -0400
Cc: users@xxxxxxxxxx
Delivered-to: mailing list users@cinjug.org
In-reply-to: <20060501142826.71074.qmail@web32013.mail.mud.yahoo.com>
Mailing-list: contact users-help@cinjug.org; run by ezmlm
References: <20060501142826.71074.qmail@web32013.mail.mud.yahoo.com>
This message ended up being a little more long-winded than expected. I also hope the foreign characters come through okay.

Off the top of my head, I don't know how to tell XSLT to emit entity
references.  I will suggest a different path that gets me past many of
the international character issues.

There are many standards for encoding international characters. I've
noticed that many XML processors, web servers and browsers have
different defaults for which encoding to use when one isn't specified.
Of course, this is a sure-fire recipe for character corruption. I
always specify the character encoding in my XML declaration, specify
the output encoding in the xsl:output XSLT tag, and add both the
character encoding HTTP header and the HTML meta tag.


I have a friend named Jörg Stühmeier.
Using a numeric character reference, his first name would be: J&#x00F6;rg.
Under ISO-8859-1, his first name is four bytes: 4A F6 72 67.
Under UTF-8, his first name is five bytes: 4A C3 B6 72 67.

If SAXON is emitting UTF-8 and your browser is interpreting it as
ISO-8859-1, you get JÃ¶rg: the two bytes C3 B6 are shown as two
separate characters instead of one ö.
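
The byte counts above, and the garbling caused by a UTF-8/ISO-8859-1 mismatch, can be reproduced with a short Java sketch. The class and method names (EncodingDemo, hex, misreadUtf8AsLatin1) are made up for illustration; only the standard java.nio.charset API is used.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    /** Formats bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes text as UTF-8, then decodes those bytes as ISO-8859-1,
     *  which is what a browser does when it guesses the wrong charset. */
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "J\u00F6rg"; // \u00F6 is o-umlaut
        System.out.println(hex(name.getBytes(StandardCharsets.ISO_8859_1))); // 4A F6 72 67
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));      // 4A C3 B6 72 67
        // Five characters instead of four: the o-umlaut became two characters.
        System.out.println(misreadUtf8AsLatin1(name).length());              // 5
    }
}
```

The name is written with a \u escape so the result does not depend on the encoding of the Java source file itself.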

In Sun's J2EE documentation, it says the default character encoding
for servlets is ISO-8859-1, which won't work for non-European
languages.  In your servlet or JSP code, add the following line before
you obtain the response writer:
response.setCharacterEncoding("UTF-8");

In your XSLT style sheet, make sure you have the following:
<xsl:output method="html" encoding="UTF-8" />
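
To see that the encoding attribute on xsl:output really controls the bytes written, here is a rough sketch using the JAXP classes that ship with the JDK rather than SAXON. The class name XsltEncodingDemo and the tiny inline stylesheet are hypothetical, and the sketch uses method="xml" with no XML declaration purely to keep the output small, which is a departure from the stylesheet line above.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltEncodingDemo {
    /** Runs a one-template stylesheet and returns the raw output bytes.
     *  The encoding argument is placed in the xsl:output element. */
    static byte[] transform(String encoding) throws Exception {
        String xslt =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'"
          + " encoding='" + encoding + "'/>"
          + "<xsl:template match='/'><p><xsl:value-of select='/name'/></p></xsl:template>"
          + "</xsl:stylesheet>";
        // The input spells the o-umlaut as a numeric character reference.
        String xml = "<name>J&#xF6;rg</name>";

        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Writing to an OutputStream lets the serializer apply the
        // encoding requested by xsl:output.
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        for (String enc : new String[] {"UTF-8", "ISO-8859-1"}) {
            byte[] bytes = transform(enc);
            // Decoding as ISO-8859-1 maps each byte to one character,
            // so the printed length equals the byte count.
            String oneCharPerByte = new String(bytes, StandardCharsets.ISO_8859_1);
            System.out.println(enc + ": " + oneCharPerByte.length() + " bytes");
        }
    }
}
```

The UTF-8 run should come out one byte longer than the ISO-8859-1 run, because ö takes two bytes in UTF-8 and one in ISO-8859-1.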

This should keep all the parts on the same standard.

--
Cheers,
Eric Bardes
