5#
Posted on 2012-3-12 11:09:02
- NLS_LANG and web servers explained.
- ----------------------------------
-
- When using a web-based infrastructure there is (compared to a "normal"
- Windows or UNIX application) one more layer that can cause problems
- when dealing with character sets: the browser.
-
- This guide gives a generic overview of how this works but is not
- intended to replace web server or application manuals.
- If in doubt, please consult the vendor of the application language / web server.
-
- What happens when you type data in a field in a web environment?
-
- I'm assuming here that you are using Firefox on a UNIX X-windows system
- or IE / Firefox on a Windows client.
-
- a) The Client operating system character set.
-
- The text you type is in the native character set of the operating system
- and is passed to the browser, which "knows" what character set the GUI is running in.
- Most X-windows environments are Unicode (UTF-8).
- Keyboard input on Windows is typically in the ANSI code page (ACP),
- such as 1252 for a West European Windows installation.
- Copy and paste from MS Word or another Unicode source places data in
- UCS2 (= the Unicode encoding used by Windows) on the clipboard.
-
- b) The browser tries to find out the character set of the website.
-
- Then the browser checks what character set encoding the website uses.
- This can be:
-
- b1) defined by the HTTP header returned by the web server.
-
- like: Content-Type: text/html; charset=UTF-8
-
- If the web server returns no charset definition in the HTTP header,
- the character set is assumed to be ISO-8859-1 or (!) the
- "default character set", which the user can set in the browser.
- This basically means that when you don't specify the HTTP header you
- never know what the client (= browser) will use,
- so it is a very good idea to define it.
-
- How to make a server send this header depends on the particular server,
- check your server documentation if necessary.
- Info for the most popular HTTP servers like Apache and IIS is found on
-
- http://www.w3.org/International/O-HTTP-charset
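
As an illustration of b1), here is a minimal sketch of an application that declares the charset in the HTTP header itself. It assumes a Python WSGI application, which the note does not mention; the name `app` and the page content are made up for the example, and your own server or framework may set the header in a different way.

```python
# Minimal WSGI application (hypothetical example, not from the note).
# It encodes the page as UTF-8 and declares the same charset in the
# Content-Type HTTP header, so the browser does not have to guess.

def app(environ, start_response):
    # The body is encoded with the same charset that the header declares.
    body = "<p>Price: 10 \u20ac</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

# To try it locally with the standard library:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

The important point is that the bytes sent and the declared charset agree; the header alone does not convert anything.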
-
-
- b2) defined in the html itself by the charset meta tag in the head of the html document.
-
- like: <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
-
- see the http://www.w3.org/International/O-charset.html website
-
- Please do NOT rely only on the HTML meta tag to define a character set.
- Depending on the programming language or HTML/XML parser used, you might
- get unexpected results when charset declarations are mixed or missing.
- To avoid problems we recommend that you:
-
- * use Unicode (UTF-8) where possible, as this is the best option.
- * always declare the HTTP header.
- * do not declare a meta tag that differs from the HTTP header.
- * certainly do NOT mix HTML documents with different or
- missing encoding definitions in one page using applets,
- frames or iframes.
- * include the meta tag if HTML pages are cacheable, since the
- HTTP header is not there when a page is fetched from the cache.
- c) The browser now knows enough to do the conversion.
-
- Once the browser knows the native character set of the operating system
- and the character set defined by the web environment, it
- translates from the OS character set to the character set used by the
- web pages.
- As you might have noticed, up to this point the data has not even
- reached the web server.
-
- We cannot stress enough the importance of this first conversion step.
-
- Assume you have a Windows client that passes the euro symbol to the browser
- (= 1252 character set). If the browser gets a web page without a declared character set
- then it will use "charset=ISO-8859-1".
- ISO-8859-1 has no euro symbol, so the euro is already lost
- at this stage. The character set of the Oracle database later on is irrelevant,
- since the character was lost before it even left the client (= browser).
-
- This first step needs to be right or you will run into trouble.
- If it is not correct then it is useless to go further.
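
The euro example can be reproduced outside a browser. This small sketch uses Python's built-in codecs as a stand-in for the browser's conversion step (an assumption made for illustration; a real browser's handling of an undeclared charset may differ). The helper name `try_encode` is made up.

```python
def try_encode(text, charset):
    """Return the encoded bytes, or None if the charset cannot represent the text."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


euro = "\u20ac"  # the euro symbol

# Windows code page 1252 has the euro at byte 0x80.
cp1252_bytes = try_encode(euro, "cp1252")

# ISO-8859-1 has no euro symbol, so the conversion fails.
latin1_bytes = try_encode(euro, "iso-8859-1")

# UTF-8 can represent it (three bytes).
utf8_bytes = try_encode(euro, "utf-8")
```

Here `latin1_bytes` is `None`: nothing the database does afterwards can bring the character back.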
-
- So what can you do?
-
- * When deploying a web application we strongly recommend using Unicode (UTF-8)
- as the character set for the web server and application side.
- With a Unicode web environment you can process any language you want and
- you never need to worry about the client-side conversion (= browser)
- any more.
- The browser will always be able to convert the operating system
- character set to Unicode (UTF-8) and back.
-
- How to check that your HTTP header is correctly picked up by the browser:
-
- * for IE 5 or 6: in the view-encoding menu, check that "autoselect" is flagged and that
- IE then chooses the character set you have defined in your HTTP server
- (utf8 normally).
-
- * for Firefox: in view-character coding, check that it shows the character set you
- have defined in your HTTP server (utf8 normally).
-
- Please do NOT change this manually to "see things correctly".
- If your browser does not automatically choose the correct character set
- then your application / web server setup has a problem and you
- need to correct this first.
-
- Where to define the character set of the HTTP header?
-
- http://www.w3.org/International/O-HTTP-charset or contact your
- (application) vendor.
-
- Where to define the character set of the html (Meta tag) ?
-
- * For the popular PHP language you can set the character set declaration globally
- in php.ini by defining a "default_charset". See your PHP manual for more info.
-
- A guide to globalizing PHP applications:
- http://www.oracle.com/technology/tech/php/pdf/globalizing_oracle_php_applications.pdf
-
- (note that we do not provide direct support for PHP development)
-
- * When using mod_plsql more information can be found in:
-
- Note 244544.1 An NLS Character Set Primer for mod_plsql
-
- * When using ASP you might want to have a look at
- http://support.microsoft.com/?id=893663
- (Globalization issues in ASP and ASP.NET)
- Please contact Microsoft if you have questions about ASP.
-
- * For other programming environments please consult the manual or the vendor.
-
-
- d) The web server has the data and connects to the database.
-
- The web server then connects to Oracle. Here you can consider the web server
- as a normal Oracle (!) client, and you know from the previous points which
- character set you are using (the "web server character set").
-
- You need to set NLS_LANG to the character set used in the HTML/HTTP
- header, in the environment of the operating system user that runs the process
- that connects to Oracle.
-
- On UNIX this is normally the environment of the user that starts the web server.
- On Windows you need to put it in the registry of the Oracle home of the
- Oracle client libraries used (!).
-
- Note that on Windows, Apache does NOT pick up NLS_LANG if it is defined as an environment
- variable, so you NEED to set it in the registry.
-
- The only exception is a "Thin JDBC" connection. This is a direct TCP/IP
- connection from the Java environment (which is Unicode based) on the client
- running the Java program, and you cannot (and do not need to) set any NLS_LANG.
-
- If the conversion in the browser is correct, you won't get any trouble from
- that.
-
- see Note 115001.1 NLS_LANG Client Settings and JDBC Drivers
- for more info on JDBC and NLS.
-
- Common IANA character set name     Oracle character set name
- (HTTP header & HTML meta tag)      (NLS_LANG character set part)
-
- UTF-8                              UTF8
- windows-1250                       EE8MSWIN1250
- windows-1251                       CL8MSWIN1251
- windows-1252                       WE8MSWIN1252
- windows-1253                       EL8MSWIN1253
- windows-1254                       TR8MSWIN1254
- windows-1255                       IW8MSWIN1255
- windows-1256                       AR8MSWIN1256
- windows-1257                       BLT8MSWIN1257
- windows-1258                       VN8MSWIN1258
- windows-936 or GBK                 ZHS16GBK
- Big5                               ZHT16MSWIN950
- Big5-HKSCS                         ZHT16HKSCS (Hong Kong extension of Big5)
- TIS-620                            TH8TISASCII
- Shift_JIS                          JA16SJIS
- korean or KS_C_5601-1989           KO16MSWIN949
-
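
The table above can be turned into a small lookup that builds the NLS_LANG value for a given web charset. This is only a sketch: the helper name `nls_lang_for` is hypothetical, and the `AMERICAN_AMERICA` language/territory part is just a placeholder default, since NLS_LANG has the form LANGUAGE_TERRITORY.CHARACTERSET and only the character set part is covered by the table.

```python
# IANA name (lower case) -> Oracle character set name, copied from the table above.
IANA_TO_ORACLE = {
    "utf-8": "UTF8",
    "windows-1250": "EE8MSWIN1250",
    "windows-1251": "CL8MSWIN1251",
    "windows-1252": "WE8MSWIN1252",
    "windows-1253": "EL8MSWIN1253",
    "windows-1254": "TR8MSWIN1254",
    "windows-1255": "IW8MSWIN1255",
    "windows-1256": "AR8MSWIN1256",
    "windows-1257": "BLT8MSWIN1257",
    "windows-1258": "VN8MSWIN1258",
    "windows-936": "ZHS16GBK",
    "gbk": "ZHS16GBK",
    "big5": "ZHT16MSWIN950",
    "big5-hkscs": "ZHT16HKSCS",
    "tis-620": "TH8TISASCII",
    "shift_jis": "JA16SJIS",
    "korean": "KO16MSWIN949",
    "ks_c_5601-1989": "KO16MSWIN949",
}


def nls_lang_for(iana_charset, language="AMERICAN", territory="AMERICA"):
    """Build an NLS_LANG value whose character set part matches the web charset.

    The language and territory defaults are placeholders for this example.
    Raises KeyError for a charset that is not in the table.
    """
    oracle_charset = IANA_TO_ORACLE[iana_charset.strip().lower()]
    return f"{language}_{territory}.{oracle_charset}"
```

For a site served as `charset=UTF-8` this gives `AMERICAN_AMERICA.UTF8`, which would then go into the environment (UNIX) or the registry (Windows) as described in d).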
- To know what language each character set supports have a look at
- Note:179133.1 The correct NLS_LANG in a Windows Environment
- UTF8 is a Unicode character set and supports (almost)
- any spoken language in the world.
-
- e) The data arrives at the database.
-
- The final thing to check is that your Oracle database can store the data
- coming from the web server.
- Again, the best solution is to use (AL32)UTF8 as NLS_CHARACTERSET. Otherwise,
- if you have for example 2 web servers, one configured to handle Arabic and
- one for Russian, you would need 2 databases since, besides UTF8,
- there is no character set that supports both Arabic and Russian.
-
- More information about using an AL32UTF8 database is found here:
- Note:788156.1 AL32UTF8 / UTF8 (Unicode) Database Character Set Implications
- Note:144808.1 Examples and limits of BYTE and CHAR semantics usage
- Note:260893.1 Unicode character sets in the Oracle database
-
- Your current NLS_CHARACTERSET can be found using this select:
-
- select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';
-
- This should be the same as the one used in your website or a superset like (AL32)UTF8.
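
The rule just stated can be written down as a small check on the value returned by the select. The sketch below is deliberately simplistic: the function name is hypothetical, and it only recognises the two cases the note names (identical character sets, or a Unicode database). Other superset relationships between Oracle character sets exist and are not covered here.

```python
# Database character sets the note treats as a superset of everything else.
UNICODE_DB_CHARSETS = {"UTF8", "AL32UTF8"}


def db_charset_is_suitable(web_oracle_charset, db_charset):
    """True if the database charset equals the web server's Oracle charset
    or is one of the Unicode character sets."""
    web = web_oracle_charset.strip().upper()
    db = db_charset.strip().upper()
    return db == web or db in UNICODE_DB_CHARSETS
```

For example, a website using WE8MSWIN1252 fits a WE8MSWIN1252 or AL32UTF8 database, while an AR8MSWIN1256 website does not fit a CL8MSWIN1251 database.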
-
- some additional information:
- ----------------------------
- * When using webforms please see
- Note 314074.1 Understanding and Troubleshooting NLS Issues in Web Deployed Oracle Forms
- Webforms is a Java/OC4J environment, not an HTML environment, and does not use
- the web server HTTP encoding for transporting data.
- The actual connection to the database is made by a C runtime process on
- the forms server.
-
- * When dealing with UTF8 it's a good idea to set up iSQL*Plus
- (the web-based Unicode version of sqlplus). Please see:
-
- Note 231231.1 Quick setup of iSQL*Plus 9.2 as Unicode client on windows.
- iSQL*Plus is by default a UTF8 client in 10g and up, no need to configure anything.
- Note 281847.1 How do I configure or test iSQL*Plus 10i?
-
- If you have trouble inserting / fetching data correctly through your
- web application then simply update a few rows through iSQL*Plus.
- That way you know the data is good in the database and you can "work back"
- from there until you see the data correctly in your application.
-
- * If you notice that "extended" / non-US7ASCII characters are stored like
- " & # 1 6 3 ; " (spaces are added here to keep your browser from displaying a £ sign)
- then have a look at Note 296376.1 Data in database is stored with & and # symbols
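
To show what such stored values are, the sketch below uses Python's standard library to produce and reverse a numeric character reference for the pound sign. It assumes the reference was created because the character could not be represented in the target charset (ASCII is used here purely as an example); see the note above for the actual causes in an Oracle setup.

```python
import html

pound = "\u00a3"  # the pound sign, code point 163

# Encoding to a charset without the pound sign, with XML character
# reference replacement, yields the "&#163;" form seen in the database.
stored = pound.encode("ascii", "xmlcharrefreplace").decode("ascii")

# html.unescape turns the reference back into the real character.
restored = html.unescape(stored)
```

Unescaping existing rows is only a repair; the underlying charset mismatch still needs to be fixed first.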
-
-
- RELATED DOCUMENTS
- -----------------
- Note 115001.1 NLS_LANG Client Settings and JDBC Drivers
- Note 179133.1 The correct NLS_LANG in a Windows Environment
- Note 158577.1 NLS_LANG Explained (How does Client-Server Character Conversion Work?)
- Note 231231.1 Quick setup of iSQL*Plus 9.2 as Unicode client on windows.
- Note 281847.1 How do I configure or test iSQL*Plus 10i?
- Note 244544.1 An NLS Character Set Primer for mod_plsql
- Note 296376.1 Data in database is stored with & and # symbols
-
-
- More info about the standards used in a web-based environment:
-
- Hypertext Transfer Protocol -- HTTP/1.1
- http://www.w3.org/Protocols/rfc2068/rfc2068.txt
-
- The HTML 3.2 (Wilbur) recommendation
- [This includes all character entities listed in HTML 2.0 plus new named entities covering the ISO 8859-1 120-191 range.]
- http://www.w3.org/MarkUp/Wilbur/
-
- The HTML 4.0 Recommendation
- [Includes new Unicode character entities]
- http://www.w3.org/TR/REC-html40/
-
- The W3C HTML Internationalization area
- http://www.w3.org/International/O-HTML.html
-
- RFC 1866: The HTML 2.0 specification (plain text.) Appendix contains Character Entity table.
- http://www.rfc-editor.org/rfc/rfc1866.txt
-
- The web version of the HTML 2.0 (RFC 1866) Character Entity table
- http://www.w3.org/MarkUp/html-spec/html-spec_13.html
-
- For further NLS / Globalization information you may start here:
- Note 267942.1 Globalization Technology (NLS) Knowledge Browser