A Windows XP help forum. PCbanter


How to detect if a text file is ISO8859-1,ISO8859-15,UTF-8 or UniCode encoded



 
 
  #1  
January 22nd 10, 10:21 AM, posted to microsoft.public.windowsxp.general, microsoft.public.windowsxp.help_and_support
Karl Mondale
external usenet poster

Posts: 7

Assume I have a text file. How can I detect if the text inside is encoded in

ISO8859-1
ISO8859-15
UTF-8
UniCode

Karl

  #2  
January 22nd 10, 10:51 AM, posted to microsoft.public.windowsxp.help_and_support
Pegasus [MVP]
external usenet poster

Posts: 2,361



"Karl Mondale" said this in news item
...
Assume I have a text file. How can I detect if the text inside is encoded
in

ISO8859-1
ISO8859-15
UTF-8
UniCode

Karl


I would check Google or Wikipedia, e.g. here:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
That article explains the whole character set in detail. To find out
programmatically, you need to read the first few bytes of the file and look
for a byte order mark. The exact method depends on the tool you wish to use.
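
As a rough illustration of the "read the first few bytes" idea, here is a minimal Python sketch. Python is my own choice, since the thread does not name a tool, and the names `BOMS`, `sniff_bom` and `sniff_file` are hypothetical. It only recognises files that start with a byte order mark; a file without one returns `None` and has to be examined some other way.

```python
import codecs

# Known byte order marks. The UTF-32 entries come first because the
# UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head: bytes):
    """Return an encoding name if head starts with a known BOM, else None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file and check them for a BOM."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

Calling `sniff_file` on a file saved by Notepad as "Unicode" should report `utf-16-le`, while a file saved as ANSI has no mark and gives `None`.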

  #4  
January 22nd 10, 05:03 PM, posted to microsoft.public.windowsxp.help_and_support
Paul Randall
external usenet poster

Posts: 335

The short answer is that you can't always determine the encoding from the
content of a file.
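
To make that concrete for the two ISO encodings: both assign a character to every byte value, so any file decodes under either one without error, and only a human (or a dictionary) can tell which result is sensible. The Python sketch below is just an illustration; the sample bytes and the helper name `differing_bytes` are made up for the example.

```python
# The same four bytes, read under each of the two ISO encodings.
raw = bytes([0x31, 0x30, 0x20, 0xA4])  # "10 " followed by the byte 0xA4

as_latin1 = raw.decode("iso-8859-1")   # 0xA4 is the currency sign U+00A4
as_latin9 = raw.decode("iso-8859-15")  # 0xA4 is the euro sign U+20AC


def differing_bytes():
    """List the byte values that the two encodings map to different characters."""
    return [
        b for b in range(256)
        if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-15")
    ]


print(repr(as_latin1), repr(as_latin9))
print([hex(b) for b in differing_bytes()])
```

Only eight byte values differ between the two, so a file that contains none of them is byte-for-byte identical in both encodings.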

To see why, you can use Notepad to experiment with creating and saving text
as ANSI, Unicode, Unicode Big Endian, and UTF-8. Try pasting in some
text from foreign web pages, as well as plain English text. Looking at the
files in a hex editor, like XVI32, you will see that for all but ANSI,
Notepad prepends a few bytes (called a Byte Order Mark, or BOM) to indicate
the type of text file. For Unicode (UTF-16), it is the two-byte sequence
(hex) FF FE for little endian or FE FF for big endian; for UTF-8 it is the
three bytes EF BB BF. Not all applications prepend a BOM. ANSI (on
Western-language systems) and your two ISO encodings always use one byte per
character. Unicode as Notepad saves it (UTF-16) uses two bytes for most
characters and four bytes (a surrogate pair) for the rest; UTF-32 always
uses four bytes per character. UTF-8 uses a variable number of bytes per
character (one to four) and can encode every Unicode character. When saving
as ANSI, Notepad warns you if some characters can't be saved as one-byte
characters.
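
Putting those observations together, a best-effort guess could look like the Python sketch below. This is only a heuristic under my own assumptions, not a definitive detector, and the function name `guess_encoding` and its return strings are invented for the example. It checks for a BOM, then for pure ASCII, then for valid UTF-8, and otherwise reports a one-byte encoding without trying to say which one.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Best-effort guess at the encoding of data; a heuristic, not proof."""
    # 1. A byte order mark, when present, is the strongest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (with BOM)"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # 2. Pure ASCII is identical in UTF-8 and in both ISO encodings.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass

    # 3. Multi-byte UTF-8 sequences follow a strict pattern, so text in a
    #    one-byte encoding rarely passes a strict UTF-8 decode by accident.
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 4. Anything else decodes without error under every one-byte encoding,
    #    so the content alone cannot say which one was intended.
    return "single-byte (ISO-8859-1, ISO-8859-15 or an ANSI code page)"
```

For example, `guess_encoding(open("some.txt", "rb").read())` (with `some.txt` standing in for your own file) returns one of the four labels above. The last case is deliberately vague because of the limitation described at the top of this post.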

-Paul Randall

"Karl Mondale" wrote in message
...
Assume I have a text file. How can I detect if the text inside is encoded
in

ISO8859-1
ISO8859-15
UTF-8
UniCode

Karl



  #5  
January 22nd 10, 08:15 PM, posted to microsoft.public.windowsxp.help_and_support
Carmel[_2_]
external usenet poster

Posts: 4

"Karl Mondale" wrote:

Assume I have a text file. How can I detect if the text inside is encoded in

ISO8859-1
ISO8859-15
UTF-8
UniCode

Karl


Microsoft doesn't distribute a utility that can accomplish that feat easily.
If you can get your file transferred to a FreeBSD or Linux system, you could
use either 'file' or 'enca' to determine its properties.

MAN pages:

http://unixhelp.ed.ac.uk/CGI/man-cgi?file
http://linux.die.net/man/1/enca

--
Carmel

Never forget: 2 + 2 = 5 for extremely large values of 2.
 



