public class CharsetToolkit extends Object
Utility class to guess the encoding of a given text file.
Unicode files encoded in UTF-16 (little or big endian), and UTF-8 files with a Byte Order Mark (BOM), are correctly detected. For UTF-8 files with no BOM, the charset should also be detected if the buffer is large enough.
A 4 KB byte buffer is used to guess the encoding.
Usage:
 CharsetToolkit toolkit = new CharsetToolkit(file);
 // guess the encoding
 Charset guessedCharset = toolkit.getCharset();
 // create a reader with the correct charset
 BufferedReader reader = toolkit.getReader();
 // read the file content
 String line;
 while ((line = reader.readLine()) != null)
 {
     System.out.println(line);
 }
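The class description says that UTF-8 without a BOM can be recognized when the buffer is large enough. As a rough sketch of that idea, and not CharsetToolkit's actual implementation, the hypothetical helper below asks the JDK's strict UTF-8 decoder whether the first bytes of a buffer form valid UTF-8. The names `Utf8Probe` and `looksLikeUtf8` are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of CharsetToolkit. */
final class Utf8Probe {
    private Utf8Probe() {}

    /**
     * Returns true if the first {@code length} bytes of {@code buffer}
     * decode as UTF-8 without any malformed sequence.
     */
    static boolean looksLikeUtf8(byte[] buffer, int length) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(buffer, 0, length));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Two caveats apply to this sketch. Pure ASCII input also passes, because ASCII is a subset of UTF-8. A multi-byte sequence cut off at the end of the buffer is reported as malformed, so a real detector reading a fixed-size buffer would have to tolerate a truncated tail.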
| Constructor | Description |
|---|---|
| CharsetToolkit(File file) | Constructor of the CharsetToolkit utility class. |
| Modifier and Type | Method | Description |
|---|---|---|
| static Charset[] | getAvailableCharsets() | Retrieves all the available Charsets on the platform, among which is the default charset. |
| Charset | getCharset() | |
| Charset | getDefaultCharset() | Retrieves the default Charset. |
| static Charset | getDefaultSystemCharset() | Retrieves the default charset of the system. |
| boolean | getEnforce8Bit() | Gets the enforce8Bit flag, in case we never want to get a US-ASCII encoding. |
| BufferedReader | getReader() | Gets a BufferedReader (in fact a LineNumberReader) from the File specified in the constructor of CharsetToolkit, using the charset discovered, or the default charset if an 8-bit Charset is encountered. |
| boolean | hasUTF16BEBom() | Has a Byte Order Mark for UTF-16 Big Endian (utf-16 and ucs-2). |
| boolean | hasUTF16LEBom() | Has a Byte Order Mark for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le). |
| boolean | hasUTF8Bom() | Has a Byte Order Mark for UTF-8 (used by Microsoft's Notepad and other editors). |
| void | setDefaultCharset(Charset defaultCharset) | Defines the default Charset used in case the buffer represents an 8-bit Charset. |
| void | setEnforce8Bit(boolean enforce) | If US-ASCII is recognized, enforces returning the default encoding rather than US-ASCII. |
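The three has...Bom() methods report whether the file starts with a particular BOM. As a minimal sketch of what such a check amounts to, assuming only the standard BOM byte sequences (EF BB BF for UTF-8, FF FE for UTF-16 little endian, FE FF for UTF-16 big endian), the hypothetical helper below tests the leading bytes of a buffer. `BomSniffer` and its method names are illustrative and are not part of CharsetToolkit.

```java
/** Hypothetical helper, not part of CharsetToolkit: checks a buffer's leading bytes for a BOM. */
final class BomSniffer {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    private static final byte[] UTF16LE_BOM = {(byte) 0xFF, (byte) 0xFE};
    private static final byte[] UTF16BE_BOM = {(byte) 0xFE, (byte) 0xFF};

    private BomSniffer() {}

    static boolean hasUtf8Bom(byte[] buffer) {
        return startsWith(buffer, UTF8_BOM);
    }

    static boolean hasUtf16LeBom(byte[] buffer) {
        return startsWith(buffer, UTF16LE_BOM);
    }

    static boolean hasUtf16BeBom(byte[] buffer) {
        return startsWith(buffer, UTF16BE_BOM);
    }

    /** True if buffer is at least as long as prefix and begins with the same bytes. */
    private static boolean startsWith(byte[] buffer, byte[] prefix) {
        if (buffer.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (buffer[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because each BOM has a distinct first byte pair, at most one of the three checks can succeed for a given buffer.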
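public CharsetToolkit(File file) throws IOException
Constructor of the CharsetToolkit utility class.
Parameters: file - the file of which we want to know the encoding.
Throws: IOException

public void setDefaultCharset(Charset defaultCharset)
Defines the default Charset used in case the buffer represents an 8-bit Charset.
Parameters: defaultCharset - the default Charset to be returned if an 8-bit Charset is encountered.

public Charset getCharset()

public void setEnforce8Bit(boolean enforce)
If US-ASCII is recognized, enforces returning the default charset rather than US-ASCII.
Parameters: enforce - a boolean specifying the use or not of US-ASCII.

public boolean getEnforce8Bit()
Gets the enforce8Bit flag, in case we never want to get a US-ASCII encoding.

public Charset getDefaultCharset()
Retrieves the default Charset.

public static Charset getDefaultSystemCharset()
Retrieves the default charset of the system.
Returns: the default Charset.

public boolean hasUTF8Bom()
Has a Byte Order Mark for UTF-8 (used by Microsoft's Notepad and other editors).

public boolean hasUTF16LEBom()
Has a Byte Order Mark for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).

public boolean hasUTF16BEBom()
Has a Byte Order Mark for UTF-16 Big Endian (utf-16 and ucs-2).

public BufferedReader getReader() throws FileNotFoundException
Gets a BufferedReader (in fact a LineNumberReader) from the File specified in the constructor of CharsetToolkit, using the charset discovered, or the default charset if an 8-bit Charset is encountered.
Returns: a BufferedReader.
Throws: FileNotFoundException - if the file is not found.

public static Charset[] getAvailableCharsets()
Retrieves all the available Charsets on the platform, among which is the default charset.
Returns: the available Charsets.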
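
The interplay between setDefaultCharset and setEnforce8Bit can be easier to follow as code. The sketch below illustrates only the documented rule (when US-ASCII is recognized and the flag is set, the default charset is returned instead). It is not CharsetToolkit's own code, and the class `Enforce8BitSketch` and its `resolve` method are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the documented enforce8Bit rule; not CharsetToolkit's code. */
final class Enforce8BitSketch {
    private Enforce8BitSketch() {}

    /**
     * Returns defaultCharset when the guess is US-ASCII and enforce8Bit is set;
     * otherwise returns the guessed charset unchanged.
     */
    static Charset resolve(Charset guessed, Charset defaultCharset, boolean enforce8Bit) {
        if (enforce8Bit && StandardCharsets.US_ASCII.equals(guessed)) {
            return defaultCharset;
        }
        return guessed;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1; // assumed default, for illustration only
        System.out.println(resolve(StandardCharsets.US_ASCII, fallback, true));  // ISO-8859-1
        System.out.println(resolve(StandardCharsets.US_ASCII, fallback, false)); // US-ASCII
        System.out.println(resolve(StandardCharsets.UTF_8, fallback, true));     // UTF-8
    }
}
```

The rule matters because a file that currently contains only 7-bit characters may later gain characters in the 128-255 range, at which point reading it as US-ASCII would no longer be correct.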