How to use the guessEncoding method in groovy.util.CharsetToolkit

Best Java code snippets using groovy.util.CharsetToolkit.guessEncoding (Showing top 5 results out of 315)

public Charset getCharset() {
  if (this.charset == null)
    this.charset = guessEncoding();
  return charset;
}

Javadoc

Guess the encoding of the provided buffer.

If a Byte Order Marker (BOM) is encountered at the beginning of the buffer, we immediately return the charset implied by this BOM. Otherwise, the file would not be a human-readable text file.
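
The BOM check described above can be sketched as follows. This is a minimal illustration, not the CharsetToolkit source: the class name BomSniffer and the method charsetFromBom are hypothetical, and only the three BOMs named in the method list on this page (UTF-8, UTF-16 BE, UTF-16 LE) are handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of groovy.util.CharsetToolkit.
public final class BomSniffer {
    private BomSniffer() {}

    /** Returns the charset implied by a leading BOM, or null if no known BOM is present. */
    public static Charset charsetFromBom(byte[] buf) {
        // UTF-8 BOM: EF BB BF
        if (buf.length >= 3
                && (buf[0] & 0xFF) == 0xEF
                && (buf[1] & 0xFF) == 0xBB
                && (buf[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (buf.length >= 2) {
            int b0 = buf[0] & 0xFF;
            int b1 = buf[1] & 0xFF;
            // UTF-16 big endian BOM: FE FF
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
            // UTF-16 little endian BOM: FF FE
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        }
        return null; // no BOM: the caller falls through to content-based guessing
    }
}
```

Masking each byte with 0xFF is needed because Java bytes are signed; without it, comparisons against values such as 0xEF would never match.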

If there is no BOM, this method tries to discern whether the file is UTF-8 or not. If it is not UTF-8, we assume the encoding is the default system encoding (of course, it might be any 8-bit charset, but usually, an 8-bit charset is the default one).

It is possible to discern UTF-8 thanks to the pattern of characters with a multi-byte sequence.

 
UCS-4 range (hex.)        UTF-8 octet sequence (binary) 
0000 0000-0000 007F       0xxxxxxx 
0000 0080-0000 07FF       110xxxxx 10xxxxxx 
0000 0800-0000 FFFF       1110xxxx 10xxxxxx 10xxxxxx 
0001 0000-001F FFFF       11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 
0020 0000-03FF FFFF       111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 
0400 0000-7FFF FFFF       1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

With UTF-8, 0xFE and 0xFF never appear.
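
To make the table concrete, here is a rough sketch of a UTF-8 check driven by the lead-byte patterns above, with the fallback to the default system charset described in the Javadoc. It is not the CharsetToolkit implementation: the class Utf8Sniffer and its methods are hypothetical, a buffer that ends in the middle of a multi-byte sequence is simply treated as non-UTF-8, and a pure ASCII buffer is reported as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of groovy.util.CharsetToolkit.
public final class Utf8Sniffer {
    private Utf8Sniffer() {}

    /** Number of continuation bytes a lead byte announces, or -1 if it cannot start a sequence. */
    static int continuationCount(int b) {
        if ((b & 0x80) == 0x00) return 0; // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 1; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 2; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 3; // 11110xxx
        if ((b & 0xFC) == 0xF8) return 4; // 111110xx
        if ((b & 0xFE) == 0xFC) return 5; // 1111110x
        return -1; // stray continuation byte (10xxxxxx), or 0xFE / 0xFF
    }

    /** True if every byte in the buffer fits the lead/continuation patterns in the table. */
    public static boolean looksLikeUtf8(byte[] buf) {
        int i = 0;
        while (i < buf.length) {
            int n = continuationCount(buf[i] & 0xFF);
            if (n < 0) return false;
            if (i + n >= buf.length) return false; // sequence runs past the end of the buffer
            for (int k = 1; k <= n; k++) {
                // each continuation byte must have the form 10xxxxxx
                if ((buf[i + k] & 0xC0) != 0x80) return false;
            }
            i += n + 1;
        }
        return true;
    }

    /** UTF-8 if the bytes fit the patterns, otherwise the default system charset. */
    public static Charset guess(byte[] buf) {
        return looksLikeUtf8(buf) ? StandardCharsets.UTF_8 : Charset.defaultCharset();
    }
}
```

For example, the two bytes C3 A9 ("é" in UTF-8) pass the check, while the single byte E9 ("é" in ISO-8859-1) announces a three-byte sequence that never arrives and is rejected.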

Popular methods of CharsetToolkit

<init>
Constructor of the CharsetToolkit utility class.
getCharset
getDefaultSystemCharset
Retrieve the default charset of the system.
getReader
Gets a BufferedReader (indeed a LineNumberReader) from the File specified in the constructor of Char
hasUTF16BEBom
Has a Byte Order Marker for UTF-16 Big Endian (utf-16 and ucs-2).
hasUTF16LEBom
Has a Byte Order Marker for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).
hasUTF8Bom
Has a Byte Order Marker for UTF-8 (Used by Microsoft's Notepad and other editors).
isContinuationChar
If the byte has the form 10xxxxxx, then it's a continuation byte of a multiple byte character;
isFiveBytesSequence
If the byte has the form 111110xx, then it's the first byte of a five-byte sequence character.
isFourBytesSequence
If the byte has the form 11110xxx, then it's the first byte of a four-byte sequence character.
isSixBytesSequence
If the byte has the form 1111110x, then it's the first byte of a six-byte sequence character.
isThreeBytesSequence
If the byte has the form 1110xxxx, then it's the first byte of a three-byte sequence character.
