public class IncrementalSAXSource_Filter extends java.lang.Object implements IncrementalSAXSource, ContentHandler, DTDHandler, LexicalHandler, ErrorHandler, java.lang.Runnable
IncrementalSAXSource_Filter implements IncrementalSAXSource, using a standard SAX2 event source as its input and parcelling out those events gradually in response to deliverMoreNodes() requests. Output from the filter will be passed along to a SAX handler registered as our listener, but those callbacks will pass through a counting stage which periodically yields control back to the controller coroutine.
%REVIEW%: This filter is not currently intended to be reusable for parsing additional streams/documents. We may want to consider making it resettable at some point in the future. But it's a small object, so that'd be mostly a convenience issue; the cost of allocating each time is trivial compared to the cost of processing any nontrivial stream.
For a brief usage example, see the unit-test main() method.
This is a simplification of the old CoroutineSAXParser, focusing specifically on filtering. The resulting controller protocol is _far_ simpler and less error-prone; the only controller operation is deliverMoreNodes(), and the only requirement is that deliverMoreNodes(false) be called if you want to discard the rest of the stream and the previous deliverMoreNodes() didn't return false.
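The controller protocol described above can be sketched without the Xalan classes. The following is a minimal stand-in, not the actual implementation: ChunkedSaxSketch, deliverMore() and the two SynchronousQueue hand-off channels are hypothetical names that play the role of the CoroutineManager, and only startElement events are counted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the controller protocol; not the Xalan class.
final class ChunkedSaxSketch extends DefaultHandler implements Runnable {
    private final SynchronousQueue<Boolean> toParser = new SynchronousQueue<>();
    private final SynchronousQueue<Boolean> toController = new SynchronousQueue<>();
    private final String xml;
    private final int frequency;   // events delivered per request
    private int pending;
    final List<String> seen = new ArrayList<>();

    ChunkedSaxSketch(String xml, int frequency) {
        this.xml = xml;
        this.frequency = frequency;
    }

    // Start the parser thread; it idles until the first deliverMore() call.
    void start() {
        Thread t = new Thread(this, "sax-source");
        t.setDaemon(true);
        t.start();
    }

    // Controller side: true means more events may follow, false means the
    // parse is over. Must not be called again after it has returned false.
    boolean deliverMore(boolean parseMore) {
        try {
            toParser.put(parseMore);
            return toController.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    @Override
    public void run() {
        try {
            if (toParser.take()) {
                SAXParserFactory.newInstance().newSAXParser()
                        .parse(new InputSource(new StringReader(xml)), this);
            }
        } catch (Exception e) {
            // Covers parse errors and the deliberate abort from yieldToController().
        } finally {
            try {
                toController.put(false);   // report end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        seen.add(qName);
        if (++pending >= frequency) {
            pending = 0;
            yieldToController();
        }
    }

    // Counting stage: hand control back and block until asked to continue.
    private void yieldToController() throws SAXException {
        try {
            toController.put(true);
            if (!toParser.take()) {
                throw new SAXException("controller asked to stop");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new SAXException(e);
        }
    }
}
```

A controller would call start() once and then loop on deliverMore(true) until it returns false, or call deliverMore(false) once to discard the rest of the stream.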
| Constructor and Description |
|---|
| IncrementalSAXSource_Filter() |
| IncrementalSAXSource_Filter(CoroutineManager co, int controllerCoroutineID): Create an IncrementalSAXSource_Filter which is not yet bound to a specific SAX event source. |
| Modifier and Type | Method and Description |
|---|---|
| void | characters(char[] ch, int start, int length): Receive notification of character data. |
| void | comment(char[] ch, int start, int length): Report an XML comment anywhere in the document. |
| static IncrementalSAXSource | createIncrementalSAXSource(CoroutineManager co, int controllerCoroutineID) |
| java.lang.Object | deliverMoreNodes(boolean parsemore): deliverMoreNodes() is a simple API which tells the coroutine parser that we need more nodes. |
| void | endCDATA(): Report the end of a CDATA section. |
| void | endDocument(): Receive notification of the end of a document. |
| void | endDTD(): Report the end of DTD declarations. |
| void | endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName): Receive notification of the end of an element. |
| void | endEntity(java.lang.String name): Report the end of an entity. |
| void | endPrefixMapping(java.lang.String prefix): End the scope of a prefix-URI mapping. |
| void | error(SAXParseException exception): Receive notification of a recoverable error. |
| void | fatalError(SAXParseException exception): Receive notification of a non-recoverable error. |
| int | getControllerCoroutineID() |
| CoroutineManager | getCoroutineManager() |
| int | getSourceCoroutineID() |
| void | ignorableWhitespace(char[] ch, int start, int length): Receive notification of ignorable whitespace in element content. |
| void | init(CoroutineManager co, int controllerCoroutineID, int sourceCoroutineID) |
| void | notationDecl(java.lang.String a, java.lang.String b, java.lang.String c): Receive notification of a notation declaration event. |
| void | processingInstruction(java.lang.String target, java.lang.String data): Receive notification of a processing instruction. |
| void | run() |
| void | setContentHandler(ContentHandler handler): Register a SAX-style content handler for us to output to. |
| void | setDocumentLocator(Locator locator): Receive an object for locating the origin of SAX document events. |
| void | setDTDHandler(DTDHandler handler): Register a SAX-style DTD handler for us to output to. |
| void | setErrHandler(ErrorHandler handler) |
| void | setLexicalHandler(LexicalHandler handler): Register a SAX-style lexical handler for us to output to. |
| void | setReturnFrequency(int events) |
| void | setXMLReader(XMLReader eventsource): Bind our input streams to an XMLReader. |
| void | skippedEntity(java.lang.String name): Receive notification of a skipped entity. |
| void | startCDATA(): Report the start of a CDATA section. |
| void | startDocument(): Receive notification of the beginning of a document. |
| void | startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId): Report the start of DTD declarations, if any. |
| void | startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName, Attributes atts): Receive notification of the beginning of an element. |
| void | startEntity(java.lang.String name): Report the beginning of some internal and external XML entities. |
| void | startParse(InputSource source): Launch a thread that will run an XMLReader's parse() operation, feeding events to this IncrementalSAXSource_Filter. |
| void | startPrefixMapping(java.lang.String prefix, java.lang.String uri): Begin the scope of a prefix-URI Namespace mapping. |
| void | unparsedEntityDecl(java.lang.String a, java.lang.String b, java.lang.String c, java.lang.String d): Receive notification of an unparsed entity declaration event. |
| void | warning(SAXParseException exception): Receive notification of a warning. |
public IncrementalSAXSource_Filter()
public IncrementalSAXSource_Filter(CoroutineManager co, int controllerCoroutineID)
public static IncrementalSAXSource createIncrementalSAXSource(CoroutineManager co, int controllerCoroutineID)
public void init(CoroutineManager co, int controllerCoroutineID, int sourceCoroutineID)
public void setXMLReader(XMLReader eventsource)
public void setContentHandler(ContentHandler handler)
Specified by:
setContentHandler in interface IncrementalSAXSource
public void setDTDHandler(DTDHandler handler)
Specified by:
setDTDHandler in interface IncrementalSAXSource
public void setLexicalHandler(LexicalHandler handler)
Specified by:
setLexicalHandler in interface IncrementalSAXSource
public void setErrHandler(ErrorHandler handler)
public void setReturnFrequency(int events)
public void characters(char[] ch,
int start,
int length)
throws SAXException
Description copied from interface: ContentHandler
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
The application must not attempt to read from the array outside of the specified range.
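Because a parser may split contiguous text into several chunks, a handler usually accumulates them rather than treating each call as a complete string. A minimal sketch, assuming the JDK's bundled SAX parser; TextCollector and collect() are illustrative names, not part of this class:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative handler: gathers all character data of a document.
final class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only the reported window; the rest of the array is off limits.
        text.append(ch, start, length);
    }

    // Parse the given XML string and return its concatenated character data.
    static String collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.text.toString();
    }
}
```

Appending with the (array, offset, length) overload avoids both an intermediate String and any read outside the specified range.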
Individual characters may consist of more than one Java
char value. There are two important cases where this
happens, because characters can't be represented in just sixteen bits.
In one case, characters are represented in a Surrogate Pair,
using two special Unicode values. Such characters are in the so-called
"Astral Planes", with a code point above U+FFFF. A second case involves
composite characters, such as a base character combining with one or
more accent characters.
Your code should not assume that algorithms using
char-at-a-time idioms will be working in character
units; in some cases they will split characters. This is relevant
wherever XML permits arbitrary characters, such as attribute values,
processing instruction data, and comments as well as in data reported
from this method. It's also generally relevant whenever Java code
manipulates internationalized text; the issue isn't unique to XML.
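The surrogate-pair case can be illustrated with the standard Character API. This sketch counts Unicode code points inside the window a characters() call reports; CodePointDemo is an illustrative name. It does not address the second case: a base character plus combining accents still counts as several code points.

```java
// Illustrative helper: counts code points rather than char values.
final class CodePointDemo {
    // Number of Unicode code points in ch[start .. start + length).
    static int countCodePoints(char[] ch, int start, int length) {
        return Character.codePointCount(ch, start, length);
    }

    public static void main(String[] args) {
        // 'a', one supplementary character (a surrogate pair), 'b'
        char[] data = "a\uD83D\uDE00b".toCharArray();
        System.out.println(data.length + " chars, "
                + countCodePoints(data, 0, data.length) + " code points");
    }
}
```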
Note that some parsers will report whitespace in element
content using the ignorableWhitespace
method rather than this one (validating parsers must
do so).
Specified by:
characters in interface ContentHandler
Parameters:
ch - the characters from the XML document
start - the start position in the array
length - the number of characters to read from the array
Throws:
SAXException - any SAX exception, possibly wrapping another exception
See Also:
ContentHandler.ignorableWhitespace(char[], int, int), Locator
public void endDocument()
throws SAXException
Description copied from interface: ContentHandler
There is an apparent contradiction between the
documentation for this method and the documentation for ErrorHandler.fatalError(org.xml.sax.SAXParseException). Until this ambiguity is
resolved in a future major release, clients should make no
assumptions about whether endDocument() will or will not be
invoked when the parser has reported a fatalError() or thrown
an exception.
The SAX parser will invoke this method only once, and it will be the last method invoked during the parse. The parser shall not invoke this method until it has either abandoned parsing (because of an unrecoverable error) or reached the end of input.
Specified by:
endDocument in interface ContentHandler
Throws:
SAXException - any SAX exception, possibly wrapping another exception
See Also:
ContentHandler.startDocument()
public void endElement(java.lang.String namespaceURI,
java.lang.String localName,
java.lang.String qName)
throws SAXException
Description copied from interface: ContentHandler
The SAX parser will invoke this method at the end of every
element in the XML document; there will be a corresponding
startElement event for every endElement
event (even when the element is empty).
For information on the names, see startElement.
Specified by:
endElement in interface ContentHandler
Parameters:
namespaceURI - the Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed
localName - the local name (without prefix), or the empty string if Namespace processing is not being performed
qName - the qualified XML name (with prefix), or the empty string if qualified names are not available
Throws:
SAXException - any SAX exception, possibly wrapping another exception
public void endPrefixMapping(java.lang.String prefix)
throws SAXException
Description copied from interface: ContentHandler
See startPrefixMapping for
details. These events will always occur immediately after the
corresponding endElement event, but the order of
endPrefixMapping events is not otherwise
guaranteed.
Specified by:
endPrefixMapping in interface ContentHandler
Parameters:
prefix - the prefix that was being mapped. This is the empty string when a default mapping scope ends.
Throws:
SAXException - the client may throw an exception during processing
See Also:
ContentHandler.startPrefixMapping(java.lang.String, java.lang.String), ContentHandler.endElement(java.lang.String, java.lang.String, java.lang.String)
public void ignorableWhitespace(char[] ch,
int start,
int length)
throws SAXException
Description copied from interface: ContentHandler
Validating Parsers must use this method to report each chunk of whitespace in element content (see the W3C XML 1.0 recommendation, section 2.10): non-validating parsers may also use this method if they are capable of parsing and using content models.
SAX parsers may return all contiguous whitespace in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.
The application must not attempt to read from the array outside of the specified range.
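To see the split between the two callbacks, a validating parse of a document whose DTD gives the root element an element-only content model can be used. A minimal sketch, assuming the JDK's bundled parser; WhitespaceCounter and count() are illustrative names:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative handler: tallies chars seen by each of the two callbacks.
final class WhitespaceCounter extends DefaultHandler {
    int ignorable;   // chars reported through ignorableWhitespace()
    int text;        // chars reported through characters()

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorable += length;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text += length;
    }

    // Validating parse, so whitespace in element content must be reported
    // through ignorableWhitespace().
    static WhitespaceCounter count(String xml) {
        WhitespaceCounter handler = new WhitespaceCounter();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

For a document such as `<!DOCTYPE r [<!ELEMENT r (c)><!ELEMENT c EMPTY>]><r>` followed by indented `<c/>` and `</r>`, the line breaks and indentation inside r arrive through ignorableWhitespace() and characters() is not called at all.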
Specified by:
ignorableWhitespace in interface ContentHandler
Parameters:
ch - the characters from the XML document
start - the start position in the array
length - the number of characters to read from the array
Throws:
SAXException - any SAX exception, possibly wrapping another exception
See Also:
ContentHandler.characters(char[], int, int)
public void processingInstruction(java.lang.String target,
java.lang.String data)
throws SAXException
Description copied from interface: ContentHandler
The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element.
A SAX parser must never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.
Like characters(), processing instruction
data may have characters that need more than one char
value.
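A short sketch of what a handler receives, assuming the JDK's bundled parser; PiLister and list() are illustrative names. The XML declaration in the sample input is not reported, in line with the rule above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative handler: records every processing instruction it is given.
final class PiLister extends DefaultHandler {
    private final List<String> found = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        // data excludes the whitespace that separates it from the target
        found.add(target + " -> " + data);
    }

    // Parse the given XML string and return "target -> data" entries.
    static List<String> list(String xml) {
        PiLister handler = new PiLister();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.found;
    }
}
```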
Specified by:
processingInstruction in interface ContentHandler
Parameters:
target - the processing instruction target
data - the processing instruction data, or null if none was supplied. The data does not include any whitespace separating it from the target
Throws:
SAXException - any SAX exception, possibly wrapping another exception
public void setDocumentLocator(Locator locator)
Description copied from interface: ContentHandler
SAX parsers are strongly encouraged (though not absolutely required) to supply a locator: if a parser does so, it must supply the locator to the application by invoking this method before invoking any of the other methods in the ContentHandler interface.
The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. Typically, the application will use this information for reporting its own errors (such as character content that does not match an application's business rules). The information returned by the locator is probably not sufficient for use with a search engine.
Note that the locator will return correct information only
during the invocation of SAX event callbacks after
startDocument returns and before
endDocument is called. The
application should not attempt to use it at any other time.
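A typical use is to keep the locator in a field and query it from inside later callbacks. A minimal sketch, assuming the JDK's bundled parser (which supplies a locator); LineTracker and scan() are illustrative names:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative handler: remembers the line on which each start tag ends.
final class LineTracker extends DefaultHandler {
    private Locator locator;
    private final Map<String, Integer> lines = new LinkedHashMap<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;   // only valid while callbacks are running
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (locator != null) {    // a parser is not required to supply one
            lines.put(qName, locator.getLineNumber());
        }
    }

    // Parse the given XML string and return element name to line number.
    static Map<String, Integer> scan(String xml) {
        LineTracker handler = new LineTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.lines;
    }
}
```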
Specified by:
setDocumentLocator in interface ContentHandler
Parameters:
locator - an object that can return the location of any SAX document event
See Also:
Locator
public void skippedEntity(java.lang.String name)
throws SAXException
Description copied from interface: ContentHandler
The Parser will invoke this method each time the entity is
skipped. Non-validating processors may skip entities if they
have not seen the declarations (because, for example, the
entity was declared in an external DTD subset). All processors
may skip external entities, depending on the values of the
http://xml.org/sax/features/external-general-entities
and the
http://xml.org/sax/features/external-parameter-entities
properties.
Specified by:
skippedEntity in interface ContentHandler
Parameters:
name - the name of the skipped entity. If it is a parameter entity, the name will begin with '%', and if it is the external DTD subset, it will be the string "[dtd]"
Throws:
SAXException - any SAX exception, possibly wrapping another exception
public void startDocument()
throws SAXException
Description copied from interface: ContentHandler
The SAX parser will invoke this method only once, before any
other event callbacks (except for setDocumentLocator).
Specified by:
startDocument in interface ContentHandler
Throws:
SAXException - any SAX exception, possibly wrapping another exception
See Also:
ContentHandler.endDocument()
public void startElement(java.lang.String namespaceURI,
java.lang.String localName,
java.lang.String qName,
Attributes atts)
throws SAXException
Description copied from interface: ContentHandler
The Parser will invoke this method at the beginning of every
element in the XML document; there will be a corresponding
endElement event for every startElement event
(even when the element is empty). All of the element's content will be
reported, in order, before the corresponding endElement
event.
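This event allows up to three name components for each element:
- the Namespace URI;
- the local name; and
- the qualified (prefixed) name.
Any or all of these may be provided, depending on the values of the http://xml.org/sax/features/namespaces and the http://xml.org/sax/features/namespace-prefixes properties:
- the Namespace URI and local name are required when the namespaces property is true (the default), and are optional when the namespaces property is false (if one is specified, both must be);
- the qualified name is required when the namespace-prefixes property is true, and is optional when the namespace-prefixes property is false (the default).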
Note that the attribute list provided will contain only
attributes with explicit values (specified or defaulted):
#IMPLIED attributes will be omitted. The attribute list
will contain attributes used for Namespace declarations
(xmlns* attributes) only if the
http://xml.org/sax/features/namespace-prefixes
property is true (it is false by default, and support for a
true value is optional).
Like characters(), attribute values may have
characters that need more than one char value.
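The three name components can be observed with a namespace-aware parse. A minimal sketch, assuming the JDK's bundled parser, which also reports the qualified name even though namespace-prefixes is left at its default of false; NameReporter and describe() are illustrative names:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative handler: records the name components of each element.
final class NameReporter extends DefaultHandler {
    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName, String qName,
                             Attributes atts) {
        // atts omits xmlns* declarations unless namespace-prefixes is true
        names.add(namespaceURI + "|" + localName + "|" + qName
                + "|attrs=" + atts.getLength());
    }

    // Parse with Namespace processing on; one entry per element.
    static List<String> describe(String xml) {
        NameReporter handler = new NameReporter();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.names;
    }
}
```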
Specified by:
startElement in interface ContentHandler
Parameters:
namespaceURI - the Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed
localName - the local name (without prefix), or the empty string if Namespace processing is not being performed
qName - the qualified name (with prefix), or the empty string if qualified names are not available
atts - the attributes attached to the element. If there are no attributes, it shall be an empty Attributes object. The value of this object after startElement returns is undefined
Throws:
SAXException - any SAX exception, possibly wrapping another exception
See Also:
ContentHandler.endElement(java.lang.String, java.lang.String, java.lang.String), Attributes, AttributesImpl
public void startPrefixMapping(java.lang.String prefix,
java.lang.String uri)
throws SAXException
Description copied from interface: ContentHandler
The information from this event is not necessary for
normal Namespace processing: the SAX XML reader will
automatically replace prefixes for element and attribute
names when the http://xml.org/sax/features/namespaces
feature is true (the default).
There are cases, however, when applications need to use prefixes in character data or in attribute values, where they cannot safely be expanded automatically; the start/endPrefixMapping event supplies the information to the application to expand prefixes in those contexts itself, if necessary.
Note that start/endPrefixMapping events are not
guaranteed to be properly nested relative to each other:
all startPrefixMapping events will occur immediately before the
corresponding startElement event,
and all endPrefixMapping
events will occur immediately after the corresponding
endElement event,
but their order is not otherwise
guaranteed.
There should never be start/endPrefixMapping events for the "xml" prefix, since it is predeclared and immutable.
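The attribute-value case can be sketched as follows. The handler tracks the prefixes currently in scope and uses them to expand a QName found in a hypothetical type attribute. It is deliberately simplified: the flat map does not restore an outer binding when a nested element redeclares a prefix (org.xml.sax.helpers.NamespaceSupport handles that), and QNameResolver and resolve() are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative handler: expands QNames found in "type" attribute values.
final class QNameResolver extends DefaultHandler {
    private final Map<String, String> prefixes = new HashMap<>();
    private final List<String> resolved = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        prefixes.put(prefix, uri);      // arrives before the startElement event
    }

    @Override
    public void endPrefixMapping(String prefix) {
        prefixes.remove(prefix);        // simplified: no shadowing support
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String value = atts.getValue("type");   // hypothetical QName-valued attribute
        if (value == null) {
            return;
        }
        int colon = value.indexOf(':');
        String prefix = colon < 0 ? "" : value.substring(0, colon);
        String local = value.substring(colon + 1);
        String ns = prefixes.get(prefix);
        resolved.add(ns == null ? local : "{" + ns + "}" + local);
    }

    // Namespace-aware parse; returns the expanded names in document order.
    static List<String> resolve(String xml) {
        QNameResolver handler = new QNameResolver();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.resolved;
    }
}
```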
Specified by:
startPrefixMapping in interface ContentHandler
Parameters:
prefix - the Namespace prefix being declared. An empty string is used for the default element namespace, which has no prefix.
uri - the Namespace URI the prefix is mapped to
Throws:
SAXException - the client may throw an exception during processing
See Also:
ContentHandler.endPrefixMapping(java.lang.String), ContentHandler.startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)
public void comment(char[] ch,
int start,
int length)
throws SAXException
Description copied from interface: LexicalHandler
This callback will be used for comments inside or outside the document element, including comments in the external DTD subset (if read). Comments in the DTD must be properly nested inside start/endDTD and start/endEntity events (if used).
Specified by:
comment in interface LexicalHandler
Parameters:
ch - An array holding the characters in the comment.
start - The starting position in the array.
length - The number of characters to use from the array.
Throws:
SAXException - The application may raise an exception.
public void endCDATA()
throws SAXException
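Description copied from interface: LexicalHandler
Report the end of a CDATA section.
Specified by:
endCDATA in interface LexicalHandler
Throws:
SAXException - The application may raise an exception.
See Also:
LexicalHandler.startCDATA()
public void endDTD()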
throws SAXException
Description copied from interface: LexicalHandler
This method is intended to report the end of the DOCTYPE declaration; if the document has no DOCTYPE declaration, this method will not be invoked.
Specified by:
endDTD in interface LexicalHandler
Throws:
SAXException - The application may raise an exception.
See Also:
LexicalHandler.startDTD(java.lang.String, java.lang.String, java.lang.String)
public void endEntity(java.lang.String name)
throws SAXException
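Description copied from interface: LexicalHandler
Report the end of an entity.
Specified by:
endEntity in interface LexicalHandler
Parameters:
name - The name of the entity that is ending.
Throws:
SAXException - The application may raise an exception.
See Also:
LexicalHandler.startEntity(java.lang.String)
public void startCDATA()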
throws SAXException
Description copied from interface: LexicalHandler
The contents of the CDATA section will be reported through
the regular characters event; this event is intended only to report
the boundary.
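Lexical events are only delivered to a handler registered through the standard http://xml.org/sax/properties/lexical-handler property. The sketch below, assuming the JDK's bundled parser, uses the boundary events to tell CDATA text apart from ordinary text; CdataSpotter and scan() are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Illustrative handler: logs lexical events and collects CDATA text only.
final class CdataSpotter extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();
    final StringBuilder cdataText = new StringBuilder();
    private boolean inCdata;

    @Override
    public void startCDATA() {
        inCdata = true;
        events.add("startCDATA");
    }

    @Override
    public void endCDATA() {
        inCdata = false;
        events.add("endCDATA");
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        events.add("comment:" + new String(ch, start, length));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inCdata) {                       // content arrives as normal characters()
            cdataText.append(ch, start, length);
        }
    }

    // Parse with this object registered as content and lexical handler.
    static CdataSpotter scan(String xml) {
        CdataSpotter handler = new CdataSpotter();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```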
Specified by:
startCDATA in interface LexicalHandler
Throws:
SAXException - The application may raise an exception.
See Also:
LexicalHandler.endCDATA()
public void startDTD(java.lang.String name,
java.lang.String publicId,
java.lang.String systemId)
throws SAXException
Description copied from interface: LexicalHandler
This method is intended to report the beginning of the DOCTYPE declaration; if the document has no DOCTYPE declaration, this method will not be invoked.
All declarations reported through
DTDHandler or
DeclHandler events must appear
between the startDTD and endDTD events.
Declarations are assumed to belong to the internal DTD subset
unless they appear between startEntity
and endEntity events. Comments and
processing instructions from the DTD should also be reported
between the startDTD and endDTD events, in their original
order of (logical) occurrence; they are not required to
appear in their correct locations relative to DTDHandler
or DeclHandler events, however.
Note that the start/endDTD events will appear within
the start/endDocument events from ContentHandler and
before the first
startElement
event.
Specified by:
startDTD in interface LexicalHandler
Parameters:
name - The document type name.
publicId - The declared public identifier for the external DTD subset, or null if none was declared.
systemId - The declared system identifier for the external DTD subset, or null if none was declared. (Note that this is not resolved against the document base URI.)
Throws:
SAXException - The application may raise an exception.
See Also:
LexicalHandler.endDTD(), LexicalHandler.startEntity(java.lang.String)
public void startEntity(java.lang.String name)
throws SAXException
Description copied from interface: LexicalHandler
The reporting of parameter entities (including
the external DTD subset) is optional, and SAX2 drivers that
report LexicalHandler events may not implement it; you can use the
http://xml.org/sax/features/lexical-handler/parameter-entities
feature to query or control the reporting of parameter entities.
General entities are reported with their regular names, parameter entities have '%' prepended to their names, and the external DTD subset has the pseudo-entity name "[dtd]".
When a SAX2 driver is providing these events, all other
events must be properly nested within start/end entity
events. There is no additional requirement that events from
DeclHandler or
DTDHandler be properly ordered.
Note that skipped entities will be reported through the
skippedEntity
event, which is part of the ContentHandler interface.
Because of the streaming event model that SAX uses, some entity boundaries cannot be reported under any circumstances:
- general entities within attribute values
- parameter entities within declarations
These will be silently expanded, with no indication of where the original entity boundaries were.
Note also that the boundaries of character references (which are not really entities anyway) are not reported.
All start/endEntity events must be properly nested.
Specified by:
startEntity in interface LexicalHandler
Parameters:
name - The name of the entity. If it is a parameter entity, the name will begin with '%', and if it is the external DTD subset, it will be "[dtd]".
Throws:
SAXException - The application may raise an exception.
See Also:
LexicalHandler.endEntity(java.lang.String), DeclHandler.internalEntityDecl(java.lang.String, java.lang.String), DeclHandler.externalEntityDecl(java.lang.String, java.lang.String, java.lang.String)
public void notationDecl(java.lang.String a,
java.lang.String b,
java.lang.String c)
throws SAXException
Description copied from interface: DTDHandler
It is up to the application to record the notation for later reference, if necessary; notations may appear as attribute values and in unparsed entity declarations, and are sometimes used with processing instruction target names.
At least one of publicId and systemId must be non-null. If a system identifier is present, and it is a URL, the SAX parser must resolve it fully before passing it to the application through this event.
There is no guarantee that the notation declaration will be reported before any unparsed entities that use it.
Specified by:
notationDecl in interface DTDHandler
Parameters:
a - The notation name.
b - The notation's public identifier, or null if none was given.
c - The notation's system identifier, or null if none was given.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
DTDHandler.unparsedEntityDecl(java.lang.String, java.lang.String, java.lang.String, java.lang.String), Attributes
public void unparsedEntityDecl(java.lang.String a,
java.lang.String b,
java.lang.String c,
java.lang.String d)
throws SAXException
Description copied from interface: DTDHandler
Note that the notation name corresponds to a notation
reported by the notationDecl event.
It is up to the application to record the entity for later
reference, if necessary;
unparsed entities may appear as attribute values.
If the system identifier is a URL, the parser must resolve it fully before passing it to the application.
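Recording the declarations for later reference might look like the sketch below, which assumes the JDK's bundled parser and reads an internal DTD subset. DtdDeclarations and read() are illustrative names, and only the names are stored because the resolved system identifiers depend on the base URI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative handler: keeps notation and unparsed entity declarations.
final class DtdDeclarations extends DefaultHandler {
    private final List<String> decls = new ArrayList<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        decls.add("notation " + name);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId, String systemId,
                                   String notationName) {
        // notationName refers to a notation reported through notationDecl()
        decls.add("entity " + name + " uses " + notationName);
    }

    // Parse the given XML string and return the recorded declarations.
    static List<String> read(String xml) {
        DtdDeclarations handler = new DtdDeclarations();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.decls;
    }
}
```

Since the order of the two kinds of declaration is not guaranteed, a caller should look entries up by name rather than rely on their position in the list.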
Specified by:
unparsedEntityDecl in interface DTDHandler
Parameters:
a - The unparsed entity's name.
b - The entity's public identifier, or null if none was given.
c - The entity's system identifier.
d - The name of the associated notation.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
DTDHandler.notationDecl(java.lang.String, java.lang.String, java.lang.String), Attributes
public void error(SAXParseException exception) throws SAXException
Description copied from interface: ErrorHandler
This corresponds to the definition of "error" in section 1.2 of the W3C XML 1.0 Recommendation. For example, a validating parser would use this callback to report the violation of a validity constraint. The default behaviour is to take no action.
The SAX parser must continue to provide normal parsing events after invoking this method: it should still be possible for the application to process the document through to the end. If the application cannot do so, then the parser should report a fatal error even if the XML recommendation does not require it to do so.
Filters may use this method to report other, non-XML errors as well.
Specified by:
error in interface ErrorHandler
Parameters:
exception - The error information encapsulated in a SAX parse exception.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
SAXParseException
public void fatalError(SAXParseException exception) throws SAXException
Description copied from interface: ErrorHandler
There is an apparent contradiction between the
documentation for this method and the documentation for ContentHandler.endDocument(). Until this ambiguity
is resolved in a future major release, clients should make no
assumptions about whether endDocument() will or will not be
invoked when the parser has reported a fatalError() or thrown
an exception.
This corresponds to the definition of "fatal error" in section 1.2 of the W3C XML 1.0 Recommendation. For example, a parser would use this callback to report the violation of a well-formedness constraint.
The application must assume that the document is unusable after the parser has invoked this method, and should continue (if at all) only for the sake of collecting additional error messages: in fact, SAX parsers are free to stop reporting any other events once this method has been invoked.
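The three severities can be handled differently by one ErrorHandler. In this sketch, which assumes the JDK's bundled parser, warnings and recoverable errors are logged and parsing continues, while a fatal error is logged and rethrown to stop the parse; CollectingErrors and check() are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Illustrative handler: logs every report and aborts on fatal errors.
final class CollectingErrors implements ErrorHandler {
    private final List<String> log = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        log.add("warning: line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
        log.add("error: line " + e.getLineNumber());   // parsing continues
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        log.add("fatal: line " + e.getLineNumber());
        throw e;                                       // document is unusable
    }

    // Parse the given XML string and return everything that was reported.
    static List<String> check(String xml) {
        CollectingErrors handler = new CollectingErrors();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException stopped) {
            // Expected for ill-formed input: fatalError() rethrew the exception.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.log;
    }
}
```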
Specified by:
fatalError in interface ErrorHandler
Parameters:
exception - The error information encapsulated in a SAX parse exception.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
SAXParseException
public void warning(SAXParseException exception) throws SAXException
Description copied from interface: ErrorHandler
SAX parsers will use this method to report conditions that are not errors or fatal errors as defined by the XML recommendation. The default behaviour is to take no action.
The SAX parser must continue to provide normal parsing events after invoking this method: it should still be possible for the application to process the document through to the end.
Filters may use this method to report other, non-XML warnings as well.
Specified by:
warning in interface ErrorHandler
Parameters:
exception - The warning information encapsulated in a SAX parse exception.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
SAXParseException
public int getSourceCoroutineID()
public int getControllerCoroutineID()
public CoroutineManager getCoroutineManager()
public void startParse(InputSource source) throws SAXException
Specified by:
startParse in interface IncrementalSAXSource
Throws:
SAXException - if a parse thread is already in progress or parsing cannot be started.
public void run()
Specified by:
run in interface java.lang.Runnable
public java.lang.Object deliverMoreNodes(boolean parsemore)
Specified by:
deliverMoreNodes in interface IncrementalSAXSource
Parameters:
parsemore - If true, tells the incremental filter to generate another chunk of output. If false, tells the filter that we're satisfied and it can terminate parsing of this document.
Copyright © 2014 Apache XML Project. All Rights Reserved.