Class HtmlParser


  • public class HtmlParser
    extends java.lang.Object
    HtmlParser is an HTML DOM parser. It provides the parsing functionality behind the standard DOM parser implementation, DocumentBuilderImpl, and may be used directly when a different DOM implementation is preferred.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String MODIFYING_KEY
      A node UserData key used to tell nodes that their content may be about to be modified.
    • Constructor Summary

      Constructors 
      Constructor Description
      HtmlParser​(UserAgentContext ucontext, org.w3c.dom.Document document)
      Constructs an HtmlParser.
      HtmlParser​(UserAgentContext ucontext, org.w3c.dom.Document document, org.xml.sax.ErrorHandler errorHandler, java.lang.String publicId, java.lang.String systemId)
      Constructs an HtmlParser.
      HtmlParser​(org.w3c.dom.Document document, org.xml.sax.ErrorHandler errorHandler, java.lang.String publicId, java.lang.String systemId)
      Deprecated.
      A UserAgentContext should be passed to the constructor.
    • Method Summary

      Modifier and Type Method Description
      static boolean isDecodeEntities​(java.lang.String elementName)  
      void parse​(java.io.InputStream in)
      Parses HTML from an input stream, assuming the character set is ISO-8859-1.
      void parse​(java.io.InputStream in, java.lang.String charset)
      Parses HTML from an input stream, using the given character set.
      void parse​(java.io.LineNumberReader reader)  
      void parse​(java.io.LineNumberReader reader, org.w3c.dom.Node parent)
      This method may be used when the DOM should be built under a given node, such as when innerHTML is used in JavaScript.
      void parse​(java.io.Reader reader)
      Parses HTML given by a Reader.
      void parse​(java.io.Reader reader, org.w3c.dom.Node parent)
      This method may be used when the DOM should be built under a given node, such as when innerHTML is used in JavaScript.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • MODIFYING_KEY

        public static final java.lang.String MODIFYING_KEY
        A node UserData key used to tell nodes that their content may be about to be modified. Elements could use this to temporarily suspend notifications. The value set will be either Boolean.TRUE or Boolean.FALSE.
        See Also:
        Constant Field Values
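The flag described above can be sketched with the standard DOM Level 3 user-data calls (`Node.setUserData` and `Node.getUserData`). This is a minimal illustration using only JDK classes, not the parser's actual code: the key string `"example.modifying"` is a hypothetical stand-in, since the real value of `MODIFYING_KEY` is not shown here, and the helper names are made up for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ModifyingKeyDemo {
    // Hypothetical stand-in for HtmlParser.MODIFYING_KEY; the real
    // constant's value is not given in this document.
    static final String MODIFYING_KEY = "example.modifying";

    // True only while the flag stored on the node is Boolean.TRUE.
    static boolean isBeingModified(Node node) {
        return Boolean.TRUE.equals(node.getUserData(MODIFYING_KEY));
    }

    // Changes a subtree while the flag is raised and records what an
    // element-side check observes before, during, and after the change.
    static String demo() {
        Document doc = newDocument();
        Element div = doc.createElement("div");
        doc.appendChild(div);

        StringBuilder seen = new StringBuilder();
        seen.append(isBeingModified(div));             // flag never set
        div.setUserData(MODIFYING_KEY, Boolean.TRUE, null);
        div.appendChild(doc.createTextNode("hello"));
        seen.append(',').append(isBeingModified(div)); // change in progress
        div.setUserData(MODIFYING_KEY, Boolean.FALSE, null);
        seen.append(',').append(isBeingModified(div)); // change finished
        return seen.toString();
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints false,true,false
    }
}
```

An element implementation could consult a check like `isBeingModified` to hold back change notifications until the flag returns to `Boolean.FALSE`.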
    • Constructor Detail

      • HtmlParser

        public HtmlParser​(org.w3c.dom.Document document,
                          org.xml.sax.ErrorHandler errorHandler,
                          java.lang.String publicId,
                          java.lang.String systemId)
        Deprecated.
        A UserAgentContext should be passed to the constructor.
        Constructs an HtmlParser.
        Parameters:
        document - A W3C Document instance.
        errorHandler - The error handler.
        publicId - The public ID of the document.
        systemId - The system ID of the document.
      • HtmlParser

        public HtmlParser​(UserAgentContext ucontext,
                          org.w3c.dom.Document document,
                          org.xml.sax.ErrorHandler errorHandler,
                          java.lang.String publicId,
                          java.lang.String systemId)
        Constructs an HtmlParser.
        Parameters:
        ucontext - The user agent context.
        document - A W3C Document instance.
        errorHandler - The error handler.
        publicId - The public ID of the document.
        systemId - The system ID of the document.
      • HtmlParser

        public HtmlParser​(UserAgentContext ucontext,
                          org.w3c.dom.Document document)
        Constructs an HtmlParser.
        Parameters:
        ucontext - The user agent context.
        document - A W3C Document instance.
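The five-argument constructor takes an `org.xml.sax.ErrorHandler`. A minimal sketch of one possible handler follows; it only uses the standard SAX interfaces. The class name `CollectingErrorHandler` and its `getMessages` method are illustrative, and when or how often the parser invokes the handler is not specified in this document.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Illustrative handler that records problems instead of printing them.
public class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        messages.add("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        messages.add("error: " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        messages.add("fatal: " + e.getMessage());
        throw e; // rethrow so the caller sees unrecoverable problems
    }

    public List<String> getMessages() {
        return messages;
    }

    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        handler.warning(new SAXParseException("unclosed tag", null));
        System.out.println(handler.getMessages());
        // Assumed usage, following the constructor signature documented above
        // (ucontext, document, publicId and systemId supplied by the caller):
        // HtmlParser parser = new HtmlParser(ucontext, document, handler, publicId, systemId);
    }
}
```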
    • Method Detail

      • isDecodeEntities

        public static boolean isDecodeEntities​(java.lang.String elementName)
      • parse

        public void parse​(java.io.InputStream in)
                   throws java.io.IOException,
                          org.xml.sax.SAXException,
                          java.io.UnsupportedEncodingException
        Parses HTML from an input stream, assuming the character set is ISO-8859-1.
        Parameters:
        in - The input stream.
        Throws:
        java.io.IOException - Thrown when there are errors reading the stream.
        org.xml.sax.SAXException - Thrown when there are parse errors.
        java.io.UnsupportedEncodingException - Thrown if the ISO-8859-1 character set is not supported.
      • parse

        public void parse​(java.io.InputStream in,
                          java.lang.String charset)
                   throws java.io.IOException,
                          org.xml.sax.SAXException,
                          java.io.UnsupportedEncodingException
        Parses HTML from an input stream, using the given character set.
        Parameters:
        in - The input stream.
        charset - The character set.
        Throws:
        java.io.IOException - Thrown when there's an error reading from the stream.
        org.xml.sax.SAXException - Thrown when there is a parser error.
        java.io.UnsupportedEncodingException - Thrown if the character set is not supported.
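Because the single-argument overload assumes ISO-8859-1, the choice of overload matters for input in other encodings. The sketch below uses only JDK classes to show what happens when UTF-8 bytes are decoded under each character set. The class and method names are illustrative, and whether HtmlParser wraps the stream in exactly this way is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decodes the whole stream with the named character set. An unknown
    // name makes InputStreamReader throw UnsupportedEncodingException,
    // which is a subclass of IOException.
    static String decode(InputStream in, String charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf" followed by U+00E9, encoded as UTF-8 (two bytes for U+00E9).
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(new ByteArrayInputStream(utf8Bytes), "UTF-8");
        String asLatin1 = decode(new ByteArrayInputStream(utf8Bytes), "ISO-8859-1");

        System.out.println(asUtf8.equals("caf\u00e9"));         // prints true
        System.out.println(asLatin1.equals("caf\u00c3\u00a9")); // prints true
    }
}
```

Under ISO-8859-1 each byte becomes one character, so the two-byte UTF-8 sequence turns into two unrelated characters. For documents that are not ISO-8859-1, passing the character set explicitly avoids this.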
      • parse

        public void parse​(java.io.Reader reader)
                   throws java.io.IOException,
                          org.xml.sax.SAXException
        Parses HTML given by a Reader. This method appends nodes to the document provided to the parser.
        Parameters:
        reader - An instance of Reader.
        Throws:
        java.io.IOException - Thrown if there are errors reading from the reader.
        org.xml.sax.SAXException - Thrown if there are parse errors.
      • parse

        public void parse​(java.io.LineNumberReader reader)
                   throws java.io.IOException,
                          org.xml.sax.SAXException
        Throws:
        java.io.IOException
        org.xml.sax.SAXException
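The LineNumberReader overload carries no description. A `java.io.LineNumberReader` tracks the current line as it reads, which is typically useful for reporting where a problem occurred; that this is the parser's motivation is an assumption. The sketch below uses only JDK classes, and its class and method names are illustrative.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class LineTrackingDemo {
    // Reuses the reader if it already tracks line numbers, else wraps it.
    static LineNumberReader asLineNumberReader(Reader reader) {
        return reader instanceof LineNumberReader
                ? (LineNumberReader) reader
                : new LineNumberReader(reader);
    }

    // Returns the 1-based number of the first line containing the marker,
    // or -1 if the marker never appears.
    static int lineOf(String html, String marker) {
        try (LineNumberReader in = asLineNumberReader(new StringReader(html))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains(marker)) {
                    // getLineNumber() counts the lines read so far.
                    return in.getLineNumber();
                }
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html>\n<body>\n<p>oops</b>\n</body>\n</html>\n";
        System.out.println(lineOf(html, "</b>")); // prints 3
    }
}
```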
      • parse

        public void parse​(java.io.Reader reader,
                          org.w3c.dom.Node parent)
                   throws java.io.IOException,
                          org.xml.sax.SAXException
        This method may be used when the DOM should be built under a given node, such as when innerHTML is used in JavaScript.
        Parameters:
        reader - A document reader.
        parent - The root node for the parsed DOM.
        Throws:
        java.io.IOException
        org.xml.sax.SAXException
      • parse

        public void parse​(java.io.LineNumberReader reader,
                          org.w3c.dom.Node parent)
                   throws java.io.IOException,
                          org.xml.sax.SAXException
        This method may be used when the DOM should be built under a given node, such as when innerHTML is used in JavaScript.
        Parameters:
        reader - A LineNumberReader for the document.
        parent - The root node for the parsed DOM.
        Throws:
        java.io.IOException
        org.xml.sax.SAXException
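The innerHTML use case mentioned for the two-argument overloads can be sketched as follows: clear the target element, then build the new subtree under it. To stay runnable with JDK classes only, the sketch defines a hypothetical `FragmentParser` interface that mirrors the documented `parse(Reader, Node)` signature, and a stand-in implementation that stores the input as a single text node rather than parsing markup. None of these names come from the library.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class InnerHtmlDemo {
    // Hypothetical interface mirroring the documented parse(Reader, Node).
    interface FragmentParser {
        void parse(Reader reader, Node parent) throws IOException, SAXException;
    }

    // innerHTML-style setter: drop the old children, then let the parser
    // build the new nodes under the same element.
    static void setInnerHtml(Element target, String html, FragmentParser parser)
            throws IOException, SAXException {
        while (target.getFirstChild() != null) {
            target.removeChild(target.getFirstChild());
        }
        parser.parse(new StringReader(html), target);
    }

    // Stand-in for a real parser: appends the raw input as one text node.
    static final FragmentParser TEXT_ONLY = (reader, parent) -> {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        parent.appendChild(parent.getOwnerDocument().createTextNode(sb.toString()));
    };

    // Replaces two existing children and reports "<child count>:<text>".
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element div = doc.createElement("div");
            doc.appendChild(div);
            div.appendChild(doc.createTextNode("old"));
            div.appendChild(doc.createElement("span"));

            setInnerHtml(div, "new", TEXT_ONLY);
            return div.getChildNodes().getLength() + ":" + div.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1:new
    }
}
```

With the real class, the stand-in would presumably be replaced by a call such as `parser.parse(new StringReader(html), target)` on an HtmlParser constructed for the target's owner document.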