JTidy is a Java implementation of Dave Raggett's HTML Tidy program that "tidies" tag soup to produce well-formed HTML. This article highlights some important integration issues for JTidy, with notes and tips from the Code Style site log.
The JTidy how-to page gives a simple example of processing HTML from a Java InputStream to an OutputStream using one of the several parse methods on the core JTidy class, org.w3c.tidy.Tidy.
The standard JTidy distribution contains three Java packages as source and compiled classes in %jtidy%/build/Tidy.jar:
org.w3c.dom
org.xml.sax
org.w3c.tidy
The JTidy package was developed before the W3C DOM and SAX packages were widely available through JAXP and the Java Software Development Kit 1.4. Small implementation differences between the DOM and SAX packages in these standard Java distributions and the copies included in the JTidy distribution JAR can cause class compatibility conflicts. To avoid these conflicts, download a recent source snapshot and repackage the org.w3c.tidy directory tree on its own: compile it, copy the message property files into the class directory and create your own JAR file.
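When you suspect a conflict, it helps to see where the DOM and SAX interfaces are actually being loaded from at run time. The sketch below uses only the standard reflection API (Class.getProtectionDomain() and CodeSource); the class name ClassOrigin is hypothetical, and what is reported for JDK classes depends on the Java version, so treat it as a diagnostic aid rather than a definitive check.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports where a class was loaded from. */
class ClassOrigin {

    /** Returns the JAR or directory URL a class came from, or a short note if there is none. */
    static String locate(String className) {
        try {
            // Load without running static initialisers, using this class's own loader.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            // Classes defined by the JDK's bootstrap loader typically have no CodeSource.
            if (source == null || source.getLocation() == null) {
                return "JDK runtime (no code source)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        String[] names = {"org.w3c.dom.Document", "org.xml.sax.XMLReader", "org.w3c.tidy.Tidy"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as your application. If a DOM or SAX interface reports a JTidy JAR as its location instead of the JDK, the bundled copy is the one in use.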
The JTidy class Configuration uses enum as a variable name, which is not permitted from Java 1.5 onwards because enum became a reserved keyword. If necessary, compile with the -source 1.4 flag to avoid modifying the source code.
C:\dev>mkdir C:\dev\classes
C:\dev>D:\java\jdk150\bin\javac -source 1.4 -classpath "C:\dev" -d "C:\dev\classes" C:\dev\org\w3c\tidy\*.java
The JTidy message property files must also be copied into the class output directory before you make a JAR file, because javac does not copy non-Java resources. TidyMessages.properties is the key file, holding the English-language messages; TidyMessages_de.properties and TidyMessages_es.properties are not critical.
copy "C:\dev\org\w3c\tidy\*.properties" "C:\dev\classes\org\w3c\tidy"
Change to the classes output directory and archive the contents of the org subdirectory.
cd C:\dev\classes
C:\dev\classes>D:\java\jdk150\bin\jar cf "C:\dev\JTidy.jar" org
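Before putting the new JAR on the classpath, it is worth confirming that it contains TidyMessages.properties and none of the org.w3c.dom or org.xml.sax classes that cause the conflicts described above. Running jar tf C:\dev\JTidy.jar lists the entries by hand; the minimal sketch below automates the same check with java.util.jar.JarFile. The class name TidyJarCheck and its default path are assumptions for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical checker for a repackaged JTidy JAR. */
class TidyJarCheck {
    static final String MESSAGES = "org/w3c/tidy/TidyMessages.properties";

    /** Returns a list of problems found in the given JAR entry names; empty means OK. */
    static List<String> problems(Collection<String> entryNames) {
        List<String> problems = new ArrayList<>();
        // Without the English message file, JTidy cannot load its messages.
        if (!entryNames.contains(MESSAGES)) {
            problems.add("missing " + MESSAGES);
        }
        for (String name : entryNames) {
            // Only the org.w3c.tidy tree should be packaged; DOM and SAX come from the JDK.
            if (name.startsWith("org/w3c/dom/") || name.startsWith("org/xml/sax/")) {
                problems.add("bundled API entry: " + name);
            }
        }
        return problems;
    }

    /** Reads all entry names from a JAR file on disk. */
    static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements(); ) {
                names.add(e.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "C:\\dev\\JTidy.jar";
        List<String> found = problems(entryNames(path));
        System.out.println(found.isEmpty() ? "OK: " + path : String.join("\n", found));
    }
}
```

Keeping the check itself a pure function of the entry names makes it easy to reuse on a JAR produced by the Ant target described next.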
The MKSearch Ant build file includes a target that compiles and packages the org.w3c.tidy package in isolation. The general build.properties and local.properties files define the build variables; update local.properties with your own file system configuration. The JTidy source is expected in a directory named %mksearch%/lib-src/jtidy.
<target description="Compile and archive JTidy from source" name="jar.jtidy" depends="prepare">
  <mkdir dir="${buildDir}/jtidy"/>
  <javac srcdir="${sourceLibDir}/jtidy" destdir="${buildDir}/jtidy" debug="${debug}"
      deprecation="off" optimize="${optimize}" verbose="${verbose}" source="${source.version}">
    <classpath refid="classpath"/>
    <include name="**/*.java"/>
  </javac>
  <copy file="${sourceLibDir}/jtidy/org/w3c/tidy/TidyMessages.properties"
      todir="${buildDir}/jtidy/org/w3c/tidy"
      preservelastmodified="true" overwrite="true" verbose="false"/>
  <jar jarfile="${libDir}/jtidy.jar" basedir="${buildDir}/jtidy"/>
</target>
The site log entries below record developments related to JTidy, which is a component of the Metacentric Web Feed Generator system and the MKSearch metadata search engine. In both systems, JTidy cleans up the HTML source and converts it to XHTML so that it can be processed further as XML with XSLT and SAX callbacks.
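Once JTidy has produced well-formed XHTML, ordinary JAXP tools can take over. The sketch below illustrates the SAX callback side with a handler that collects the href attribute of each a element. It does not call JTidy itself, and the class name LinkCollector is hypothetical. Because any external DTD named in the DOCTYPE is replaced with an empty one to avoid a network fetch, the sketch assumes the input uses no named entities other than XML's five predefined ones.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler: collects the href of each a element in well-formed XHTML. */
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so the tag name arrives in qName.
        if ("a".equals(qName)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Substitute an empty DTD so the parser never fetches the XHTML DTD over the network.
        return new InputSource(new StringReader(""));
    }

    /** Parses XHTML text (for example, JTidy output) and returns link targets in document order. */
    static List<String> collect(String xhtml) {
        LinkCollector handler = new LinkCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input as XML", e);
        }
        return handler.links;
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p><a href=\"a.html\">A</a> <a name=\"top\">T</a></p></body></html>";
        System.out.println(collect(xhtml)); // prints [a.html]
    }
}
```

An XSLT stage would consume the same XHTML through javax.xml.transform instead; the point is that neither step can run on the original tag soup, only on the tidied output.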
The setFpi(String) method enables you to set your own document type declaration and validate against a custom document type definition. Code Style uses a "Lax" version of the XHTML Transitional document type that allows many non-standard elements and attributes and validates attribute values less rigorously.
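Validating a document that carries a custom document type declaration can be done with plain JAXP, independently of JTidy. In the sketch below, the public identifier -//Example//DTD XHTML Lax//EN and the two-element DTD are hypothetical stand-ins for a site-specific DTD such as the Code Style "Lax" variant. An EntityResolver maps that identifier to a local copy of the DTD, so the validating parser never fetches the system URI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: validates XML against a DTD selected by a custom public identifier. */
class CustomDtdValidator {
    // Hypothetical FPI standing in for a site-specific "lax" XHTML document type.
    static final String LAX_FPI = "-//Example//DTD XHTML Lax//EN";

    /** Returns validation messages; an empty list means the document is valid. */
    static List<String> validate(String xml, String dtdText) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against the DTD in its DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve the custom FPI from memory; other identifiers fall back to default resolution.
            builder.setEntityResolver((publicId, systemId) ->
                    LAX_FPI.equals(publicId) ? new InputSource(new StringReader(dtdText)) : null);
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // well-formedness errors stop parsing
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Configuration, I/O and well-formedness failures are reported as messages too.
            errors.add(String.valueOf(e.getMessage()));
        }
        return errors;
    }

    public static void main(String[] args) {
        String dtd = "<!ELEMENT html (body)><!ELEMENT body (#PCDATA)>";
        String doc = "<!DOCTYPE html PUBLIC \"" + LAX_FPI + "\" \"http://example.com/lax.dtd\">"
                + "<html><body><blink>x</blink></body></html>";
        // Prints one line per validation problem found in the document.
        validate(doc, dtd).forEach(System.out::println);
    }
}
```

A lax DTD simply declares more elements and looser attribute types, so documents that a strict XHTML DTD would reject can still pass this kind of check.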
The setConfigurationFromFile(String) method loads Tidy options from a configuration file; the string argument is the path to that file.
The MKSearch project was developed with a view to running on free, open source platforms as well as the Sun Java platform. The project made a number of modifications to the JTidy library to work around known problems with the GNU Compiler for Java (GCJ).