The Code Style Web site has evolved by the gradual refinement and accumulation of articles and features. This review is part of a fully backdated site log and archive that may help shed light on when, why and how particular features were implemented. Many of these log entries refer to the Code Style Java package, which delivers the servlet services on this site.

Reverse chronology

RewriteEngine 403 error solved, 24th November 2005

Took some time to investigate the problem found on 11th November, where a new RewriteEngine directive was causing a 403 forbidden error with the site feedback script, Soupermail. Ultimately discovered that the diagnosis had been recorded in the Apache error log all along: "...Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden: /path/to/script". Adding the SymLinksIfOwnerMatch option to the Options directive in the script directory's .htaccess configuration solved the problem.

Options -Indexes +SymLinksIfOwnerMatch +ExecCGI
      

Also added an escaping backslash to the dot in the regular expression for the ejupiter.com bot, so that it matches a literal full stop rather than any character. The F flag on the rewrite rule issues an HTTP 403 error as intended.

RewriteCond %{HTTP_USER_AGENT} ejupiter\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverBot
RewriteRule .* - [F,PT]
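
The effect of escaping the dot can be illustrated outside Apache. The following is a minimal Java sketch, not part of the site configuration: it uses java.util.regex rather than Apache's own regular expression engine, and the class name and sample user agent strings are hypothetical. The [NC] flag on the first condition is modelled with CASE_INSENSITIVE; the second condition has no [NC] flag, so it stays case-sensitive.

```java
import java.util.regex.Pattern;

class UserAgentBlockDemo {
    // Unescaped dot: matches any single character in that position.
    static final Pattern LOOSE =
            Pattern.compile("ejupiter.com", Pattern.CASE_INSENSITIVE);
    // Escaped dot: matches a literal full stop only, as in the amended rule.
    static final Pattern STRICT =
            Pattern.compile("ejupiter\\.com", Pattern.CASE_INSENSITIVE);
    // No case-insensitive flag, mirroring the second RewriteCond.
    static final Pattern NAVERBOT = Pattern.compile("NaverBot");

    /** Returns true if either condition matches anywhere in the user agent. */
    static boolean isBlocked(String userAgent) {
        return STRICT.matcher(userAgent).find()
                || NAVERBOT.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        String lookalike = "ejupiterXcom crawler"; // hypothetical agent string
        System.out.println(LOOSE.matcher(lookalike).find());  // true
        System.out.println(STRICT.matcher(lookalike).find()); // false
        System.out.println(isBlocked("EJupiter.com"));        // true
        System.out.println(isBlocked("naverbot"));            // false
    }
}
```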
      

Java threads and Latin 1 FAQs, 20th November 2005

Added several new questions to the Java language FAQ section and took the opportunity to regroup its questions, along with those in the Java servlets FAQ section. Reinstated a getRequestDispatcher() question that was dropped in recent changes.

Java API

Java servlet API

Java strings

Java threads

Javascript "how to"

Retrospective RSS agents, 19th November 2005

Went back to process the Code Style server logs for 2005 through the RSS user agent analysis system and added several new aggregators. Divided the RSS user agent listing into separate pages for Web, browser and email based aggregators, desktop readers, readers for mobile devices and RSS tools.

New agents include Web based services Bommie, RSSfeed and Etamp, RSS tools MyHeadlines and RSS Reader Plugin, and FreeNews in the RSS readers for mobile devices category.

RSS user agent update with IE 7 beta, 14th November 2005

Made some final amendments to the RSS user agent database to de-duplicate some aggregator names and standardise the capitalisation and spacing on others. The final version adds three new categories to the classification: browser based readers, email based readers and readers for mobile devices. Updated the public RSS user agents page with the full listing, which lists dozens of new aggregators, including Internet Explorer 7.0 beta.

Bulk RSS agent analysis, 13th November 2005

Completed the working draft of the RssAgentLogger class and refined the SQL post-processing scripts. Processed all Code Style log files for 2004 and added many new aggregators to the master reference table, with amendments to existing names and URLs. Also refined the HTML output from the RssAgentLogger class to serve as a drop-in replacement for the current RSS user agents list.

RSS user agent analysis scripts, 12th November 2005

Refactored the draft RssAgentLogger class to create separate methods for processAgents and printAgents. The former loads the agent identifiers into a temporary database table via an Analyser instance; the latter prints out HTML formatted details of each aggregator after the data have been enhanced and classified with a series of SQL scripts.

The first SQL script identifies RSS agents from their identifiers and adds an agent name field and other details. The second script matches the agents with known aggregator names and URLs and updates the client table.
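
The two classification stages can be pictured with a small in-memory stand-in. This is a rough Java sketch only: the real work is done by SQL scripts against database tables, and the class name, the keyword heuristics, the identifier fragments and the sample agent strings here are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class AgentClassifierSketch {
    // Hypothetical reference table: identifier fragment -> aggregator name.
    private static final Map<String, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("myheadlines", "MyHeadlines");
        KNOWN.put("bommie", "Bommie");
    }

    /** Stage one: guess whether an identifier belongs to a feed reader. */
    static boolean looksLikeRssAgent(String identifier) {
        String lower = identifier.toLowerCase(Locale.ROOT);
        return lower.contains("rss")
                || lower.contains("feed")
                || lower.contains("aggregator");
    }

    /** Stage two: match the identifier against known aggregator names. */
    static Optional<String> matchKnownName(String identifier) {
        String lower = identifier.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : KNOWN.entrySet()) {
            if (lower.contains(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical identifiers, not taken from the server logs.
        String[] samples = {"ExampleFeedReader/1.0", "Bommie/0.1", "Mozilla/5.0"};
        for (String id : samples) {
            System.out.println(id + " rss=" + looksLikeRssAgent(id)
                    + " name=" + matchKnownName(id).orElse("(unknown)"));
        }
    }
}
```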

Retrospectively loaded all aggregator data from the RSS user agents listing and then all user agent data from the Metacentric service logs for the past 4 months to test.

Brought the working draft Analyser package up to coding standards and discarded the static parseClients method.

Apache RewriteEngine conflict with Soupermail, 11th November 2005

Discovered the recent addition of Apache rewrite rules to the root level .htaccess configuration had been causing HTTP 403, access forbidden, errors on the Soupermail feedback script. Not immediately obvious why this would cause a conflict, so simply removed the less critical rewrite rules for now.

Bad Googlebot behaviour, 8th November 2005

A number of indexing spiders have fallen into the trap set on 13th October. The path /badbot is prohibited by the robots.txt policy, but Googlebot 2.1 fell in along with a number of other user agents:

ejupiter.com
Googlebot/2.1 (+http://www.google.com/bot.html)
OmniExplorer_Bot/4.32 (+http://www.omni-explorer.com) WorldIndexer
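
The trap check itself is simple to express. The Java sketch below is illustrative only and is not the mechanism used on this site: the class name, the request paths and the way requests are fed in are assumptions, and a real run would read the path and user agent from the server access log.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class BadBotTrapSketch {
    // The trap path named in the log entry; robots.txt disallows it.
    static final String TRAP_PREFIX = "/badbot";

    // Insertion-ordered so offenders are listed in the order first seen.
    private final Set<String> offenders = new LinkedHashSet<>();

    /** Records the user agent if the requested path falls under the trap. */
    void record(String requestPath, String userAgent) {
        if (requestPath.startsWith(TRAP_PREFIX)) {
            offenders.add(userAgent);
        }
    }

    /** Read-only view of the agents that ignored the exclusion policy. */
    Set<String> offenders() {
        return Collections.unmodifiableSet(offenders);
    }

    public static void main(String[] args) {
        BadBotTrapSketch trap = new BadBotTrapSketch();
        // Hypothetical requests for demonstration.
        trap.record("/badbot/", "Googlebot/2.1 (+http://www.google.com/bot.html)");
        trap.record("/css/", "ExampleBrowser/1.0");
        trap.record("/badbot/", "ejupiter.com");
        for (String agent : trap.offenders()) {
            System.out.println(agent);
        }
    }
}
```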
      

MKSearch beta 1 release, 2nd November 2005

MKSearch is a free, open source search engine that indexes structured metadata in Web documents, not free text in the document body. The data acquisition system conforms to the Dublin Core metadata in HTML recommendations; supports other application profiles, such as the UK e-Government Metadata Standard; and indexes native RDF formats, including RSS 1.0.

The MKSearch system has five major components:

  1. A Web crawler based on JSpider
    • Multi-threaded processing
    • Per-site throttle, user agent, depth and linking rules
    • Respects the robots.txt exclusion policy
    • Extensible plug-in based content handling
  2. An HTML document validator and formatter based on JTidy
    • Cleans up and corrects HTML syntax errors
    • Converts HTML to XHTML
  3. A set of custom indexers based on the Simple API for XML (SAX)
    • Extracts metadata from HTML meta and link elements
    • Converts metadata to RDF triple statements
    • Configurable application profiles
  4. An RDF storage and query system based on Sesame
    • RDF/XML file-based storage
    • Database storage using PostgreSQL or MySQL
    • Sophisticated Sesame RDF Query Language (SeRQL) queries
    • Scope for more semantically rich queries with inferencing
  5. A public query interface, provided through a standard servlet container
    • Simple, expandable query builder form
    • Configurable application profile-based presentation
    • Wildcard query handling
    • Phrase searches
    • Paged HTML results
    • Standing RSS results
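
To give a flavour of the indexing step in component 3, the Java sketch below turns a single Dublin Core meta element into one RDF statement in N-Triples form. It is a simplified illustration, not the MKSearch SAX indexer code: the class and method names are hypothetical, the DC. prefix is hard-coded rather than resolved from a schema link element, and qualified names such as DC.date.modified are deliberately skipped.

```java
import java.util.Locale;
import java.util.Optional;

class MetaToTripleSketch {
    // Namespace of the Dublin Core Metadata Element Set, version 1.1.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /**
     * Builds an N-Triples statement for a simple DC.* meta element,
     * or returns empty if the name is not an unqualified DC element.
     */
    static Optional<String> toTriple(String pageUrl, String metaName, String content) {
        // Accept "DC." in any letter case; anything else is ignored.
        if (!metaName.regionMatches(true, 0, "DC.", 0, 3)) {
            return Optional.empty();
        }
        String property = metaName.substring(3).toLowerCase(Locale.ROOT);
        if (property.isEmpty() || property.indexOf('.') >= 0) {
            return Optional.empty(); // qualified names are out of scope here
        }
        // Escape backslashes and double quotes inside the literal value.
        String literal = content.replace("\\", "\\\\").replace("\"", "\\\"");
        return Optional.of("<" + pageUrl + "> <" + DC_NS + property
                + "> \"" + literal + "\" .");
    }

    public static void main(String[] args) {
        // Hypothetical page URL and metadata values.
        System.out.println(
                toTriple("http://example.org/", "DC.title", "Example page")
                        .orElse("(not indexed)"));
        System.out.println(
                toTriple("http://example.org/", "keywords", "search, metadata")
                        .orElse("(not indexed)"));
    }
}
```

For the first sample this prints the statement <http://example.org/> <http://purl.org/dc/elements/1.1/title> "Example page" . and for the second it prints (not indexed).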

The two main elements of the MKSearch system can be used independently. The data acquisition system can be used to gather large quantities of metadata from the Web and store it as RDF. The query system can be used to provide a typical search engine-style interface to existing RDF content.

The MKSearch beta 1 distribution includes sample configurations that crawl a Web site and create:

This distribution also includes a demonstration of the MKSearch query interface, in the form of a Web Application Archive (WAR) that can be deployed directly to an existing servlet container. The sample search content is from an index of the MKSearch project Web site on 2 November 2005. See the site documentation below:

MKSearch is written in the Java programming language and is designed to run on any platform that supports a Java environment equivalent to the Sun Java 2 language specification.

The system has specifically been designed, developed and tested to run on GNU/Linux systems using the GNU Compiler for Java (GCJ) and Apache Tomcat 5 servlet container, as available on Fedora Core 4. This provision means that MKSearch can be built and run on software systems that are entirely open source and free from proprietary licensing.

The system has been tested extensively using the Sun Java SDK 1.5 on Microsoft Windows 2000. JUnit test suites for the MKSearch code base cover 99% of all code branches.

Previously on Code Style

These backdated pages record detailed changes to the Code Style Web site since July 2000, when development first got underway. Some pages may refer to documents or features that have since changed or are no longer part of the site, but the archive is checked to ensure there are no dead links.
