Scott Boms

Our January 2005 Back Catalogue

Batch Encoding Text

Server-side parsing tools are fantastic and can save you enormous amounts of time and effort in producing large-scale websites. At the same time they tend to have their own bugs and intricacies which can confound and perplex the best of us. This was the case today.

Background and the Problem at Hand

We’re preparing to promote a number of changes, fixes, new features and general improvements tonight on the Masterfile.com site. In the testing of said features we’ve encountered the usual fare — bugs. The latest one being related to a small, but generally significant change we’ve been trying to get out the door for some time — moving the site completely to UTF-8.

In a nutshell, we encountered a problem where somewhere along the way, character encodings were getting mangled. As a result, text was not rendering properly and search links generated zero result queries. This is bad and obviously unacceptable.

The Solution

While perhaps not the most elegant solution, we found a little piece of JavaScript code which batch-translates the raw UTF-8 encoded pages to the equivalent HTML entities. It’s simple and not overly tedious. It kindly ignores the surrounding HTML code completely and only translates accented characters as well.

Here’s the code and a brief description of how to use it:


function convertToEntities() {
  var tstr = document.form.unicode.value;
  var bstr = '';
  for(i=0; i<tstr.length; i++) {
    if(tstr.charCodeAt(i)>127) {
      bstr += '&#' + tstr.charCodeAt(i) + ';';
    } else {
      bstr += tstr.charAt(i);
    }
  }
  document.form.entity.value = bstr;
}

Usage

Create a simple form in a new HTML page with two textarea fields and a submit button. The first will be the input, the second will display the output and should be set with the readonly attribute. The submit button should have an onclick attribute which calls the javascript function. Take a quick look at the sample page I’ve put together to see how it works.

Another option for Livesearch in Movable Type

Arvind at Movalog, along with some assistance created a nice alternate version of the Livesearch functionality that integrates with Movable Type and still allows the traditional CGI-style search function to work. While I haven’t had much time to investigate this in depth, it doesn’t strike me as quite the same thing as I’ve implemented and described, but is instead closer to the Google Suggest functionality or something in between.

Nevertheless, it’s very interesting and I’m going to look at how these two options might be combined to overcome the shortcomings I outlined in the earlier post. I’d really like to see the original search functionality remain available as a fall-back alternative for users in unsupported browsers or with Javascript disabled.

A Bit More On Livesearch And Movable Type

It’s been about two weeks since I posted the Livesearch functionality tutorial here and in the time I’ve been looking at the SQL query to see if there’s room for improvement.

I noticed that some of the results for certain queries didn’t make sense. What I realized I had forgotten was that it’s raw, Markdown formatted text stored in the database tables and the results reflect all positive matches to the string a user submits through the feature. This includes matching bits of URL strings…

In effect what I thought was wrong, wasn’t. It was a matter of false expectations of what the results should be, that they weren’t as good as they should be or that I had made a mistake with the query. The query correctly checks against the entry_text entry_title fields. Taking this a step further, we could add MySQL’s full-text search to make searching even more robust.

Another small catch to keep in mind is that searches are case-insensitive meaning that a search for “Apple” should yield the same results as “apple”.

The point I’m trying to make here is simply that you should look closely at the results to ensure that everything is working as you expect. Test a handful of queries using the command line or some other database management tool.

Other Examples Of Using XMLHTTPRequest

This XMLHTTPRequest thing is really starting to take off now that newer browsers are supporting the feature more consistently and that the web community has started to take notice.

As a result you’ll likely see the Livesearch functionality cropping up on more and more blogs/websites in the near future and in more innovative and creative ways. Two other excellent examples of the XMLHTTPRequest object can be found over at map.search.ch and of course Google Suggest.

Resetting entry_id Values For Movable Type

Since I slipped up yesterday and inadvertently posted an entry destined for my ‘Hits, No Misses…’ links blog on the main blog I encountered the problem of a small blip in the sequential numbering of my entry IDs. Thankfully this is fairly easy to correct and stuck me as a good little mini-tutorial for anyone who may not know how to do this.

Using my own situation as an example let’s say you just posted an entry to your blog and it is recorded in the database as mt_entry number 439. You then notice you posted it to the wrong blog. Oops.

MT Entries table Statistics
MT Entries table statistics in MySQL

To remedy the situation you re-post the entry to the correct blog (this is easy with ecto) which then leaves you with two entries. Next you delete the first entry (the one posted to the wrong blog) leaving the second entry with an ID of 440.

If you were to look at the mt_entry table for your Movable Type install using something like phpMyAdmin you’d notice the last two entries were now 438 and 440 and the next autoindex value would be 441. Clearly there’s a blip in the system, but it’s an easy fix.

Un-Blipping The System

To get things back to normal you’ll need to have access to your MySQL database through the command line or something like phpMyAdmin. Start by looking at the mt_entry table. Note the last entry ID (based on the example, this will be 440) and change the ID to the missing ID. In this case it would be 439.

Now that the IDs are back in sequential order there’s still the matter of the autoindex value which, if left alone, will result in the next new entry to have an ID of 441. In order to fix this, we have to reset the autoindex value so the next ID will be 440 instead of 441. To do this, simply execute this query on the me_entry table:

ALTER TABLE mt_entry AUTO_INCREMENT=1

You can now safely rebuild your indexes and archives to see the changes reflected in your site.

Alternatively, you can probably avoid at least part of this procedure by not double-posting the entry to two different blogs and simply change the entry_blog_id value and rebuild. Note to self for next time this happens…

« December 2004February 2005 »