I work for a women’s archive. This means that a great many of the people we write about have significant middle names. Unlike men, after all, women are often expected to give up their “maiden” names when they marry.
This leads to interesting search problems. In most cases, however, a search in Google (or on our website) for “Ruth Bader Ginsburg” or “Ruth Ginsburg” or “Ruth Bader” gets pretty much the same results.
But what happens when someone is unsure of the spelling of the middle name and tries to search for “Ruth B Ginsburg”? As far as the search engine is concerned, she doesn’t exist.
It turns out that this is a more general problem. I can vaguely see where it arises (there are infinitely many possible expansions of an initial), but it’s a big problem. You can see for yourself.
Search Bing or Google or the search engine of your choice for:
- Bill Gates
- Bill Henry Gates
- Bill H Gates
I’m willing to bet that I’m not the only person who didn’t know that searching with a lone initial is fatal to most search results. This is of limited interest globally, but archives, and especially archives like ours, need a better default than “no results.”



January 25th, 2010 07:55
Ari,
Great point, and one that usually flies under the radar of online searchers. The search software we use to provide web access to art history databases and some digital collections offers one solution: it can automatically stem (right-truncate) each term in the user’s search string. If you search “Bill H Gates”, it retrieves BILL AND (all words beginning with H) AND GATES. The application designer can even require the three terms to appear next to each other in the specified order. This gives much better recall and precision than treating the “H” as a search term in and of itself.
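The prefix-matching behavior described above can be approximated in a few lines. This is a minimal, hypothetical Python sketch that works on plain text strings with regular expressions rather than on a real search index; it is not how the commenter’s search software is implemented, and the function names are made up for illustration. It assumes simple alphabetic query terms.

```python
import re


def term_pattern(term: str) -> str:
    """Regex for one query term: a single letter becomes a prefix wildcard."""
    escaped = re.escape(term)
    if len(term) == 1 and term.isalpha():
        # "H" matches any word beginning with H: "H", "Henry", "Hugh", ...
        return rf"\b{escaped}\w*"
    # Any other term must match a whole word
    return rf"\b{escaped}\b"


def matches(query: str, text: str, adjacent: bool = False) -> bool:
    """Return True if the query matches the text, ignoring case.

    adjacent=False: boolean AND, each term may appear anywhere.
    adjacent=True: terms must appear next to each other, in query order,
    separated only by non-word characters (spaces, periods, commas).
    """
    terms = query.split()
    if not terms:
        return False
    if adjacent:
        phrase = r"\W+".join(term_pattern(t) for t in terms)
        return re.search(phrase, text, re.IGNORECASE) is not None
    return all(re.search(term_pattern(t), text, re.IGNORECASE) for t in terms)
```

With `adjacent=True`, “Bill H Gates” matches both “Bill Henry Gates” and “Bill H. Gates” but not “Gates, Bill Henry”; with the default boolean AND, the last one matches too. Neither mode matches plain “Bill Gates”, since some word beginning with H is still required.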
January 26th, 2010 09:14
One would think that search engines would do something similar to what you suggest, and that this would lead to more results, but what seems to happen is roughly the opposite.
Note that while this problem exists for names, the same fix might be contraindicated for other terms.
What we plan to do is ensure that, for everyone with a middle name (or more than two name components), the name is also expressed in the metadata with the initial, so that “Bill H Gates” is, in fact, an exact match.
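One way to build that metadata is to generate the initial-bearing form automatically from the full name. The sketch below is hypothetical: `name_variants` is an illustrative name, and it assumes each name is stored as a single “Given Middle Surname” string, treating everything between the first and last components as middle names. That assumption will not hold for every name (compound surnames, for instance).

```python
def name_variants(name: str) -> list[str]:
    """Return searchable forms of a personal name for a metadata field.

    Assumes "Given Middle ... Surname" order: the first component is the
    given name, the last is the surname, and anything between them is a
    middle name.
    """
    parts = name.split()
    if len(parts) <= 2:
        return [" ".join(parts)]
    first, middles, last = parts[0], parts[1:-1], parts[-1]
    initials = [m[0] for m in middles]
    variants = [
        " ".join(parts),                     # Ruth Bader Ginsburg
        f"{first} {last}",                   # Ruth Ginsburg
        " ".join([first, *initials, last]),  # Ruth B Ginsburg
    ]
    # Drop duplicates (e.g. when the stored middle name is already an
    # initial) while keeping the original order
    return list(dict.fromkeys(variants))
```

Storing all of these forms in a searchable field means that “Ruth B Ginsburg” becomes an exact match without any change to the search engine itself.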
January 26th, 2010 11:13
An only half-cheesy, expedient, post-search UI workaround would be to display a special header line above the results list for any query that includes a single alphabetic character, something like this:
‘These results were found for “ari x davidow”; you may find more by searching for: ari davidow.’
Here “ari davidow” would be a direct link to the revised query. You could get fancier by running the revised search before returning results for candidate queries, and displaying this text and link only if dropping the single character yields additional hits. You could also display it only when the single character precedes a space followed by a multi-character string (as a rough way of matching first or middle initials, but not “Malcolm X” or “Plan B”). Hmmm…
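The header logic proposed above, with both refinements, might look something like this. It is a hypothetical Python sketch: `count_hits` stands in for whatever call the site’s search backend offers for counting results, the function names are illustrative, and the regular expression is only the rough initial-matching heuristic described in the comment. On a real results page, the revised query would be rendered as a link rather than plain text.

```python
import re
from typing import Callable, Optional

# A lone letter plus the whitespace after it, but only when the next word
# has at least two letters: catches "ari x davidow" but not "Malcolm X"
# or "Plan B", where the single letter comes last.
LONE_INITIAL = re.compile(r"(?<!\S)[A-Za-z]\s+(?=[A-Za-z]{2,})")


def revised_query(query: str) -> Optional[str]:
    """Return the query with lone initials dropped, or None if it has none."""
    if not LONE_INITIAL.search(query):
        return None
    # Remove the initials, then normalize leftover whitespace
    return " ".join(LONE_INITIAL.sub("", query).split())


def suggestion_header(query: str,
                      count_hits: Callable[[str], int]) -> Optional[str]:
    """Build the header line, but only if dropping initials finds more hits.

    count_hits is a placeholder for the search backend's result count.
    """
    revised = revised_query(query)
    if revised is None or count_hits(revised) <= count_hits(query):
        return None
    return (f'These results were found for "{query}"; '
            f"you may find more by searching for: {revised}.")
```

Like the heuristic it encodes, this will also flag genuine one-letter words (the “a” in “a room of one’s own”, for example), so it suits name searches better than general queries; the hit-count check at least keeps it quiet when dropping the letter finds nothing new.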