
Evaluating Meta Data for Search

When you deploy a search engine and index content, there’s often a lot of information that could be pulled in as meta data. How do you evaluate it and decide what to keep? This post offers four questions to ask of each field.

In a previous post I talked about keeping meta data fields to a consolidated list (a meta data dictionary). Here are some thoughts on how to evaluate whether a particular piece of meta data belongs in that dictionary and should be brought into the search engine.

Can the data be used for filtering search result sets?

Good examples include file type, location, author, and similar items. Will it help users narrow the matching results using attributes of the content that they might remember? Clearly for a structured search, as with a product catalogue for example, a lot of the meta data would fall into this category. Items such as colour, price, and so on can all be used for guided search. If data falls into this category, include it.
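To make the filtering idea concrete, here is a minimal sketch in Python. It is not tied to any particular search engine: the hits, field names (`file_type`, `author`), and both helper functions are made up for illustration. It shows the two things filterable meta data gives you: narrowing a result set, and counting values so the user can see what filters are available.

```python
# Hypothetical search hits; the field names are illustrative only.
results = [
    {"url": "https://example.com/a.pdf", "file_type": "pdf", "author": "Lee"},
    {"url": "https://example.com/b.docx", "file_type": "docx", "author": "Lee"},
    {"url": "https://example.com/c.pdf", "file_type": "pdf", "author": "Kim"},
]


def filter_results(hits, **criteria):
    """Keep only hits whose meta data matches every given criterion."""
    return [
        hit for hit in hits
        if all(hit.get(field) == value for field, value in criteria.items())
    ]


def facet_counts(hits, field):
    """Count how many hits carry each value of a meta data field."""
    counts = {}
    for hit in hits:
        value = hit.get(field)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


pdf_hits = filter_results(results, file_type="pdf")
author_facets = facet_counts(results, "author")
```

A field is a good filtering candidate when its counts split the result set into a handful of meaningful groups, as `author` does here.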

Is the data source system specific?

Examples include internal data items such as additional internal tracking IDs (a search engine really only needs one unique ID per document, and usually this is the URL). I’ve seen Microsoft SharePoint systems indexed where the number of meta data fields retrieved was very large. Most of them are used by SharePoint for purposes best known to SharePoint: perhaps to track administration details, workflows applicable to the document, internal storage management details, and so on. Most of this, if not all, is not needed for effective searching. Data in this category should be left out of the search engine.
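One way to keep source-system fields out is to treat the meta data dictionary as an allow-list at indexing time. The sketch below assumes exactly that. The dictionary contents and the source record are invented for illustration, and the internal-looking field names are hypothetical rather than real SharePoint fields.

```python
# Hypothetical meta data dictionary: the only fields allowed into the index.
METADATA_DICTIONARY = {"url", "title", "author", "file_type"}


def to_index_document(source_record):
    """Copy only the fields named in the meta data dictionary."""
    return {
        field: value
        for field, value in source_record.items()
        if field in METADATA_DICTIONARY
    }


# Illustrative record as it might arrive from a source system.
source_record = {
    "url": "https://example.com/policies/leave.docx",
    "title": "Leave Policy",
    "author": "HR Team",
    "file_type": "docx",
    "internal_tracking_id": "0x5f3a",   # hypothetical system-specific field
    "workflow_state": "approved",       # hypothetical system-specific field
    "storage_bucket": 17,               # hypothetical system-specific field
}

index_document = to_index_document(source_record)
```

An allow-list means a new internal field added by the source system stays out of the index until someone deliberately adds it to the dictionary.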

Will the data enhance presentation?

Good examples of this type of meta data include file size, source system, and so on. Even the URL of the document falls into this group. This type of data helps present a result and helps people determine whether it might be the content they’re looking for. In a product catalogue scenario this would include a product picture, a product description, and so on. If the data falls into this category, bring it into the search engine (remember that pictures are best stored elsewhere, with only a URL or other reference stored in the search engine).

Could the data be service enhancing?

Pulling in data from this category allows other search services to be more effective or to offer useful data. An example might be a keywords field, which would enhance keyword search matching. If there’s a user product rating in a product catalogue, it could enable some kind of enhanced reporting service for marketing or purchasing staff. Think about the information services that the search engine could power, or about how the services already offered could provide more insightful data. The search engine can pull together products, documents, or whatever else it has indexed into arbitrary lists based on keyword or meta data matches. This is a powerful capability, and one that’s likely hard to reproduce across most of the source systems being indexed.
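As a rough sketch of the reporting idea, the Python below averages a user rating across products grouped by any other meta data field. The product records, the `rating` and `colour` field names, and the `average_rating_by` helper are all assumptions made for the example. A real search engine would more likely do this with its own aggregation features.

```python
from collections import defaultdict

# Hypothetical indexed products; field names are illustrative only.
products = [
    {"name": "Mug", "colour": "red", "rating": 4.0},
    {"name": "Plate", "colour": "red", "rating": 2.0},
    {"name": "Bowl", "colour": "blue", "rating": 5.0},
    {"name": "Spoon", "colour": "blue"},  # no rating yet, so it is skipped
]


def average_rating_by(items, field):
    """Average the 'rating' of items grouped by the given meta data field."""
    totals = defaultdict(lambda: [0.0, 0])  # value -> [sum of ratings, count]
    for item in items:
        if field in item and "rating" in item:
            bucket = totals[item[field]]
            bucket[0] += item["rating"]
            bucket[1] += 1
    return {value: total / count for value, (total, count) in totals.items()}


rating_report = average_rating_by(products, "colour")
```

The report is only possible because both the rating and the grouping field were brought into the index, which is the value-add argument for fields of this kind.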

So remember this when picking meta data – think of the value add.


Are there any other rules you use when selecting meta data? Are there ways in which you’ve used your search system to deliver value-add services with good meta data? Any thoughts, as always, would be appreciated.

 
