How to Code a Search Helper Class to Clean Stop Words with C#

Remove stopwords in text with C#.
Updated August 10, 2010

Today I decided to implement a StopWords filter in C# that would filter out certain words from a search engine query. I wanted something to filter out common words like "a", "I", "to", "the", and "how" from search queries, since in most cases these words don't help produce the most accurate results from a query and instead just create more unnecessary search results.

Keep in mind that there is no be-all and end-all list of stop words for every case; ultimately you have to decide what's best for the application (and its users) when determining which stop words to include and which to exclude. Published lists of common stop words are a good starting point for creating your own StopWords list.

I ultimately narrowed my StopWords list down to some of the more common words that I felt wouldn't interfere too much with a searcher's intent:

"a", "about", "actually", "after", "also", "am", "an", "and", "any", "are", "as", "at", "be", "because", "but", "by",
"could", "do", "each", "either", "en", "for", "from", "has", "have", "how", "i", "if", "in", "is", "it", "its", "just",
"of", "or", "so", "some", "such", "that", "the", "their", "these", "thing", "this", "to", "too", "very", "was", "we",
"well", "what", "when", "where", "who", "will", "with", "you", "your"
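Since the right list depends on the application, one option is to keep the words in a plain text file so the list can be tuned without recompiling. The following is only a sketch of that idea, not part of the original class: the `StopWordLoader` name, the one-word-per-line format, the `#` comment convention, and the `stopwords.txt` file name are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StopWordLoader
{
    // Builds a case-insensitive set from lines of text, one stop word per line.
    // Blank lines and lines starting with '#' are skipped (an assumed convention).
    public static HashSet<string> FromLines(IEnumerable<string> lines)
    {
        HashSet<string> stopWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            string word = line.Trim();
            if (word.Length > 0 && !word.StartsWith("#", StringComparison.Ordinal))
                stopWords.Add(word);
        }
        return stopWords;
    }

    // Reads the list from a file on disk; the path is supplied by the caller.
    public static HashSet<string> FromFile(string path)
    {
        return FromLines(File.ReadAllLines(path));
    }
}
```

A call such as `StopWordLoader.FromFile("stopwords.txt")` (hypothetical path) would then return a set that can be checked with `Contains`.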

Once I figured out my StopWords list, I created a SearchHelper class in C# to clean search query words before sending them to the database to return search results. Below is the SearchHelper.cs C# class (download available: see attached .cs file below):

using System;
using System.Collections.Generic;
using System.Text;

public class SearchHelper
{
    // Stop words are kept in a set so each lookup is a fast, case-insensitive hash check.
    private static readonly HashSet<string> stopWords = new HashSet<string>(
        new string[] {
            "a", "about", "actually", "after", "also", "am", "an", "and", "any", "are", "as", "at", "be", "because", "but", "by",
            "could", "do", "each", "either", "en", "for", "from", "has", "have", "how",
            "i", "if", "in", "is", "it", "its", "just", "of", "or", "so", "some", "such", "that",
            "the", "their", "these", "thing", "this", "to", "too", "very", "was", "we", "well", "what", "when", "where",
            "who", "will", "with", "you", "your"
        },
        StringComparer.OrdinalIgnoreCase);

    /// <summary>
    /// Removes stop words from the specified search string.
    /// </summary>
    public static string CleanSearchedWords(string searchedWords)
    {
        // Nothing to clean for a null or empty query.
        if (string.IsNullOrEmpty(searchedWords))
            return string.Empty;

        // Remove special characters from the search string.
        searchedWords = searchedWords
            .Replace("\\", string.Empty)
            .Replace("|", string.Empty)
            .Replace("(", string.Empty)
            .Replace(")", string.Empty)
            .Replace("[", string.Empty)
            .Replace("]", string.Empty)
            .Replace("*", string.Empty)
            .Replace("?", string.Empty)
            .Replace("}", string.Empty)
            .Replace("{", string.Empty)
            .Replace("^", string.Empty)
            .Replace("+", string.Empty);

        // Transform the search string into an array of words.
        char[] wordSeparators = new char[] { ' ', '\n', '\r', ',', ';', '.', '!', '?', '-', '"', '\'' };
        string[] words = searchedWords.Split(wordSeparators, StringSplitOptions.RemoveEmptyEntries);

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.Length; i++)
        {
            string word = words[i].ToLowerInvariant().Trim();

            // Keep words longer than one character that are not stop words.
            if (word.Length > 1 && !stopWords.Contains(word))
            {
                // Separate words with a single space, without a trailing space.
                if (sb.Length > 0)
                    sb.Append(' ');
                sb.Append(word);
            }
        }

        return sb.ToString();
    }
}

That's it... Now, in your search results page code, you can call SearchHelper.CleanSearchedWords(searchWordsHere) to clean the searched words string. Pretty simple, but it works well for filtering common words out of a search query.
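To see what the filtering does to a real query, here is a small self-contained console sketch of the same idea written with LINQ. It is not the SearchHelper class itself: the `StopWordDemo` name is made up for this example, and the stop-word list is deliberately abbreviated to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordDemo
{
    // Abbreviated stop-word list for illustration only; use your full list in practice.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new string[] { "a", "how", "i", "in", "the", "to" },
        StringComparer.OrdinalIgnoreCase);

    private static readonly char[] Separators =
        new char[] { ' ', '\n', '\r', ',', ';', '.', '!', '?', '-', '"', '\'' };

    // Splits the query, lowercases each word, and drops stop words and single characters.
    public static string Clean(string query)
    {
        if (string.IsNullOrEmpty(query))
            return string.Empty;

        IEnumerable<string> kept = query
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.ToLowerInvariant())
            .Where(w => w.Length > 1 && !StopWords.Contains(w));

        return string.Join(" ", kept.ToArray());
    }

    public static void Main()
    {
        // "how", "to", "the", and "in" are dropped; the rest is lowercased.
        Console.WriteLine(Clean("How to remove the stop words in C#?"));
        // prints: remove stop words c#
    }
}
```

The cleaned string is what you would then pass along to your search query in place of the raw user input.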

 


FILES: SearchHelper.cs - Clean Stop Words in C#
