How to Get Related Posts by Similar Tags in SQL

Here's some SQL code to find Related Blog Posts by common Tags using SQL.
by Updated August 25, 2012
If you've ever wanted to get Related Posts based on similar Tags from other Posts using SQL, the following SQL code should help with creating your own SQL query.  In this example, the Related Posts are articles, but they could be Blog posts or any other data that uses Tags.  The SQL query below gets the count of Posts that have the most amount of Tags in common and orders these Related Posts highest.  So in essence, the Related Posts returned should be more highly related based on the number of tags that the Posts have in common.   SEE BELOW for the SQL Table Structure of the Tags.

Basic code to query the number of Common Tags an Article has with other Related Articles (Ordered by Common Tag):
SELECT tpe1.ArticleId, tpe2.ArticleId, COUNT(*) as CommonTags
FROM dbo.article_TagPostEntry tpe1 
INNER join dbo.article_TagPostEntry tpe2 ON tpe1.TagId = tpe2.TagId
WHERE  tpe1.ArticleId <> tpe2.ArticleId
GROUP BY tpe1.ArticleId, tpe2.ArticleId
ORDER BY CommonTags DESC;


Here's the SQL code to query Related Article Posts based on similar Tags:
DECLARE @ArticleId INT
SET @ArticleId = 3 -- TEST NUMBER

-- Get the Related Articles based on similiar Tags
			SELECT TOP 5 a.Title, a.Description FROM dbo.article_Articles a 
			WHERE a.ArticleId IN (
				SELECT TOP 8 tpe2.ArticleId
				FROM dbo.article_TagPostEntry tpe1 
				INNER join dbo.article_TagPostEntry tpe2 ON tpe1.TagId = tpe2.TagId
				WHERE  tpe1.ArticleId != tpe2.ArticleId AND tpe1.ArticleId = @ArticleId
				GROUP BY tpe2.ArticleId 
				ORDER BY NEWID())
			



Here's another way to query Related Article Posts based on similar Tags (THIS METHOD WILL PROBABLY be SLOWER, but you could modify the code to query TAGS from seperate tables that may not be related by IDs):
DECLARE @ArticleId INT
SET @ArticleId = 3 -- TEST NUMBER

-- Get the Related Articles based on similiar Tags
			SELECT TOP 5 a.Title, a.Description FROM dbo.article_Articles a 
			WHERE a.ArticleId IN (
				SELECT TOP 5 ArticleId
				FROM dbo.article_TagPostEntry 
				WHERE ArticleId <> @ArticleId AND TagId IN (SELECT tpe.TagId FROM dbo.article_TagPostEntry tpe WHERE tpe.ArticleId = @ArticleId) 
				GROUP BY  ArticleId 
				ORDER BY COUNT(TagId) DESC)
				ORDER BY NEWID()

Alternatively, to get a more randomized listing of related Posts, replace ORDER BY COUNT(TagID) DESC wtih:
ORDER BY NEWID(), COUNT(TagId) DESC


----------------------

Keep in mind the Table Structure for Tags uses 2 tables:

Table #1 is the "article_TagPostEntry" table and looks like this:
TABLE [dbo].[article_TagPostEntry](
	[PostTagId] [bigint] IDENTITY(1,1) NOT NULL,
	[ArticleId] [int] NOT NULL,
	[TagId] [bigint] NOT NULL,
	[DateCreated] [datetime] NOT NULL,

Table #2 is the "article_TagNames" table and is where the Names of the Tags are actually kept.  TagId is used to relate the two Tag tables together.  The Related Posts SQL code in the example above doesn't need to access this table.
 
TABLE [dbo].[article_TagNames](
	[TagId] [bigint] IDENTITY(1,1) NOT NULL,
	[TagName] [nvarchar](50) NOT NULL,
	[TagPathName] [nvarchar](100) NOT NULL,
	[TagCount] [int] NOT NULL,
	[DateCreated] [date] NOT NULL,

NOTE: The main Articles table gets related to the "article_TagPostEntry" table using ArticleId.


JUST Keep in mind that these queries will put a serious strain on your database if you run them dynamically everytime someone visits a page.  You'd probably be better off inserting the Related Posts that this query outputs into a Table, and retrieving the data by ID's.   Otherwise, you may find you get a lot of  SQL Timeout errors if your database server is not powerful enough to handle the load on a high traffic site.  You don't want to deal with the following error:
 "Timeout expired. The timeout period elapsed prior to completion of the
operation or the server is not responding"
 


2
0

Add your comment

by Anonymous - Already have an account? Login now!
Your Name:  

Comment:  
Enter the text you see in the image below
What do you see?
Can't read the image? View a new one.
Your comment will appear after being approved.

Related Posts


Here's a quick SQL tip on how to get records beginning with numbers only. Use the query string LIKE '[0-9]%' For example: SELECT u.UserId, u.UserName FROM dbo.aspnet_Users u WHERE u.UserName LIKE '[0-9]%' ORDER BY u.UserName Keep in mind, If you are using...  more »

I recently ran into an issue with the ELMAH Error logger Table getting way TOO Big in my SQL Server database. The table row size in the ELMAH_Error table had over 200,000 rows. This was way more than I ever needed to look at, and was causing other issues...  more »

Facebook allows you to tag your friends in your status updates using the @ symbol. The tagging feature in facebook is similiar to that of Twitter, but you don't need to know your friends "@username" to use the tag. Instead follow these instructions to tag...  more »

Here's a step by step process to enable a remote connection to your SQL Server 2008 Express database that is configured on Windows Server 2008 using an assigned port #. 1) First make sure the SQL Server Authentication is enabled and the User name that...  more »

When using SQL Server 2008, you may receive a Save (Not Permitted Dialog) box pop-up when trying to save changes to an existing table. The dailog box reads: "Saving changes is not permitted. The changes you have made require the following tables to be...  more »

In SQL Server 2008 Management Studio, you may want to change the default settings to allow you to edit more than the 200 rows at a time, or select more than 1000 rows, which are the default settings for SQL Server. To modify "Edit Top 200 Rows" or "Select...  more »

There's a good chance that if you're database driven application is running into the following sql error message that says "String or binary data would be truncated", that the error is being caused by an issue in a SQL statement or in the SQL code of a...  more »

Here's a couple quick SQL database tips for finding and deleting duplicate values in a SQL Server table.To find duplicates in a Column use the following SQL: SELECT ColName1, COUNT(*) TotalCountFROM TableName GROUP BY ColName1 HAVING (COUNT(ColName1) >...  more »