How to Get Related Posts by Similar Tags in SQL

Here's some SQL code to find Related Blog Posts by common Tags using SQL.
by Updated August 25, 2012
If you've ever wanted to get Related Posts based on similar Tags from other Posts using SQL, the following SQL code should help with creating your own SQL query.  In this example, the Related Posts are articles, but they could be Blog posts or any other data that uses Tags.  The SQL query below gets the count of Posts that have the most amount of Tags in common and orders these Related Posts highest.  So in essence, the Related Posts returned should be more highly related based on the number of tags that the Posts have in common.   SEE BELOW for the SQL Table Structure of the Tags.

Basic code to query the number of Common Tags an Article has with other Related Articles (Ordered by Common Tag):
SELECT tpe1.ArticleId, tpe2.ArticleId, COUNT(*) as CommonTags
FROM dbo.article_TagPostEntry tpe1 
INNER join dbo.article_TagPostEntry tpe2 ON tpe1.TagId = tpe2.TagId
WHERE  tpe1.ArticleId <> tpe2.ArticleId
GROUP BY tpe1.ArticleId, tpe2.ArticleId
ORDER BY CommonTags DESC;


Here's the SQL code to query Related Article Posts based on similar Tags:
DECLARE @ArticleId INT
SET @ArticleId = 3 -- TEST NUMBER

-- Get the Related Articles based on similiar Tags
			SELECT TOP 5 a.Title, a.Description FROM dbo.article_Articles a 
			WHERE a.ArticleId IN (
				SELECT TOP 8 tpe2.ArticleId
				FROM dbo.article_TagPostEntry tpe1 
				INNER join dbo.article_TagPostEntry tpe2 ON tpe1.TagId = tpe2.TagId
				WHERE  tpe1.ArticleId != tpe2.ArticleId AND tpe1.ArticleId = @ArticleId
				GROUP BY tpe2.ArticleId 
				ORDER BY NEWID())
			



Here's another way to query Related Article Posts based on similar Tags (THIS METHOD WILL PROBABLY be SLOWER, but you could modify the code to query TAGS from seperate tables that may not be related by IDs):
DECLARE @ArticleId INT
SET @ArticleId = 3 -- TEST NUMBER

-- Get the Related Articles based on similiar Tags
			SELECT TOP 5 a.Title, a.Description FROM dbo.article_Articles a 
			WHERE a.ArticleId IN (
				SELECT TOP 5 ArticleId
				FROM dbo.article_TagPostEntry 
				WHERE ArticleId <> @ArticleId AND TagId IN (SELECT tpe.TagId FROM dbo.article_TagPostEntry tpe WHERE tpe.ArticleId = @ArticleId) 
				GROUP BY  ArticleId 
				ORDER BY COUNT(TagId) DESC)
				ORDER BY NEWID()

Alternatively, to get a more randomized listing of related Posts, replace ORDER BY COUNT(TagID) DESC wtih:
ORDER BY NEWID(), COUNT(TagId) DESC


----------------------

Keep in mind the Table Structure for Tags uses 2 tables:

Table #1 is the "article_TagPostEntry" table and looks like this:
TABLE [dbo].[article_TagPostEntry](
	[PostTagId] [bigint] IDENTITY(1,1) NOT NULL,
	[ArticleId] [int] NOT NULL,
	[TagId] [bigint] NOT NULL,
	[DateCreated] [datetime] NOT NULL,

Table #2 is the "article_TagNames" table and is where the Names of the Tags are actually kept.  TagId is used to relate the two Tag tables together.  The Related Posts SQL code in the example above doesn't need to access this table.
 
TABLE [dbo].[article_TagNames](
	[TagId] [bigint] IDENTITY(1,1) NOT NULL,
	[TagName] [nvarchar](50) NOT NULL,
	[TagPathName] [nvarchar](100) NOT NULL,
	[TagCount] [int] NOT NULL,
	[DateCreated] [date] NOT NULL,

NOTE: The main Articles table gets related to the "article_TagPostEntry" table using ArticleId.


JUST Keep in mind that these queries will put a serious strain on your database if you run them dynamically everytime someone visits a page.  You'd probably be better off inserting the Related Posts that this query outputs into a Table, and retrieving the data by ID's.   Otherwise, you may find you get a lot of  SQL Timeout errors if your database server is not powerful enough to handle the load on a high traffic site.  You don't want to deal with the following error:
 "Timeout expired. The timeout period elapsed prior to completion of the
operation or the server is not responding"
 


2
0

Add your comment

by Anonymous - Already have an account? Login now!
Your Name:  

Comment:  
Enter the text you see in the image below
What do you see?
Can't read the image? View a new one.
Your comment will appear after being approved.

Related Posts


UPDATE 12-16-2011: If you are using ASP.NET and your website / database is live, and you want to take the database offline, you're better off putting an App_Offline.htm file in your website main directory to first take the website down for maintance (I...  more »

Facebook allows you to tag your friends in your status updates using the @ symbol. The tagging feature in facebook is similiar to that of Twitter, but you don't need to know your friends "@username" to use the tag. Instead follow these instructions to tag...  more »

Here's a quick SQL tip on how to get records beginning with numbers only. Use the query string LIKE '[0-9]%' For example: SELECT u.UserId, u.UserName FROM dbo.aspnet_Users u WHERE u.UserName LIKE '[0-9]%' ORDER BY u.UserName Keep in mind, If you are using...  more »

I recently ran into an issue with the ELMAH Error logger Table getting way TOO Big in my SQL Server database. The table row size in the ELMAH_Error table had over 200,000 rows. This was way more than I ever needed to look at, and was causing other issues...  more »

So here's a quick tip on how to fix the following SQL Server "SqlDateTime overflow" error. System.Data.SqlTypes.SqlTypeException SqlDateTime overflow. Must be between 1/1/1753 12:00:00 AM and 12/31/9999 11:59:59 PM. The reason that you are probably...  more »

Here's a couple quick SQL database tips for finding and deleting duplicate values in a SQL Server table.To find duplicates in a Column use the following SQL: SELECT ColName1, COUNT(*) TotalCountFROM TableName GROUP BY ColName1 HAVING (COUNT(ColName1) >...  more »

If you are getting the following error while trying to access or deploy a SQL Services Reporting Services (ssrs) report server on your localhost: The permissions granted to user 'domain\username' are insufficient for performing this operation....  more »

I just came across a way to test a data providers connection string (like a SQL Server database) with the help of a plain text file using Notepad. To investigate and test out if your connection string works, your going to want to create a UDL file. To do...  more »