If you want to provide a sitemap to the search engines that have just the URL’s you want and none of the junk or various permutations of a URL a BOT is bound to find, you have just one choice and that is to programmatically develop a file with the output you desire.
I could find no utility on the market that would allow you to feed it a T-SQL query and have it output an XML sitemap. Maybe some day I’ll write it. For now I can accomplish my needs with the stored procedure below.
So what we have here is a stored procedure that does the following:
- Creates seven sitemaps based on seven different queries. Each sitemap has a specific type of page we want the search engines to follow.
- Outputs the sitemap to an XML file encoded in UTF-16 (only format MS SQL supports unfortunately).
- Converts the files from UTF-16 to UTF-8
- Finally, deletes and names the files
I used the SQL XML RAW capability to output my query in an XML format. This works REALLY well.
The most time consuming part was trying to cheat my way into getting a UTF-8 formatted file since MS SQL just can’t do anything but UTF-16. There are easier ways to do this in Linux, in Windows I had to find a tool I could use from a command line. I ended up finding a great tool for converting various types of documents from one encoding to another. It’s called Unifier by Melody-Soft.
At this point all that was left was renaming the files to be in a standard format. I do this with xp_cmdshell as you can see in my SP below.
1: USE [CART]
2: GO
3: /****** Object: StoredProcedure [dbo].[Sp_CreateSitemaps2] Script Date: 08/19/2009 12:16:34 ******/
4: SET ANSI_NULLS ON
5: GO
6: SET QUOTED_IDENTIFIER ON
7: GO
8: -- =============================================
9: -- Author: Sean McWherter
10: -- Create date: 8-10-09
11: -- Description: Generates all sitemaps
12: -- =============================================
13: ALTER PROCEDURE [dbo].[Sp_CreateSitemaps]
14: AS
15: BEGIN
16: -- SET NOCOUNT ON added to prevent extra result sets from
17: -- interfering with SELECT statements.
18: SET NOCOUNT ON;
19: --PS PRODUCTS
20: DECLARE @x_psproduct xml
21: DECLARE @strXML_psproduct varchar(MAX)
22: SET @x_psproduct=(
23:
24: select --top 200
25: 'http://www.petstore.com/' + replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(case when WebParent is not null then ParentDesc else description end, ' - ', '_'),'-','_') +
26: '_' + replace(LongName, '-', '_') +
27: '-' + replace(ManufacturerName, '-', '_') +
28: '-' + idProduct +
29: '-' + a.GPCat4,
30: '®',''),'.',''),'/','_'),',',''),'& ',''),' ','_'),'&','_'),'™',''),':',''),'>',''),'<',''),'%','_') +
31: '-vi.html' as loc,
32: 'daily' as changefreq
33: --, idproduct, parentdesc, longname, manufacturername
34: from MDCART_products a inner join MDCART_CategoriesGP b on a.GpCat4 = b.GPCat1 inner join MDCART_Manufacturers c on a.ManufacturerCode = c.ManufacturerCode
35: where ((a.idProduct = WebParent and ParentDesc is not null) or WebParent is null) and LongName is not null and description is not null and ManufacturerName is not null and a.GpCat4 is not null and Visable = 1
36: and LEFT(gpcat4,2) in (select distinct LEFT(gpcat4,2) cats from MDCART_products where active <> 0 and gpcat4 is not null)
37: order by a.idProduct desc
38: for xml raw('url'), ELEMENTS)
39:
40: SET @strXML_psproduct= '<?xml version="1.0" encoding="UTF-8"?>
41: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
42: + CONVERT(varchar(MAX),@x_psproduct) + '</urlset>'
43:
44: --PS CATEGORY FEED
45: DECLARE @x_pscategory xml
46: DECLARE @strXML_pscategory varchar(MAX)
47: SET @x_pscategory=(
48:
49: select --top 250
50: 'http://www.petstore.com/' + replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(LongName, ' - ', '_'),'-','_') +
51: '-' + GPCat1,
52: '®',''),'.',''),'/','_'),',',''),'& ',''),' ','_'),'&','_'),'™',''),':',''),'>',''),'<',''),'%','_') +
53: '-ct.html' as loc,
54: 'hourly' as changefreq
55: from MDCART_CategoriesGP
56: where Visable = 1 and LongName is not null and GPCat1 is not null
57: and LEFT(GPCat1,2) in (select distinct LEFT(gpcat4,2) cats from MDCART_products where active <> 0 and gpcat4 is not null)
58: order by gpcat1
59: for xml raw('url'), ELEMENTS)
60:
61: SET @strXML_pscategory= '<?xml version="1.0" encoding="UTF-8"?>
62: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
63: + CONVERT(varchar(MAX),@x_pscategory) + '</urlset>'
64:
65: --PS CATEGORY W/ BRAND FEED
66: DECLARE @x_pscategory_brand xml
67: DECLARE @strXML_pscategory_brand varchar(MAX)
68: SET @x_pscategory_brand=(
69:
70: select --top 250
71: 'http://www.petstore.com/' + replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(ManufacturerName, ' - ', '_'),'-','_') +
72: '-' + LongName +
73: '-' + ManufacturerCode +
74: '-' + GPCat1,
75: '®',''),'.',''),'/','_'),',',''),'& ',''),' ','_'),'&','_'),'™',''),':',''),'>',''),'<',''),'%','_') +
76: '-ct.html' as loc,
77: 'hourly' as changefreq
78: from ( SELECT DISTINCT b.ManufacturerName, c.LongName, b.ManufacturerCode, c.GPCat1
79: FROM MDCART_products AS a INNER JOIN MDCART_Manufacturers AS b ON a.ManufacturerCode = b.ManufacturerCode INNER JOIN
80: MDCART_CategoriesGP AS c ON a.GpCat4 = c.GPCat1
81: where ManufacturerName is not null and LongName is not null and description is not null and b.ManufacturerCode is not null and c.GPCat1 is not null and Visable =1
82: and LEFT(c.GPCat1,2) in (select distinct LEFT(gpcat4,2) cats from MDCART_products where active <> 0 and gpcat4 is not null)) as a1
83: order by ManufacturerName, LongName
84: for xml raw('url'), ELEMENTS)
85:
86: SET @strXML_pscategory_brand= '<?xml version="1.0" encoding="UTF-8"?>
87: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
88: + CONVERT(varchar(MAX),@x_pscategory_brand) + '</urlset>'
89:
90: --MD PRODUCTS
91: DECLARE @x_mdproduct xml
92: DECLARE @strXML_mdproduct varchar(MAX)
93: SET @x_mdproduct=(
94:
95: select --top 200
96: 'http://www.marinedepot.com/' + replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(case when WebParent is not null then ParentDesc else description end, ' - ', '_'),'-','_') +
97: '_' + replace(LongName, '-', '_') +
98: '-' + replace(ManufacturerName, '-', '_') +
99: '-' + idProduct +
100: '-' + a.GPCat1,
101: '®',''),'.',''),'/','_'),',',''),'& ',''),' ','_'),'&','_'),'™',''),':',''),'>',''),'<',''),'%','_') +
102: '-vi.html' as loc,
103: 'daily' as changefreq
104: --, idproduct, parentdesc, longname, manufacturername
105: from MDCART_products a inner join MDCART_CategoriesGP b on a.GPCat1 = b.GPCat1 inner join MDCART_Manufacturers c on a.ManufacturerCode = c.ManufacturerCode
106: where ((a.idProduct = WebParent and ParentDesc is not null) or WebParent is null) and LongName is not null and description is not null and ManufacturerName is not null and a.GPCat1 is not null and Visable = 1
107: and LEFT(a.GPCat1,2) in (select distinct LEFT(gpcat1,2) cats from MDCART_products where active = -1 and gpcat1 is not null)
108: order by a.idProduct desc
109: for xml raw('url'), ELEMENTS)
110:
111: SET @strXML_mdproduct= '<?xml version="1.0" encoding="UTF-8"?>
112: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
113: + CONVERT(varchar(MAX),@x_mdproduct) + '</urlset>'
114:
115: --MD CATEGORY FEED
116: DECLARE @x_mdcategory xml
117: DECLARE @strXML_mdcategory varchar(MAX)
118: SET @x_mdcategory=(
119:
120: select --top 250
121: 'http://www.marinedepot.com/' + replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(LongName, ' - ', '_'),'-','_') +
122: '-' + GPCat1,
123: '®',''),'.',''),'/','_'),',',''),'& ',''),' ','_'),'&','_'),'™',''),'>',''),'<',''),'%','_') +
124: '-ct.html' as loc,
125: 'hourly' as changefreq
126: from MDCART_CategoriesGP
127: where Visable = 1 and LongName is not null and GPCat1 is not null
128: and LEFT(GPCat1,2) in (select distinct LEFT(gpcat1,2) cats from MDCART_products where active = -1 and gpcat1 is not null)
129: order by gpcat1
130: for xml raw('url'), ELEMENTS)
131:
132: SET @strXML_mdcategory= '<?xml version="1.0" encoding="UTF-8"?>
133: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
134: + CONVERT(varchar(MAX),@x_mdcategory) + '</urlset>'
135:
136: --MD CATEGORY W/ BRAND FEED
137: DECLARE @x_mdcategory_brand xml
138: DECLARE @strXML_mdcategory_brand varchar(MAX)
139: SET @x_mdcategory_brand=(
140:
141: select --top 250
142: 'http://www.marinedepot.com/' + replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(ManufacturerName, ' - ', '_'),'-','_') +
143: '-' + LongName +
144: '-' + ManufacturerCode +
145: '-' + GPCat1,
146: '®',''),'.',''),'/','_'),',',''),'& ',''),' ','_'),'&','_'),'™',''),':',''),'>',''),'<',''),'%','_') +
147: '-ct.html' as loc,
148: 'hourly' as changefreq
149: from ( SELECT DISTINCT b.ManufacturerName, c.LongName, b.ManufacturerCode, c.GPCat1
150: FROM MDCART_products AS a INNER JOIN MDCART_Manufacturers AS b ON a.ManufacturerCode = b.ManufacturerCode INNER JOIN
151: MDCART_CategoriesGP AS c ON a.gpcat1 = c.GPCat1
152: where ManufacturerName is not null and LongName is not null and description is not null and b.ManufacturerCode is not null and c.GPCat1 is not null and Visable =1
153: and LEFT(c.GPCat1,2) in (select distinct LEFT(gpcat1,2) cats from MDCART_products where active = -1 and gpcat1 is not null)) as a1
154: order by ManufacturerName, LongName
155: for xml raw('url'), ELEMENTS)
156:
157: SET @strXML_mdcategory_brand= '<?xml version="1.0" encoding="UTF-8"?>
158: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
159: + CONVERT(varchar(MAX),@x_mdcategory_brand) + '</urlset>'
160:
161: --MD AQUARIUM PAGE FEED
162: DECLARE @x_mdaquarium xml
163: DECLARE @strXML_mdaquarium varchar(MAX)
164: SET @x_mdaquarium=(
165:
166: select --top 250
167: 'http://www.marinedepot.com/' + replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(idPage, ' - ', '_'),'-','_'),
168: '®',''),'.',''),'/','_'),',',''),'& ',''),' ','_'),'&','_'),'™',''),':',''),'>',''),'<',''),'%','_') +
169: '-ap.html' as loc,
170: 'daily' as changefreq
171: from MDCART_Aquarium_Page
172: order by idPage
173: for xml raw('url'), ELEMENTS)
174:
175: SET @strXML_mdaquarium= '<?xml version="1.0" encoding="UTF-8"?>
176: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
177: + CONVERT(varchar(MAX),@x_mdaquarium) + '</urlset>'
178: ------- OUTPUT
179: exec xp_cmdshell 'del \\servername\sitemappath\SiteMaps\*.xml', no_output
180: exec spWriteStringToFile @strXML_psproduct, '\\sangbe\WWW_Live\petstore.com\IMD\SiteMaps', 'utf16-psproduct.xml'
181: exec spWriteStringToFile @strXML_pscategory, '\\sangbe\WWW_Live\petstore.com\IMD\SiteMaps', 'utf16-pscategory.xml'
182: exec spWriteStringToFile @strXML_pscategory_brand, '\\sangbe\WWW_Live\petstore.com\IMD\SiteMaps', 'utf16-pscategory_brand.xml'
183: exec spWriteStringToFile @strXML_mdproduct, '\\sangbe\WWW_Live\petstore.com\IMD\SiteMaps', 'utf16-mdproduct.xml'
184: exec spWriteStringToFile @strXML_mdcategory, '\\sangbe\WWW_Live\petstore.com\IMD\SiteMaps', 'utf16-mdcategory.xml'
185: exec spWriteStringToFile @strXML_mdcategory_brand, '\\sangbe\WWW_Live\petstore.com\IMD\SiteMaps', 'utf16-mdcategory_brand.xml'
186: exec spWriteStringToFile @strXML_mdaquarium, '\\sangbe\WWW_Live\petstore.com\IMD\SiteMaps', 'utf16-mdaquarium.xml'
187: exec xp_cmdshell 'C:\Unifier\xunifier.exe /c "\\servername\sitemappath\SiteMaps\config1.ucp" /l "\\servername\sitemappath\SiteMaps\sitemapconversion.log" /y "\\servername\sitemappath\SiteMaps\utf16*.xml"', no_output
188: exec xp_cmdshell 'rename \\servername\sitemappath\SiteMaps\output_utf16-psproduct.xml psproduct.xml', no_output
189: exec xp_cmdshell 'rename \\servername\sitemappath\SiteMaps\output_utf16-pscategory.xml pscategory.xml', no_output
190: exec xp_cmdshell 'rename \\servername\sitemappath\SiteMaps\output_utf16-pscategory_brand.xml pscategory_brand.xml', no_output
191: exec xp_cmdshell 'rename \\servername\sitemappath\SiteMaps\output_utf16-mdproduct.xml mdproduct.xml', no_output
192: exec xp_cmdshell 'rename \\servername\sitemappath\SiteMaps\output_utf16-mdcategory.xml mdcategory.xml', no_output
193: exec xp_cmdshell 'rename \\servername\sitemappath\SiteMaps\output_utf16-mdcategory_brand.xml mdcategory_brand.xml', no_output
194: exec xp_cmdshell 'rename \\servername\sitemappath\SiteMaps\output_utf16-mdaquarium.xml mdaquarium.xml', no_output
195: exec xp_cmdshell 'del \\servername\sitemappath\SiteMaps\utf16*.xml', no_output
196: END
1: var WebAddress = "http://www.google.com/webmasters/tools/ping?sitemap=http://www.marinedepot.com/sitemapfolder/mdaquarium.xml";
2: function main()
3: {
4: var xmlHttpRequest = CreateObject("Microsoft.XMLHTTP");
5: xmlHttpRequest.open("GET", WebAddress, false);
6: }
7:
8: var WebAddress2 = "http://www.google.com/webmasters/tools/ping?sitemap=http://www.marinedepot.com/sitemapfolder/mdaquarium.xml";
9: function main2()
10: {
11: var xmlHttpRequest2 = CreateObject("Microsoft.XMLHTTP");
12: xmlHttpRequest2.open("GET", WebAddress2, false);
13: }
14:
15: var WebAddress3 = "http://www.google.com/webmasters/tools/ping?sitemap=http://www.marinedepot.com/sitemapfolder/mdcategory_brand.xml";
16: function main3()
17: {
18: var xmlHttpRequest3 = CreateObject("Microsoft.XMLHTTP");
19: xmlHttpRequest3.open("GET", WebAddress3, false);
20: }
21:
22: var WebAddress4 = "http://www.google.com/webmasters/tools/ping?sitemap=http://www.marinedepot.com/sitemapfolder/mdproduct.xml";
23: function main4()
24: {
25: var xmlHttpRequest4 = CreateObject("Microsoft.XMLHTTP");
26: xmlHttpRequest4.open("GET", WebAddress4, false);
27: }
28:
29: var WebAddress5 = "http://www.google.com/webmasters/tools/ping?sitemap=http://www.petstore.com/sitemapfolder/pscategory.xml";
30: function main5()
31: {
32: var xmlHttpRequest5 = CreateObject("Microsoft.XMLHTTP");
33: xmlHttpRequest5.open("GET", WebAddress5, false);
34: }
35:
36: var WebAddress6 = "http://www.google.com/webmasters/tools/ping?sitemap=http://www.petstore.com/sitemapfolder/pscategory_brand.xml";
37: function main6()
38: {
39: var xmlHttpRequest6 = CreateObject("Microsoft.XMLHTTP");
40: xmlHttpRequest6.open("GET", WebAddress6, false);
41: }
42:
43: var WebAddress7 = "http://www.google.com/webmasters/tools/ping?sitemap=http://www.petstore.com/sitemapfolder/psproduct.xml";
44: function main7()
45: {
46: var xmlHttpRequest7 = CreateObject("Microsoft.XMLHTTP");
47: xmlHttpRequest7.open("GET", WebAddress7, false);
48: }