Search Server 2008: Federated sites that do not return XML
The OpenSearch standard allows the returned content to be in XML or HTML/XHTML format, although the latter makes federation more difficult for Search Server 2008, which is designed to apply an XSL transformation to the results before presenting them to the user. There is, however, a fairly simple process (although it does require some code) for providing an intermediary step between the Federated Web Parts and the federated search source.
In this post I will show how this can be achieved against the well-known search engine Google. You could do the same thing against any data source that is accessible through code, so you could roll your own BDC-type solution to expose line-of-business information through federation. The basic approach is to create an intermediary page that runs within the SharePoint LAYOUTS directory. The page receives the query string, makes an HTTP request to Google, and uses some simple regular expressions and a bit of string manipulation to construct an RSS-formatted XML string to return to the Federated Search Web Part.
Note: This is not production-ready code. It is provided as an example of how easy it is to federate to other services that do not provide the required query/return formats.
1. Create the Google.aspx page.
In this example I will use code-beside (i.e. deploying the .cs file to the server). In a production environment you will probably want to pre-compile the code.
Create a new file called Google.aspx and copy the following code into it.
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Google.aspx.cs" Inherits="_Default" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title>Untitled Page</title>
</head>
<body>
    <form id="form1" runat="server">
        <div>
        </div>
    </form>
</body>
</html>
As you will see, this page is pretty much blank. It is only there to let us hook up a page load event in the code file we will create next.
Create another file in the same directory called Google.aspx.cs into which we will add some code to:
Get the query string
protected string query;

protected void Page_Load(object sender, EventArgs e)
{
    query = Request.QueryString["q"];
}
Call Google and parse the results
private string getRssItemXml(string query)
{
    string url = string.Format("http://www.google.com/search?q={0}", query);
    WebClient client = new WebClient();
    byte[] byteData = client.DownloadData(url);
    string strData = Encoding.UTF8.GetString(byteData);
    Regex searchPattern = new Regex("<div class=g><h2 class=r><a href=\"(?<link>.*?)\"(.*?)>(?<title>.*?)</a>(.*?)<td class=\"j\">(?<desc>.*?)<br><span class=a>(.*?)</td></tr></table></div>");
    StringBuilder sb = new StringBuilder();
    foreach (Match m in searchPattern.Matches(strData))
    {
        sb.AppendFormat("<item><title><![CDATA[{0}]]></title><link><![CDATA[{1}]]></link><description><![CDATA[{2}]]></description></item>",
            m.Groups["title"].Value, m.Groups["link"].Value, m.Groups["desc"].Value);
    }
    return sb.ToString();
}
The code above looks a little messy, so I will walk through it.
Build the URL from the query string that the user entered. The federated Web Part passes it to our page, and we read it during page load.
string url = string.Format("http://www.google.com/search?q={0}", query);
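One thing the line above glosses over is that the query is placed into the URL exactly as typed, so a search containing characters such as & or # would send Google a different query than the user intended. A minimal sketch of encoding the term first is shown here. The class and helper names (SearchUrlSketch, BuildSearchUrl) are hypothetical and not part of the original code, and I am assuming Uri.EscapeDataString is an acceptable encoder for your environment.

```csharp
using System;

public static class SearchUrlSketch
{
    // Hypothetical helper (not in the original post): builds the Google
    // search URL with the search term percent-encoded.
    public static string BuildSearchUrl(string query)
    {
        // Treat a missing "q" parameter as an empty search instead of throwing.
        string safeQuery = Uri.EscapeDataString(query ?? string.Empty);
        return string.Format("http://www.google.com/search?q={0}", safeQuery);
    }

    public static void Main()
    {
        // prints http://www.google.com/search?q=fish%20%26%20chips
        Console.WriteLine(BuildSearchUrl("fish & chips"));
    }
}
```

If you adopt something like this, the call would replace the plain string.Format line inside getRssItemXml.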
Using the WebClient class, download the results of the query from Google into a byte array.
WebClient client = new WebClient();
byte[] byteData = client.DownloadData(url);
Convert the byte array into a string so that it can be used in the regular expression search.
string strData = Encoding.UTF8.GetString(byteData);
Construct the regular expression to extract the search results. Google does a reasonable job of keeping the formatting of its results consistent, so we are able to search for
<div class=g><h2 class=r>
which appears at the start of each result, followed by the link tag
<a href=\"(?<link>.*?)\"(.*?)>
which provides us with the URL of the result. Here we are naming the captured value (?<link>...) so that the Regex makes it available to us by name. The title and description follow and are captured in the same way.
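To make the named captures a little more concrete, here is a small stand-alone sketch. It uses a shortened version of the pattern (link and title only) and a made-up HTML fragment shaped like the markup the pattern expects, so it is an illustration rather than real Google output. The class name NamedGroupSketch and the Describe method are hypothetical. I have also added RegexOptions.Singleline, on the assumption that a result might contain line breaks, which '.' does not match by default.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupSketch
{
    // Shortened version of the article's pattern: only the link and title parts.
    private static readonly Regex ResultPattern = new Regex(
        "<h2 class=r><a href=\"(?<link>.*?)\"(.*?)>(?<title>.*?)</a>",
        RegexOptions.Singleline); // let '.' also match line breaks

    // Hypothetical helper: returns "title -> link" for the first result found.
    public static string Describe(string html)
    {
        Match m = ResultPattern.Match(html);
        if (!m.Success)
        {
            return "no match";
        }
        // Named groups are read back by the name given in (?<name>...).
        return m.Groups["title"].Value + " -> " + m.Groups["link"].Value;
    }

    public static void Main()
    {
        // Made-up sample markup, not captured from Google.
        string sample = "<div class=g><h2 class=r><a href=\"http://example.com/\" class=l>Example Site</a></h2></div>";
        Console.WriteLine(Describe(sample));
    }
}
```

Because the pattern is tied to the exact markup of the results page, any change to that markup will simply produce no matches, so it is worth handling the empty case.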
We then loop through the results using the Regex and create the RSS items that will be returned.
foreach (Match m in searchPattern.Matches(strData))
{
    sb.AppendFormat("<item><title><![CDATA[{0}]]></title><link><![CDATA[{1}]]></link><description><![CDATA[{2}]]></description></item>",
        m.Groups["title"].Value, m.Groups["link"].Value, m.Groups["desc"].Value);
}
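The CDATA approach in the loop is quick to write, but it would produce invalid XML if a captured value ever contained the sequence ]]>. As an alternative, the items could be built with XmlWriter, which escapes the text for you. This is only a sketch of that swapped-in technique, not what the original code does, and the names RssItemSketch and BuildItem are hypothetical.

```csharp
using System;
using System.Text;
using System.Xml;

public static class RssItemSketch
{
    // Hypothetical helper: writes one RSS <item> with escaped text content.
    public static string BuildItem(string title, string link, string description)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;                    // we only want the fragment
        settings.ConformanceLevel = ConformanceLevel.Fragment; // no document root required

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("item");
            // WriteElementString escapes reserved characters such as & and <.
            writer.WriteElementString("title", title);
            writer.WriteElementString("link", link);
            writer.WriteElementString("description", description);
            writer.WriteEndElement();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Made-up values standing in for the regex captures.
        Console.WriteLine(BuildItem("Fish & Chips", "http://example.com/?a=1&b=2", "A <b>tasty</b> result"));
    }
}
```

Note that any HTML in the description (Google highlights the search terms with markup) ends up escaped rather than wrapped in CDATA. RSS readers generally accept either form, but check how your XSL treats it.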
The complete code file, including the using directives, looks like this:
using System;
using System.Data;
using System.Net;
using System.Configuration;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public partial class _Default : System.Web.UI.Page
{
    protected string query;

    // Read the search term passed in by the federated Web Part.
    protected void Page_Load(object sender, EventArgs e)
    {
        query = Request.QueryString["q"];
    }

    // Replace the normal page output with an RSS 2.0 document.
    protected override void Render(HtmlTextWriter writer)
    {
        StringBuilder sb = new StringBuilder();
        Response.ContentType = "text/xml";
        sb.Append("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
        sb.Append("<rss version=\"2.0\">");
        sb.AppendFormat("<channel><title><![CDATA[Google: {0}]]></title><link/><description/><ttl>60</ttl>", query);
        sb.Append(getRssItemXml(query));
        sb.Append("</channel></rss>");
        writer.Write(sb.ToString());
    }

    // Query Google and convert each result into an RSS <item>.
    private string getRssItemXml(string query)
    {
        string url = string.Format("http://www.google.com/search?q={0}", query);
        WebClient client = new WebClient();
        byte[] byteData = client.DownloadData(url);
        string strData = Encoding.UTF8.GetString(byteData);
        Regex searchPattern = new Regex("<div class=g><h2 class=r><a href=\"(?<link>.*?)\"(.*?)>(?<title>.*?)</a>(.*?)<td class=\"j\">(?<desc>.*?)<br><span class=a>(.*?)</td></tr></table></div>");
        StringBuilder sb = new StringBuilder();
        foreach (Match m in searchPattern.Matches(strData))
        {
            sb.AppendFormat("<item><title><![CDATA[{0}]]></title><link><![CDATA[{1}]]></link><description><![CDATA[{2}]]></description></item>",
                m.Groups["title"].Value, m.Groups["link"].Value, m.Groups["desc"].Value);
        }
        return sb.ToString();
    }
}
2. Save the files to the LAYOUTS directory
If you are only testing this, just complete the steps here. Otherwise you will want to wrap this up as a WSP solution and deploy it correctly.
Copy the files Google.aspx and Google.aspx.cs to the 12\LAYOUTS\SEARCH directory. Note that you will need to create the SEARCH directory. It is always better to store your application pages in a subfolder so that they are not overwritten by other installations.
3. Create a Federated Location Definition (.FLD) file to point to your local search page.
Provide a name, a description and the Query Template:
http://mssxdemovpc/_layouts/search/google.aspx?q={searchTerms}
Where http://mssxdemovpc is the URL of your SharePoint installation.
Provide a “More Results” link so that users can navigate to Google if they want more results:
http://www.google.co.uk/search?hl=en&q={searchTerms}
Specify Credentials – as the Web Part will be calling your SharePoint page, you will probably need to enable authentication for it. In my example I set this to NTLM – Use Application Pool Identity, as all I needed was to get to the page. You may want to look at per-user authentication or a specific account instead.
Save your FLD, then edit your search results page to use it and see how you can bring Google search federation into your environment.
Download
The files can be downloaded here from my SkyDrive, along with the Presentation from the SUGUK meeting.
NOTE: The FLD file does not store the Credentials in the XML so you will need to manually set this after you import it.
