I'm using a function to check whether an external URL exists. Here's the code, with the status messages removed for clarity.
public static bool VerifyUrl(string url)
{
    url.ThrowNullOrEmpty("url");
    if (!(url.StartsWith("http://") || url.StartsWith("https://")))
        return false;
    var uri = new Uri(url);
    var webRequest = HttpWebRequest.Create(uri);
    webRequest.Timeout = 5000;
    webRequest.Method = "HEAD";
    HttpWebResponse webResponse;
    try
    {
        webResponse = (HttpWebResponse)webRequest.GetResponse();
        webResponse.Close();
    }
    catch (WebException)
    {
        return false;
    }
    if (string.Compare(uri.Host, webResponse.ResponseUri.Host, true) != 0)
    {
        string responseUri = webResponse.ResponseUri.ToString().ToLower();
        if (responseUri.IndexOf("error") > -1 || responseUri.IndexOf("404.") > -1 || responseUri.IndexOf("500.") > -1)
            return false;
    }
    return true;
}
I've run a test over some external URLs and found that about 20 out of 100 come back as errors. If I add a user agent, the error rate drops to around 14%.
The errors coming back are "forbidden" (adding a user agent resolves this for about 6% of the URLs), "service unavailable", "method not allowed", "not implemented" and "connection closed".
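For illustration, the user-agent variant and the way these error labels can be read from a WebException might look roughly like the sketch below. The class name `UrlCheckSketch`, the helper names and the user agent string are placeholders made up for this example, not the exact values used in the test.

```csharp
using System;
using System.Net;

public static class UrlCheckSketch
{
    // Placeholder value (an assumption); any browser-like string could be substituted.
    public const string AssumedUserAgent = "Mozilla/5.0 (compatible; UrlVerifier/1.0)";

    // Builds the same HEAD request as VerifyUrl, plus a User-Agent header.
    public static HttpWebRequest CreateHeadRequest(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Timeout = 5000;
        request.Method = "HEAD";
        request.UserAgent = AssumedUserAgent;
        return request;
    }

    // Turns a WebException into a short label such as "403 Forbidden".
    // When the server sent no HTTP response at all (for example a closed
    // connection), the transport-level status name is returned instead.
    public static string DescribeFailure(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        if (response != null)
            return ((int)response.StatusCode) + " " + response.StatusCode;
        return ex.Status.ToString();
    }
}
```

Calling `DescribeFailure` from the `catch (WebException ex)` block makes it possible to tell the failure types apart, for example to separate "method not allowed" (the server rejects HEAD) from "connection closed" (no HTTP response at all).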
Is there anything I can do to my code to ensure that more, preferably all, of these URLs give a valid response confirming their existence?
Alternatively, is there code that can be purchased to do this more effectively?
UPDATE - 14th Nov 12 ----------------------------------------------------------------------
After following advice from previous respondents, I'm now in a situation where I have a single domain that returns Service Unavailable (503). The example I have is www.marksandspencer.com.
When I use the HTTP sniffer at web-sniffer.net, as opposed to the one recommended in this thread, it works and returns the data using a GET request. However, I can't work out what I need to change to make it work in my code.
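For reference, a GET attempt with browser-like headers could be sketched as below. This is only a guess at what a sniffer might be sending: the class and method names are hypothetical, the header values are assumptions, and nothing here is confirmed to resolve the 503 for this particular domain.

```csharp
using System;
using System.Net;

public static class GetFallbackSketch
{
    // Builds a GET request that looks more like a browser than a bare WebRequest.
    // All header values below are assumed examples, not known requirements of any site.
    public static HttpWebRequest CreateBrowserLikeGet(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.Timeout = 5000;
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0 Safari/537.11";
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.Headers[HttpRequestHeader.AcceptLanguage] = "en-GB,en;q=0.8";
        // Keeps cookies across redirects, in case the site sets one before redirecting.
        request.CookieContainer = new CookieContainer();
        return request;
    }

    // Returns true when the GET completes without a WebException.
    // GetResponse throws for 4xx/5xx responses, so reaching the using block means success.
    public static bool ExistsViaGet(Uri uri)
    {
        try
        {
            using (var response = (HttpWebResponse)CreateBrowserLikeGet(uri).GetResponse())
            {
                return (int)response.StatusCode < 400;
            }
        }
        catch (WebException)
        {
            return false;
        }
    }
}
```

Unlike HEAD, a GET downloads the response body, so disposing the response promptly (the `using` block) keeps the cost of this check down.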