I've been looking for such a tool as well and http://www.plagspotter.com/ was a great solution. It can show you what sites on the Web have published content from your site on their pages.
Try this http://www.copyscape.com/ or this http://www.plagium.com/
Here is a system that a customer of mine has used, called Turnitin. It is designed for students, but it might be of some assistance.
There are a couple of ways.
1) Grab a whole paragraph of text from the web page and then do a Google search with it. It's simple and crude, and not 100% reliable.
2) Otherwise go here: http://www.copyscape.com/
Hi Anil, I think this is what you're looking for:
(It's a site that finds duplicate content, and it works quite well for my sites; it has found quite a lot of copied content out there.)
Then to protect you, you might look at:
https://www.google.com/search?q=content+checker+tools&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
Copyscape is the one I've used.
Chris and Clive's responses are excellent; I was unaware of most of these resources, though as Chris suggests, simply searching for content in quotes can and does usually work well. I periodically conduct searches on both my name and my URL, and have found (and had removed) an amazing amount of plagiarism over the years. Good luck, Anil!
If you have a website and want to see whether people are using your images, there are a few tools you can use. These tools will show whether other people are using the same image as you (even if they saved the file to their own site):
http://images.google.com (click on the camera icon to search by image)
If you have a website and want to stop people from linking directly to your content (images mostly, but this could include music, videos, etc.), you could set up your server to disable hotlinking. There are ways for the copying person to get around it but it could stop a bunch of people from doing so. Here's an example for Apache, which is a popular web server:
And finally, I'm not as familiar with a solution for text content but a simple Google search in quotes can usually produce results. What you do is look for a unique sentence in your content and then search for it. So if I were looking for people copying this answer, I'd probably search for:
"text content but a simple Google search in quotes can usually produce results"
"ways for the copying person to get around it but it could stop a bunch of people"
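For anyone who wants to prepare these quoted searches for a longer page, here is a minimal Python sketch. The function names `pick_phrases` and `exact_search_url` are made up for illustration, and the "longest sentences are the most distinctive" heuristic is only an assumption. It builds URLs to open by hand in a browser and does not send any automated requests.

```python
import re
from urllib.parse import quote_plus


def pick_phrases(text, count=2, max_words=14):
    """Return up to `count` phrases taken from the longest sentences in `text`."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Assumption: longer sentences are more likely to be unique on the web.
    longest = sorted(sentences, key=lambda s: len(s.split()), reverse=True)[:count]
    phrases = []
    for sentence in longest:
        words = sentence.rstrip(".!?").split()
        # Take a window from the middle so the query stays reasonably short.
        start = max(0, (len(words) - max_words) // 2)
        phrases.append(" ".join(words[start:start + max_words]))
    return phrases


def exact_search_url(phrase):
    """Build a search URL that wraps the phrase in double quotes."""
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')


if __name__ == "__main__":
    sample = (
        "Short one. This is a considerably longer sentence with many "
        "more words in it. Tiny."
    )
    for phrase in pick_phrases(sample, count=1):
        print(exact_search_url(phrase))
```

Picking a phrase from the middle of a sentence, as in the two examples above, tends to survive light edits to the start or end of a copied passage.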
I'm in the process of developing software that can do this on an industrial scale. It's part of my IP Infringement Detection service described briefly at http://bit.ly/106DAxl
I have a developer's account with Yahoo that lets me perform unlimited high-volume searches for keywords and phrases. BTW Google doesn't allow high volume automated searches!
Many people will have unrealistic expectations when they propose web crawling and web search projects, and this is because they haven't yet encountered enough websites (literally 10,000 or more) to realise that there are no rules in how you format a web page. To make matters worse, many websites have HTML coding errors that trip up scraper software.
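To illustrate the point about broken HTML, here is a small Python sketch of pulling the visible text out of a page with the standard library's `html.parser`, which carries on past unclosed tags rather than raising an error. The names `TextExtractor` and `visible_text` are hypothetical, and this is not the software described above, just one lenient way to get text you could then compare against your own content.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the page's visible text as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return " ".join(parser.parts)


if __name__ == "__main__":
    # Deliberately malformed: <b> and <div> are never closed.
    broken = "<p>Hello <b>world</p><script>var x = 1;</script><div>again"
    print(visible_text(broken))
```

A real crawler would still need to cope with character encodings, text generated by scripts, and pages that lay out the same content in very different ways.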
A couple of useful websites:
http://www.copyscape.com/online-infringement/ (simple freebie service - won't detect snippets or partial copying)
http://www.ipwatchdog.com/ (for when you've found your copyright infringed)
For more, just Google "infringement detection".
Please let me know if you think I might be able to help you further.