Remove yourself from the Wayback Machine
I am going to disallow the Wayback Machine from archiving the content of this blog. If you are not familiar with the Wayback Machine, a.k.a. the Internet Archive, plug your domain’s URL in here to see your website’s history. I use this tool to check domain history when offering advice to people in SEO forums. It is amusing when someone screams, “Google is punishing my ‘white hat’ website and I don’t know why!” and you then check the Wayback Machine and find that their domain was used to spam search engines not long ago. As I mention below, I also believe that these people should not be judged on past content if they have cleaned up their act.
The Wayback Machine offers a simple way to remove yourself from their archive and disallow all future archiving via robots.txt.
The Internet Archive is not interested in offering access to Web sites or other Internet documents whose authors do not want their materials in the collection. To remove your site from the Wayback Machine, place a robots.txt file at the top level of your site (e.g. www.yourdomain.com/robots.txt) and then submit your site below.
The robots.txt file will do two things:
1. It will remove all documents from your domain from the Wayback Machine.
2. It will tell us not to crawl your site in the future.
To exclude the Internet Archive’s crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:
User-agent: ia_archiver
Disallow: /
Read: Removing Documents From the Wayback Machine
Here is an example of my robots.txt
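If you want to confirm that a rule like the one quoted above does what it claims before uploading it, here is a minimal Python sketch using the standard library’s urllib.robotparser. The sample robots.txt text, the is_allowed helper, and the example.com URL are my own placeholders for illustration, not taken from a real site, and this only checks how a well-behaved parser reads the file, not what the Internet Archive actually does with it.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content: block only the Internet Archive's crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/some-page.html"  # hypothetical page
    print("ia_archiver allowed:", is_allowed(ROBOTS_TXT, "ia_archiver", url))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))
```

With this file, ia_archiver should be refused and every other crawler allowed, because no other User-agent section exists to restrict them.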
Why am I going to stop the Wayback Machine from archiving content?
This SEO blog is about learning and growing, and I do not want my childhood in SEO logged for all the world to see. It is also a natural human trait to judge others on their past. Ever go to a high school reunion and meet classmates who believe you are the same person they remember? That’s right, you are not, and for those of us who often learn by trial and error in life, it can be a little strange at times.
Now, it’s Google’s turn to allow removal of indexed history via Webmaster Tools. Should search engines judge you by who you were in the past or should they also allow for a fresh start?
Similar Posts:
- How to remove content from Google
- Sitemaps Autodiscovery via Robots.txt
- Why to not noindex your feed

March 5th, 2007 at 10:16 am
oxymoron
March 5th, 2007 at 10:21 am
Hrm, let me define:oxymoron, ah yes, I guess this is true. :)
March 5th, 2007 at 2:05 pm
Considering they haven’t published any new data since last summer, aren’t you acting a little late?
March 5th, 2007 at 2:28 pm
Michael - So you believe they are not going to update their archives again? If you read above, using robots.txt “removes” you completely. It’s never too late. Check out what happens now after adding the robots.txt exclusion this morning here.
This post was more of a snarky jab at Google for relying on old first impressions in its rankings anyhow.
Stop playing the role of angry doctor of SEO man, you are missing the point. ;)
March 5th, 2007 at 3:04 pm
Michael - I think that’s his point. He doesn’t want the old data that’s there to show up.
As far as the high school thing, I saw something similar last week when my best friend’s brother passed away. I ended up back in my hometown and everyone remembered me exactly as I was as a child. I’ve evolved somewhat since then (although I’m still immature).
I also remembered a lot of people that I saw again as jackasses because I ran with a different crowd at the time. The thing is that a lot of them changed and really stepped up to the plate.
So I can understand the past impression thing.
March 5th, 2007 at 8:06 pm
thanks. i didn’t know it was possible to remove historic data from the archive. insert sudden urge to buy a few domains back.
March 6th, 2007 at 6:54 am
Corey - Glad I could help. I think it is a good thing even if the archive still exists; you just have the choice of allowing it to be public record or not.
My prediction is that you will see more of this type of thing in the future as people pressure search engines and archiving services with lawsuits.
March 7th, 2007 at 7:49 am
Can you do a test by renaming your robots.txt file and see what happens?
From the tests I made, it seems they don’t *remove* the content from their servers. I added a robots.txt to one of my domains, and when I tried to see the past content I was not allowed and got the “Robots.txt Query Exclusion” message.
Then I renamed the robots.txt file and I was able to see the past content again. So, it seems that they will not show any past content as long as you have a robots.txt file to exclude them.
March 7th, 2007 at 9:19 am
Very interesting CM, I looked and found the following as the final removal step here:
Does the above suggest that an “Alexa” recrawl is required to have data removed?
March 7th, 2007 at 3:42 pm
Maybe. I also saw that after my post and submitted the domain. Let’s wait and see what happens after a recrawl ;-)
March 20th, 2007 at 5:56 pm
Is there a way to remove Alexa / Wayback data for domains that have now expired? — that I no longer own, in other words. Others have bought them or they are listed with Godaddy/etc as the owners.
That looks like a huge flaw in their system.
My contact details are sitting out there for others to see on domains that I no longer own or control. The new owners haven’t bothered to change things - maybe they don’t even know about these listings.
–
Different question:
Where else are domain ownership details republished, other than Alexa and Wayback (and AboutUs.org, which anybody can edit, so removal is easy enough… once you know to look there)?
March 20th, 2007 at 6:17 pm
Well, if the site is not in your hands, I would not think there is an automated way to remove it. Try emailing the admin at info@archive.org.
Second question: Good question, who else collects data and makes it public? I will study this and make another post on it.
I would suggest getting private registration for new domains; Godaddy offers this.
March 20th, 2007 at 6:58 pm
Thanks.
You mention Godaddy (so did I). Did you see Mike Filsaime’s explanation of why they are no good for internet marketers? Get one spam accusation and they can clobber you.
Contrast NameCheap, who don’t, says Mike in his recent freebie called Rolodex (or similar).
Do you agree with him? - or is this outside your realm?
-
Do mere mortals get replies from archive.org?
March 20th, 2007 at 7:07 pm
Do you mind using a real name when you comment on my blog or linking a website?
Anyhow, I pay little attention to Mike Filsaime. I got on his mailing list and he will not leave me alone, but let me correct what he said about Godaddy: it’s not true! :)
I bet if you email them or hunt down an actual “webmaster” email address via whois and act real polite, they will respond. If not, act real mad and threaten that you will be taking this up in court! :)
But to be serious, nobody should be allowed to keep data out there if the owner decides he/she no longer wants it to be public.
Thanks for reminding me. I want to remove the no archive tag from this site and see if the info was actually removed.
March 20th, 2007 at 7:58 pm
DomainTools.com:
http://domain-history.domaintools.com/
From their page:
March 20th, 2007 at 8:27 pm
Aaron,
I just renamed the robots.txt file on the site where I was doing the test, and the Wayback Machine didn’t delete any info - it shows pages from the domain going back to 2003.
I placed the exclusion on 7 March. Last time Alexa (IA Archiver) visited my site was on 13 March. Plenty of time to delete the data.
IMO it’s very misleading what they say on the “Removing Documents From the Wayback Machine” page at http://www.archive.org/about/exclude.php
So, it seems they do not remove any documents from their db - they just don’t show them if there’s currently a robots.txt file to exclude them. Not the same thing.
March 20th, 2007 at 8:48 pm
Thanks Aaron.
Couldn’t agree more with you: “nobody should be allowed to keep data out there if the owner decides he/she no longer wants it to be public.”
And thanks to CM for the link.
Ray
March 21st, 2007 at 10:33 am
You and me both, man. I also did this last night and was sad to see the stuff still there. Now it is time to start pushing the issue. Funny thing, I just saw something on the TV news about this! :)