SEO Buzz Box


Remove yourself from the Wayback Machine

Posted on March 5, 2007 - Filed Under Tools

I am going to disallow the Wayback Machine from archiving the content of this blog. If you are not aware of what the Wayback Machine (a.k.a. the Internet Archive) is, plug the URL of your domain in here to see your website's history. I use this tool to check domain history when offering advice to people in SEO forums. It is amusing when someone screams, “Google is punishing my ‘white hat’ website and I don’t know why!” You then check the Wayback Machine and find that their domain was used to spam search engines not long ago. As I mention below, I also believe that these people should not be judged on past content if they have cleaned up their act.

The Wayback Machine offers a simple way to remove yourself from their archive and disallow all future archiving via robots.txt.

The Internet Archive is not interested in offering access to Web sites or other Internet documents whose authors do not want their materials in the collection. To remove your site from the Wayback Machine, place a robots.txt file at the top level of your site (e.g. www.yourdomain.com/robots.txt) and then submit your site below.

The robots.txt file will do two things:

1. It will remove all documents from your domain from the Wayback Machine.
2. It will tell us not to crawl your site in the future.

To exclude the Internet Archive’s crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

User-agent: ia_archiver
Disallow: /
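Before uploading the file, the rule above can be sanity-checked locally. This is only a minimal sketch: it assumes Python's standard urllib.robotparser is a fair stand-in for how a well-behaved crawler reads robots.txt (the Internet Archive's own parser may behave differently), and "Googlebot" and www.yourdomain.com are just placeholder values for illustration.

```python
from urllib.robotparser import RobotFileParser

# The exact rule quoted above: block ia_archiver, say nothing about other robots.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder site and a second user agent, used only for this check.
SITE = "http://www.yourdomain.com/"
archiver_allowed = can_crawl(ROBOTS_TXT, "ia_archiver", SITE)
googlebot_allowed = can_crawl(ROBOTS_TXT, "Googlebot", SITE)

print("ia_archiver allowed:", archiver_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

If the rule is written correctly, the first check should come back False and the second True. This only tells you what the file says to robots; it says nothing about what the archive does with pages it already holds.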

Read: Removing Documents From the Wayback Machine

Here is an example of my robots.txt

Why am I going to stop the Wayback Machine from archiving content?

This SEO blog is about learning and growing, and I do not want to have my childhood in SEO logged for all the world to see. It is also a natural human trait to judge others on their past. Ever go to a high school reunion and meet classmates who believe you are the same person they remember? That’s right, you are not, and for those of us who often learn by trial and error in life, it can be a little strange at times.

Now, it’s Google’s turn to allow removal of indexed history via Webmaster Tools. Should search engines judge you by who you were in the past or should they also allow for a fresh start?




18 Responses to “Remove yourself from the Wayback Machine”

  1. Peter Davis Says:

    oxymoron

  2. admin Says:

    Hrm, let me define:oxymoron, ah yes, I guess this is true. :)

  3. Michael Martinez Says:

    Considering they haven’t published any new data since last summer, aren’t you acting a little late?

  4. Admin Says:

    Michael - So you believe they are not going to update their archives again? If you read above, using robots.txt “removes” you completely. It’s never too late. Check out what happens now after adding the robots.txt exclusion this morning here.

    This post was more of a snarky jab at Google for relying on old first impressions in its rankings anyhow.

    Stop playing the role of angry doctor of SEO man, you are missing the point. ;)

  5. SEFL Says:

    Michael - I think that’s his point. He doesn’t want the old data that’s there to show up.

    As far as the high school thing, I saw something similar last week when my best friend’s brother passed away. I ended up back in my hometown and everyone remembered me exactly as I was as a child. I’ve evolved somewhat since then (although I’m still immature).

    I also remembered a lot of people that I saw again as jackasses because I ran with a different crowd at the time. The thing is that a lot of them changed and really stepped up to the plate.

    So I can understand the past impression thing.

  6. corey Says:

    thanks. i didn’t know it was possible to remove historic data from the archive. insert sudden urge to buy a few domains back.

  7. admin Says:

    Corey - Glad I could help. I think that it is a good thing: even if the archive still exists, you have the choice of allowing it to be public record or not.

    My prediction is that you will see more of this type of thing in the future as people pressure search engines and archiving services with lawsuits.

  8. CM Says:

    Can you do a test by renaming your robots.txt file and see what happens?

    From the tests I made, it seems they don’t *remove* the content from their servers. I added a robots.txt to one of my domains, and when I tried to see the past content I was not allowed and got the “Robots.txt Query Exclusion” message.

    Then I renamed the robots.txt file and I was able to see again the past content. So, it seems that they will not show any past content as long as you have a robots.txt file to exclude them.

  9. admin Says:

    Very interesting CM, I looked and found the following as the final removal step here:

    Once you have put a robots.txt file up, submit your site (www.yourdomain.com) on the form on http://pages.alexa.com/help/webmasters/index.html#crawl_site.

    Does the above suggest that an “Alexa” recrawl is required to have data removed?

  10. CM Says:

    Does the above suggest that an “Alexa” recrawl is required to have data removed?

    Maybe. I also saw that after my post and submitted the domain. Let’s wait and see what happens after a recrawl ;-)

  11. Question Says:

    Is there a way to remove Alexa / Wayback data for domains that have now expired? — that I no longer own, in other words. Others have bought them or they are listed with Godaddy/etc as the owners.

    That looks like a huge flaw in their system.

    My contact details are sitting out there for others to see on domains that I no longer own or control. The new owners haven’t bothered to change things - maybe they don’t even know about these listings.

    Different question:

    Where else are domain ownership details republished, other than Alexa and Wayback (and AboutUs.org, which anybody can edit, so removal is easy enough… once you know to look there)?

  12. admin Says:

    Well, if the site is not in your hands I would not think there is an automated way to remove it. Try emailing the admin at info@archive.org

    Second question: Good question, who else collects data and makes it public? I will study this and make another post on it.

    I would suggest getting private registration for new domains, Godaddy offers this.

  13. Question Says:

    Thanks.

    You mention Godaddy (so did I). Did you see Mike Filsaime’s explanation of why they are no good for internet marketers? Get one spam accusation and they can clobber you.

    Contrast NameCheap, who don’t, says Mike in his recent freebie called Rolodex (or similar).

    Do you agree with him? - or is this outside your realm?

    -

    Do mere mortals get replies from archive.org?

  14. Aaron Pratt Says:

    Do you mind using a real name when you comment on my blog or linking a website?

    Anyhow, I pay little attention to Mike Filsaime. I got on his mailing list and he will not leave me alone, but let me correct what he said about Godaddy. It’s not true! :)

    I bet if you email them or hunt down an actual “webmaster” email address via whois and act real polite, they will respond. If not, act real mad and threaten that you will be taking this up in court! :)

    But to be serious, nobody should be allowed to keep data out there if the owner decides he/she no longer wants it to be public.

    Thanks for reminding me, I want to remove the no archive tag from this site and see if the info. was actually removed.

  15. CM Says:

    Where else are domain ownership details republished

    DomainTools.com:
    http://domain-history.domaintools.com/
    From their page:

    Domain Tools has been tracking the whois history of millions of domains since 2000. Domain History gives you access to our massive database of historical whois records.
    Supported TLDs are .com, .net, .org, .biz, .us, and .info.

  16. CM Says:

    Aaron,

    I just renamed the robots.txt file on the site where I was doing the test, and the Wayback Machine didn’t delete any info - it shows pages from the domain since 2003.
    I placed the exclusion on 7 March. The last time Alexa (IA Archiver) visited my site was on 13 March. Plenty of time to delete the data.

    IMO it’s very misleading what they say on the “Removing Documents From the Wayback Machine” page at http://www.archive.org/about/exclude.php

    The robots.txt file will do two things:
    1. It will remove all documents from your domain from the Wayback Machine.
    2. It will tell us not to crawl your site in the future.

    So, it seems they do not remove any documents from their db - they just don’t show them if there’s currently a robots.txt file to exclude them. Not the same thing.

  17. Ray (Question) Says:

    Thanks Aaron.

    Couldn’t agree more with you: “nobody should be allowed to keep data out there if the owner decides he/she no longer wants it to be public.”

    And thanks to CM for the link.

    Ray

  18. Aaron (Pratt) Says:

    You and me both man, I did this also last night and was sad to see the stuff still there. Now it is time to start pushing the issue, funny thing I just saw something on the TV news about this! :)
