[xiph-commits] r12945 - websites/xiph.org

ivo at svn.xiph.org
Sun May 13 12:17:25 PDT 2007


Author: ivo
Date: 2007-05-13 12:17:25 -0700 (Sun, 13 May 2007)
New Revision: 12945

Modified:
   websites/xiph.org/robots.txt
Log:
changed robots.txt

Modified: websites/xiph.org/robots.txt
===================================================================
--- websites/xiph.org/robots.txt	2007-05-13 17:42:37 UTC (rev 12944)
+++ websites/xiph.org/robots.txt	2007-05-13 19:17:25 UTC (rev 12945)
@@ -1,3 +1,138 @@
-User-agent: *
-Disallow /cgi-bin/
+#
+# robots.txt for http://www.wikipedia.org/ and friends
+# Xiph does not claim any ownership of this file
+#
+# Please note: There are a lot of pages on this site, and there are
+# some misbehaved spiders out there that go _way_ too fast. If you're
+# irresponsible, your access to the site may be blocked.
+#
 
+# advertising-related bots:
+User-agent: Mediapartners-Google*
+Disallow: /
+
+# Wikipedia work bots:
+User-agent: IsraBot
+Disallow:
+
+User-agent: Orthogaffe
+Disallow:
+
+# Crawlers that are kind enough to obey, but which we'd rather not have
+# unless they're feeding search engines.
+User-agent: UbiCrawler
+Disallow: /
+
+User-agent: DOC
+Disallow: /
+
+User-agent: Zao
+Disallow: /
+
+# Some bots are known to be trouble, particularly those designed to copy
+# entire sites. Please obey robots.txt.
+User-agent: sitecheck.internetseer.com
+Disallow: /
+
+User-agent: Zealbot
+Disallow: /
+
+User-agent: MSIECrawler
+Disallow: /
+
+User-agent: SiteSnagger
+Disallow: /
+
+User-agent: WebStripper
+Disallow: /
+
+User-agent: WebCopier
+Disallow: /
+
+User-agent: Fetch
+Disallow: /
+
+User-agent: Offline Explorer
+Disallow: /
+
+User-agent: Teleport
+Disallow: /
+
+User-agent: TeleportPro
+Disallow: /
+
+User-agent: WebZIP
+Disallow: /
+
+User-agent: linko
+Disallow: /
+
+User-agent: HTTrack
+Disallow: /
+
+User-agent: Microsoft.URL.Control
+Disallow: /
+
+User-agent: Xenu
+Disallow: /
+
+User-agent: larbin
+Disallow: /
+
+User-agent: libwww
+Disallow: /
+
+User-agent: ZyBORG
+Disallow: /
+
+User-agent: Download Ninja
+Disallow: /
+
+#
+# Sorry, wget in its recursive mode is a frequent problem.
+# Please read the man page and use it properly; there is a
+# --wait option you can use to set the delay between hits,
+# for instance.
+#
+User-agent: wget
+Disallow: /
+
+#
+# The 'grub' distributed client has been *very* poorly behaved.
+#
+User-agent: grub-client
+Disallow: /
+
+#
+# Doesn't follow robots.txt anyway, but...
+#
+User-agent: k2spider
+Disallow: /
+
+#
+# Hits many times per second, not acceptable
+# http://www.nameprotect.com/botinfo.html
+User-agent: NPBot
+Disallow: /
+
+# A capture bot, downloads gazillions of pages with no public benefit
+# http://www.webreaper.net/
+User-agent: WebReaper
+Disallow: /
+
+# Don't allow the Wayback Machine to index user pages
+#User-agent: ia_archiver
+#Disallow: /wiki/User
+#Disallow: /wiki/Benutzer
+
+#
+# Friendly, low-speed bots are welcome viewing article pages, but not
+# dynamically-generated pages please.
+#
+# Inktomi's "Slurp" honours a minimum delay between hits; if your
+# bot supports such a thing via the 'Crawl-delay' or a similar
+# instruction, please let us know.
+#
+## *at least* 1 second please. preferably more :D
+## we're disabling this experimentally 11-09-2006
+#Crawl-delay: 1
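For reference, rules like the ones committed above can be exercised with Python's standard `urllib.robotparser`. This is a minimal sketch, not part of the commit: the excerpted rules, bot names, and URLs below are hypothetical examples chosen to mirror the patterns in the new file (a blanket ban on a site-copier, a default group with a `Disallow` and a `Crawl-delay`).

```python
from urllib.robotparser import RobotFileParser

# Illustrative excerpt mirroring the committed rules: WebZIP is banned
# outright, everyone else is kept out of /cgi-bin/ and asked to wait
# one second between hits (the commit leaves Crawl-delay commented out).
rules = """\
User-agent: WebZIP
Disallow: /

User-agent: *
Crawl-delay: 1
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# WebZIP is banned from the whole site
print(parser.can_fetch("WebZIP", "http://xiph.org/about/"))      # False
# A generic crawler may fetch normal pages, but not /cgi-bin/
print(parser.can_fetch("SomeBot", "http://xiph.org/about/"))     # True
print(parser.can_fetch("SomeBot", "http://xiph.org/cgi-bin/x"))  # False
# Crawl-delay asks for at least 1 second between hits
print(parser.crawl_delay("SomeBot"))                             # 1
```

Note that user-agent matching is a substring test against the group name, which is why a single `User-agent: WebZIP` group catches any bot identifying itself with that token.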
