Crawler_trap

Spider trap

Set of web pages that can undermine web crawlers

A spider trap (or crawler trap) is a set of web pages that may intentionally or unintentionally be used to cause a web crawler or search bot to make an infinite number of requests or cause a poorly constructed crawler to crash. Web crawlers are also called web spiders, from which the name is derived. Spider traps may be created to "catch" spambots or other crawlers that waste a website's bandwidth. They may also be created unintentionally by calendars that use dynamic pages with links that continually point to the next day or year.

Common techniques used are:

Creation of indefinitely deep directory structures, such as http://example.com/bar/foo/bar/foo/bar/foo/bar/...
Dynamic pages that produce an unbounded number of documents for a web crawler to follow. Examples include calendars^[1] and algorithmically generated language poetry.^[2]
Documents filled with a very large number of characters, which can crash the lexical analyzer parsing the document.
Documents with session IDs based on required cookies.
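As a rough illustration of the calendar case, the sketch below models a hypothetical site whose pages each link to the previous and next day. The URL scheme `/calendar/YYYY-MM-DD` and the names `calendar_links` and `crawl` are assumptions made for this example, not taken from any real site or crawler. Python's `date` type stops at year 9999, so this toy site is merely very large rather than truly infinite.

```python
from collections import deque
from datetime import date, timedelta


def calendar_links(path: str) -> list[str]:
    """Links found on a hypothetical calendar page such as /calendar/2024-01-31."""
    day = date.fromisoformat(path.rsplit("/", 1)[-1])
    previous_day = day - timedelta(days=1)
    next_day = day + timedelta(days=1)
    return [
        f"/calendar/{previous_day.isoformat()}",
        f"/calendar/{next_day.isoformat()}",
    ]


def crawl(start: str, max_pages: int) -> list[str]:
    """Breadth-first crawl; only the max_pages budget makes it stop."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for link in calendar_links(page):
            if link not in seen:  # every page yields at least one unseen URL
                seen.add(link)
                queue.append(link)
    return visited
```

Every URL in this crawl is distinct, so deduplicating visited URLs does not help; the crawl ends only because of the page budget passed as `max_pages`.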

There is no algorithm to detect all spider traps. Some classes of traps can be detected automatically, but new, unrecognized traps arise quickly.
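One class of trap that can be flagged automatically is the deep or repetitive directory structure. The following is a minimal heuristic sketch, not a complete detector: the function name `looks_like_trap` and the three thresholds are illustrative assumptions, and a real crawler would need to tune them, since legitimate sites can also repeat path segments.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative thresholds (assumptions for this sketch, not standard values).
MAX_URL_LENGTH = 2000
MAX_DEPTH = 12
MAX_SEGMENT_REPEATS = 3


def looks_like_trap(url: str) -> bool:
    """Flag URLs that are very long, very deep, or repeat a path segment."""
    if len(url) > MAX_URL_LENGTH:
        return True
    # Split the path into non-empty segments, e.g. /bar/foo/ -> ["bar", "foo"].
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > MAX_DEPTH:
        return True
    # A segment appearing MAX_SEGMENT_REPEATS times or more suggests a loop.
    counts = Counter(segments)
    return any(n >= MAX_SEGMENT_REPEATS for n in counts.values())
```

A crawler could call this check before enqueueing a URL and skip anything it flags. It would not catch traps whose URLs look ordinary, such as the calendar pages mentioned earlier.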

This article uses material from the Wikipedia article Crawler_trap, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

[1] "What is a Spider Trap?". Techopedia. 2017-11-27. Retrieved 2018-05-29.

[2] Neil M Hennessy. "The Sweetest Poison, or The Discovery of L=A=N=G=U=A=G=E Poetry on the Web". Retrieved 2013-09-26.

[3] "Portent". Portent. 2016-02-03. Retrieved 2019-10-16.

[4] "How to Set Up a robots.txt to Control Search Engine Spiders (thesitewizard.com)". www.thesitewizard.com. Retrieved 2019-10-16.

[5] "Building a Polite Web Crawler". The DEV Community. 2019-04-13. Retrieved 2019-10-16.

[6] J Media Group (2017-10-12). "Closing a spider trap: fix crawl inefficiencies". J Media Group. Retrieved 2019-10-16.
