Carrot2

Search results clustering engine


Carrot²[1] is an open source search results clustering engine.[2] It can automatically cluster small collections of documents, e.g. search results or document abstracts, into thematic categories. Carrot² is written in Java and distributed under the BSD license.

History

The initial version of Carrot² was implemented in 2001 by Dawid Weiss as part of his MSc thesis, which set out to validate the applicability of the STC clustering algorithm to clustering search results in Polish.[3] In 2003, a number of other search results clustering algorithms were added, including Lingo,[4] a novel text clustering algorithm designed specifically for clustering search results. Although the source code of Carrot² had been available since 2002, version 1.0 was not officially released until 2006. In the same year, version 2.0 was released with an improved user interface and an extended tool set. In 2009, version 3.0 brought significant improvements in clustering quality, a simplified API and a new GUI application, based on the Eclipse Rich Client Platform, for tuning clustering. In 2020, version 4.0.0 brought further simplification of the API, code cleanups and the removal of the desktop Workbench. Version 4.1.0 brought the Workbench back as a web-based application.

Architecture

Carrot² 4.0 is predominantly a Java programming library with public APIs for managing language-specific resources and for configuring and executing algorithms. An HTTP/REST component (the document clustering server) is provided for interoperability with other languages.

Clustering algorithms

Carrot² offers several document clustering algorithms that place emphasis on the quality of cluster labels. These include STC and Lingo, both noted in the History section.
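As a rough illustration of what it means to put cluster labels first, the sketch below groups short snippets under any term that enough of them share and uses that term as the label. It is a toy written for this article only: the class `LabelFirstClustering`, its methods, the stop-word list and the `minSize` parameter are all hypothetical. It is not the Carrot² API and it does not implement STC or Lingo.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy illustration only: not the Carrot2 API and not the STC or Lingo algorithm. */
public class LabelFirstClustering {

    // Hypothetical stop-word list, far smaller than any real language resource.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "in", "for", "to", "is", "on", "with");

    /** Splits a snippet into lowercase word tokens, dropping stop words and duplicates. */
    public static Set<String> terms(String snippet) {
        Set<String> result = new LinkedHashSet<>();
        for (String token : snippet.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    /**
     * Groups snippets under every term that occurs in at least minSize snippets.
     * The term itself serves as the cluster label; a snippet may appear in several clusters.
     */
    public static Map<String, List<String>> cluster(List<String> snippets, int minSize) {
        Map<String, List<String>> byTerm = new LinkedHashMap<>();
        for (String snippet : snippets) {
            for (String term : terms(snippet)) {
                byTerm.computeIfAbsent(term, k -> new ArrayList<>()).add(snippet);
            }
        }
        // Keep only labels supported by enough snippets.
        byTerm.values().removeIf(members -> members.size() < minSize);
        return byTerm;
    }

    public static void main(String[] args) {
        List<String> snippets = List.of(
                "Jaguar car dealership opens",
                "New Jaguar car review",
                "Jaguar habitat in the rainforest",
                "Rainforest wildlife and the jaguar");
        cluster(snippets, 2).forEach((label, members) ->
                System.out.println(label + " -> " + members.size() + " snippets"));
    }
}
```

With the four sample snippets and `minSize` set to 2, the labels that survive are "jaguar", "car" and "rainforest". Real label-oriented algorithms work with phrases rather than single terms and rank candidate labels before assigning documents, so this sketch only conveys the general idea of starting from the label.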

Spin-offs

Carrot Search,[7] a commercial spin-off of the Carrot² project, continues the development of Carrot², offers a real-time text clustering algorithm[8] compliant with the Carrot² framework, and provides text mining consulting services based on open source and proprietary software.

Carrot Search Labs

Carrot² gave rise to a number of independent open source projects released under the umbrella of Carrot Search Labs.[9] The following projects are or were published as part of this initiative:

  • Randomized Testing: a JUnit test runner with built-in utilities that make every test run slightly different (randomized). It also includes an Ant task for running JUnit tests on parallel JVMs, with load balancing and other extras.
  • High Performance Primitive Collections for Java (HPPC): Lists, Sets, Maps and other collections of primitives for Java tuned for highest performance and memory efficiency.
  • SmartSprites: fully automatic maintenance of CSS sprites; no tedious copying and pasting to the CSS when adding or changing sprited images.

Discontinued projects:

  • jSuffixArrays: Several Java implementations of the Suffix Array data structure with different performance and memory characteristics.
  • JUnitBenchmarks: A set of extensions for turning JUnit4 tests into performance micro-benchmarks with GC monitoring, time variance measurement and simple graphical visualizations.

References

  1. Carrot2 Project, Stanisław Osiński, Dawid Weiss. "Carrot2 - Open Source Search Results Clustering Engine".
  2. Dawid Weiss: A Clustering Interface for Web Search Results in Polish and English. MSc thesis, Poznań University of Technology, Poznań, Poland, 2001.
  3. Stanisław Osiński, Dawid Weiss: A Concept-Driven Algorithm for Clustering Search Results. IEEE Intelligent Systems, vol. 20, no. 3, May/June 2005, pp. 48–54.
  4. Oren Zamir, Oren Etzioni: Web Document Clustering: A Feasibility Demonstration. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998, pp. 46–54.
  5. Carrot Search s.c. "Carrot Search Labs".

This article uses material from the Wikipedia article Carrot2, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.