caterpillar.searching package
Subpackages
Module Contents
This module exposes the IndexSearcher, which allows searching an index for text frames.
- class caterpillar.searching.IndexSearcher(index_reader, scorer_cls=<class 'caterpillar.searching.scoring.TfidfScorer'>)
Bases: object
Searches for text frames within the specified index_reader. Accepts a custom scorer_cls used to rank search results (defaults to TF-IDF via TfidfScorer). scorer_cls must be a subclass of Scorer.
All searching operations expect an object of type BaseQuery.
The count and filter methods expose the most efficient search operations. The search method must score and rank all of its results, so use it only when the ranking of results matters.
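The cost difference described above can be illustrated with a minimal, self-contained sketch. It does not use caterpillar at all: `FRAMES`, `matching_ids`, `count_matches` and `ranked_matches` are hypothetical stand-ins, and the toy term-count score is an assumption chosen only to show that a ranked lookup does extra work on every match.

```python
from collections import Counter

# Hypothetical toy index: frame id -> text. Not caterpillar's storage format.
FRAMES = {
    "f1": "the cat sat on the mat",
    "f2": "the dog chased the cat and the bird",
    "f3": "a bird sang",
}


def matching_ids(term):
    """Filter-style lookup: ids of frames containing the term, no scoring."""
    return [fid for fid, text in FRAMES.items() if term in text.split()]


def count_matches(term):
    """Count-style lookup: only the number of matching frames is needed."""
    return len(matching_ids(term))


def ranked_matches(term):
    """Search-style lookup: every match is scored, then all are sorted."""
    scored = [
        (Counter(FRAMES[fid].split())[term], fid) for fid in matching_ids(term)
    ]
    # Highest score first; ties are broken by frame id for a stable order.
    return [fid for _, fid in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

When only membership or a total is needed, the first two functions avoid the scoring and sorting pass entirely.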
- filter(query)
Return a list of ids for frames that match the specified query (must be of type BaseQuery).
- search(query, start=0, limit=25)
Return ranked frame data for frames that match the specified query (must be of type BaseQuery).
Note that the ranking of results is performed by a Scorer that is initialised when the IndexSearcher is created.
start and limit define the pagination of results; by default the first 25 frames are returned.
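As a rough illustration of the pagination semantics, the sketch below treats start as a zero-based offset into the ranked results and limit as the maximum page size. That interpretation is an assumption drawn from the defaults in the signature, and `paginate` is a hypothetical helper, not part of caterpillar.

```python
def paginate(ranked_frames, start=0, limit=25):
    """Return the window of ranked results beginning at offset `start`."""
    if start < 0 or limit < 0:
        raise ValueError("start and limit must be non-negative")
    return ranked_frames[start:start + limit]


# Sixty already-ranked placeholder results.
ranked = ["frame-%d" % i for i in range(60)]

first_page = paginate(ranked)             # default window: offsets 0..24
second_page = paginate(ranked, start=25)  # offsets 25..49
last_page = paginate(ranked, start=50)    # offsets 50..59, a short page
```

Under this reading, successive pages are fetched by advancing start by limit on each call.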
caterpillar.searching.results module
Classes to store search results.
caterpillar.searching.scoring module
- class caterpillar.searching.scoring.Scorer(index)
Bases: object
Scorers calculate a numerical score for query hits to rank them by.
- score_and_rank(hits, term_weights)
Score each of the specified hits and return them in ranked order.
Required Arguments:
hits – A list of SearchHit objects.
term_weights – A list of term weights to use in scoring.
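A custom scorer would presumably subclass Scorer and override score_and_rank. The sketch below shows that shape using stand-ins only: `Hit`, the local `Scorer` base class and `WeightedCountScorer` are hypothetical, and both the hit attributes and the representation of term_weights as a list of (term, weight) pairs are assumptions, since the documentation does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for a search hit; not caterpillar's own class."""
    frame_id: str
    term_counts: dict  # assumed shape: term -> occurrences in the frame
    score: float = 0.0


class Scorer:
    """Stand-in base class mirroring the documented interface."""

    def __init__(self, index):
        self.index = index

    def score_and_rank(self, hits, term_weights):
        raise NotImplementedError


class WeightedCountScorer(Scorer):
    """Scores a hit by summing weight * occurrences over the query terms."""

    def score_and_rank(self, hits, term_weights):
        # Assumption: term_weights is a list of (term, weight) pairs.
        for hit in hits:
            hit.score = sum(
                weight * hit.term_counts.get(term, 0)
                for term, weight in term_weights
            )
        return sorted(hits, key=lambda h: h.score, reverse=True)


hits = [Hit("f1", {"cat": 1}), Hit("f2", {"cat": 2, "dog": 1})]
scorer = WeightedCountScorer(index=None)
ranked = scorer.score_and_rank(hits, [("cat", 1.0), ("dog", 0.5)])
```

With the real library, a class of this kind would be passed as scorer_cls when constructing the IndexSearcher, provided it derives from the library's own Scorer.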
- class caterpillar.searching.scoring.SimpleScorer(index)
Bases: caterpillar.searching.scoring.Scorer
Simple scorer implementation to be used by IndexSearcher.
- score_and_rank(hits, term_weights)
Simply score hits by the presence of query terms and their weighting.
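One plausible reading of presence-based scoring is that a frame earns the weight of each query term it contains, regardless of how often the term occurs. The sketch below implements that reading only; it is not SimpleScorer's actual code, and `presence_score`, the toy frames and the (term, weight) pair format are all assumptions.

```python
def presence_score(frame_terms, term_weights):
    """Sum the weights of query terms that occur at least once in the frame."""
    present = set(frame_terms)
    return sum(weight for term, weight in term_weights if term in present)


# Toy frames as token lists (hypothetical data).
frames = {
    "f1": ["cat", "cat", "mat"],
    "f2": ["cat", "dog"],
    "f3": ["bird"],
}
weights = [("cat", 1.0), ("dog", 2.0)]

scores = {fid: presence_score(terms, weights) for fid, terms in frames.items()}
# Highest score first; ties are broken by frame id.
ranked = sorted(scores, key=lambda fid: (-scores[fid], fid))
```

Note that the repeated "cat" in f1 does not raise its score, which is what distinguishes this approach from a frequency-based scorer.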
- class caterpillar.searching.scoring.TfidfScorer(index)
Bases: caterpillar.searching.scoring.Scorer
A scorer that uses TF-IDF.
- score_and_rank(hits, term_weights)
Score hits using TF-IDF and return them in ranked order.
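For readers unfamiliar with TF-IDF, the sketch below ranks toy frames with a common textbook variant: term frequency is the term's count divided by the frame length, and inverse document frequency is ln(N / df). The exact formula TfidfScorer uses is not stated here, so this variant, the `tfidf_rank` helper and the sample data are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_rank(frames, query_terms):
    """Rank frames by summed tf * idf over the query terms (textbook variant)."""
    n_frames = len(frames)
    counts = {fid: Counter(terms) for fid, terms in frames.items()}
    # Document frequency: number of frames containing each query term.
    df = {t: sum(1 for c in counts.values() if c[t] > 0) for t in query_terms}

    scores = {}
    for fid, c in counts.items():
        length = sum(c.values())
        score = 0.0
        for term in query_terms:
            if df[term] == 0 or length == 0:
                continue  # a term absent from every frame contributes nothing
            tf = c[term] / length
            idf = math.log(n_frames / df[term])
            score += tf * idf
        scores[fid] = score
    # Highest score first; ties are broken by frame id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


frames = {
    "f1": ["cat", "cat", "mat"],
    "f2": ["cat", "dog"],
    "f3": ["bird", "sang"],
}
ranking = tfidf_rank(frames, ["cat", "dog"])
```

In this data, f2 outranks f1 even though f1 mentions "cat" more often, because "dog" appears in only one frame and therefore carries a higher idf.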