interfaces – Core gensim interfaces

This module contains basic interfaces used throughout the whole gensim package.

The interfaces are realized as abstract base classes (i.e., some optional functionality is provided in the interface itself, so that the interfaces can be subclassed).

class gensim.interfaces.CorpusABC

Interface for corpora. A corpus is simply an iterable, where each iteration step yields one document. A document is a list of (fieldId, fieldValue) 2-tuples.

See the corpora package for some example corpus implementations.

Note that although a default len() method is provided, it is very inefficient (performs a linear scan through the corpus to determine its length). Wherever the corpus size is needed and known in advance (or at least doesn’t change so that it can be cached), the len() method should be overridden.
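As a minimal sketch of the corpus contract described above, the class below is a hypothetical in-memory corpus (ListCorpus is an illustrative name, not part of gensim). To keep the snippet self-contained it does not subclass CorpusABC, which a real corpus would do. It only shows the two points made here: each iteration step yields one document as a list of (fieldId, fieldValue) 2-tuples, and len() is overridden because the size is known in advance.

```python
class ListCorpus(object):
    """Hypothetical in-memory corpus; the name is illustrative, not a gensim class."""

    def __init__(self, documents):
        # each document is a list of (fieldId, fieldValue) 2-tuples
        self.documents = list(documents)

    def __iter__(self):
        # one document per iteration step
        for document in self.documents:
            yield document

    def __len__(self):
        # the size is known up front, so no linear scan is needed
        return len(self.documents)


corpus = ListCorpus([
    [(0, 1.0), (2, 3.0)],
    [(1, 2.0)],
    [],  # an empty document is still a document
])
sizes = [len(document) for document in corpus]
```

Because the corpus is a plain iterable, it can be traversed repeatedly, and len(corpus) returns immediately instead of walking every document.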

classmethod load(fname)
Load a previously saved object from file (also see save).
save(fname)
Save the object to file via pickling (also see load).
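The save/load round trip might look roughly like the sketch below. It is not gensim's implementation: Saveable is a hypothetical stand-in class, and the sketch pickles the instance state (its attribute dictionary) rather than the object itself, which is an assumption made to keep the example independent of how the snippet is run.

```python
import os
import pickle
import tempfile


class Saveable(object):
    """Hypothetical stand-in for an object with pickle-based save/load."""

    def __init__(self, payload):
        self.payload = payload

    def save(self, fname):
        # assumption: only the instance attributes are pickled
        with open(fname, "wb") as fout:
            pickle.dump(self.__dict__, fout, protocol=pickle.HIGHEST_PROTOCOL)

    @classmethod
    def load(cls, fname):
        # rebuild an instance without calling __init__, then restore its state
        with open(fname, "rb") as fin:
            state = pickle.load(fin)
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        return obj


fname = os.path.join(tempfile.mkdtemp(), "object.pkl")
Saveable([(0, 1.0), (3, 0.5)]).save(fname)
restored = Saveable.load(fname)
```

Since load is a classmethod, it is called on the class rather than on an existing instance.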
class gensim.interfaces.SimilarityABC(corpus, numBest=None)

Abstract interface for similarity searches over a corpus.

In all instances, there is a corpus against which we want to perform the similarity search.

For similarity search, the input is a document and the output is the list of its similarities to the individual corpus documents.

Similarity queries are realized by calling self[query_document].

There is also a convenience wrapper, where iterating over self yields similarities of each document in the corpus against the whole corpus (i.e., the query is each corpus document in turn).

Initialize the similarity search.

If numBest is left unspecified, similarity queries return a full list, with one float for every document in the corpus (including the query document, if it is part of the corpus).

If numBest is set, queries return the numBest most similar documents, as a list of (document index, similarity) 2-tuples sorted by decreasing similarity:

>>> sms = SparseMatrixSimilarity(corpus, numBest=3)
>>> sms[vec] # result in order of decreasing similarity
[(12, 1.0), (30, 0.95), (5, 0.45)]
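The effect of numBest can be illustrated with plain Python. The helper below is hypothetical (n_best is not a gensim function) and only sketches the selection step: given a full list of similarities, it either returns the list unchanged or keeps the highest-scoring entries as (document index, similarity) pairs.

```python
def n_best(similarities, num_best=None):
    """Hypothetical helper: keep the num_best highest-scoring documents."""
    if num_best is None:
        # full list: one float per corpus document
        return list(similarities)
    # pair each similarity with its document index, then sort by score
    ranked = sorted(enumerate(similarities), key=lambda item: item[1], reverse=True)
    return ranked[:num_best]


full = [0.1, 0.45, 1.0, 0.0, 0.95]
top = n_best(full, num_best=3)
```

Here top holds three (index, similarity) pairs in order of decreasing similarity, while n_best(full) returns all five floats.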
getSimilarities(doc)

Return similarity of a sparse vector doc to all documents in the corpus.

The document is assumed to be either of unit length or empty.
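The unit-length assumption matters if the similarity measure is cosine similarity: for unit vectors, the cosine reduces to a plain dot product. The source does not name the measure, so the sketch below assumes cosine similarity; unit_vector and get_similarities are hypothetical helpers, not gensim functions.

```python
import math


def unit_vector(doc):
    """Scale a sparse vector to unit length; an empty vector stays empty."""
    length = math.sqrt(sum(value * value for _, value in doc))
    if length == 0.0:
        return []
    return [(field_id, value / length) for field_id, value in doc]


def get_similarities(doc, corpus):
    """Hypothetical dot-product similarity of `doc` against each corpus document."""
    query = dict(doc)  # fieldId -> fieldValue lookup for the query
    return [
        sum(query.get(field_id, 0.0) * value for field_id, value in other)
        for other in corpus
    ]


raw_documents = ([(0, 3.0), (1, 4.0)], [(1, 2.0)], [(2, 1.0)])
corpus = [unit_vector(document) for document in raw_documents]
sims = get_similarities(unit_vector([(0, 3.0), (1, 4.0)]), corpus)
empty_sims = get_similarities([], corpus)
```

An empty query shares no fields with any document, so every similarity comes out as zero, which is consistent with empty documents being allowed.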

classmethod load(fname)
Load a previously saved object from file (also see save).
save(fname)
Save the object to file via pickling (also see load).
class gensim.interfaces.TransformationABC

Interface for transformations. A ‘transformation’ is any object which accepts a sparse document via the dictionary notation [] and returns another sparse document in its stead.

See the gensim.models.tfidfmodel module for an example of a transformation.
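A toy illustration of the [] convention: the class below is hypothetical (ScaleTransformation is not part of gensim, and to stay self-contained it does not subclass TransformationABC). It accepts a sparse document through __getitem__ and returns a new sparse document with every value multiplied by a constant.

```python
class ScaleTransformation(object):
    """Hypothetical transformation: multiplies every field value by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def __getitem__(self, doc):
        # accept a sparse document, return a new sparse document
        return [(field_id, value * self.factor) for field_id, value in doc]


doubler = ScaleTransformation(2.0)
original = [(0, 1.0), (3, 0.5)]
transformed = doubler[original]
```

The input document is left unmodified; the transformation builds and returns a fresh list.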

classmethod load(fname)
Load a previously saved object from file (also see save).
save(fname)
Save the object to file via pickling (also see load).
