FastTextIndexer
-
class finalfusion.subword.hash_indexers.FastTextIndexer(n_buckets=2000000, min_n=3, max_n=6)
File: src/finalfusion/subword/hash_indexers.pyx (starting at line 155)
FastTextIndexer is a hash-based subword indexer. It hashes n-grams with a (slightly faulty) FNV-1a variant and maps each hash into a predetermined bucket space of n_buckets buckets.
A single n-gram can be indexed directly through the __call__ method; all n-grams in a string can be indexed in bulk through the subword_indices method.
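As a rough illustration of the hashing step, the following pure-Python sketch mirrors the hash used by the reference fastText implementation: FNV-1a over the UTF-8 bytes, except that each byte is sign-extended to 32 bits before the XOR, which is presumably what "slightly faulty" refers to. The helper names fasttext_hash and bucket_index are hypothetical and not part of finalfusion, and the modulo mapping into n_buckets is an assumption rather than a statement about the actual .pyx code.

```python
FNV_OFFSET_BASIS = 2166136261  # standard 32-bit FNV offset basis
FNV_PRIME = 16777619           # standard 32-bit FNV prime
MASK32 = 0xFFFFFFFF            # emulate uint32 wrap-around


def fasttext_hash(ngram: str) -> int:
    """FNV-1a style 32-bit hash with sign-extended bytes (hypothetical helper)."""
    h = FNV_OFFSET_BASIS
    for byte in ngram.encode("utf-8"):
        # Treat the byte as a signed 8-bit value, then widen it to 32 bits.
        # Bytes >= 0x80 therefore also flip the upper 24 bits, which
        # standard FNV-1a would leave untouched.
        signed = byte - 256 if byte >= 128 else byte
        h ^= signed & MASK32
        h = (h * FNV_PRIME) & MASK32
    return h


def bucket_index(ngram: str, n_buckets: int = 2_000_000, offset: int = 0) -> int:
    """Map an n-gram to a bucket index (modulo mapping is an assumption)."""
    return fasttext_hash(ngram) % n_buckets + offset


if __name__ == "__main__":
    print(bucket_index("<he"))
    print(bucket_index("<he", offset=1000))
```

For pure-ASCII n-grams this sketch agrees with standard FNV-1a; the two only diverge once an n-gram contains a non-ASCII character.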
-
max_n
The upper bound of the n-gram range.
- Returns
max_n – Upper bound of n-gram range
- Return type
-
min_n
The lower bound of the n-gram range.
- Returns
min_n – Lower bound of n-gram range
- Return type
-
subword_indices(self, unicode word, uint64_t offset=0, bracket=True, with_ngrams=False)
File: src/finalfusion/subword/hash_indexers.pyx (starting at line 219)
Get the subword indices for a word.
- Parameters
word (str) – The string to extract n-grams from
offset (int) – The offset to add to the index, e.g. the length of the word-vocabulary.
bracket (bool) – Toggles bracketing the input string with < and >
with_ngrams (bool) – Toggles returning tuples of (ngram, idx)
- Returns
indices – List of n-gram indices, optionally as (str, int) tuples.
- Return type
- Raises
TypeError – If word is None.
-
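The parameters above can be pictured with a small pure-Python sketch. It is not the finalfusion implementation: subword_indices_sketch and _hash are hypothetical names, the order in which n-grams are emitted is an assumption, and the hash is the fastText-style FNV-1a variant (sign-extended bytes) reduced modulo n_buckets.

```python
from typing import List, Optional, Tuple, Union


def _hash(ngram: str) -> int:
    """fastText-style FNV-1a variant with sign-extended bytes (hypothetical)."""
    h = 2166136261
    for byte in ngram.encode("utf-8"):
        signed = byte - 256 if byte >= 128 else byte
        h = ((h ^ (signed & 0xFFFFFFFF)) * 16777619) & 0xFFFFFFFF
    return h


def subword_indices_sketch(
    word: Optional[str],
    n_buckets: int = 2_000_000,
    min_n: int = 3,
    max_n: int = 6,
    offset: int = 0,
    bracket: bool = True,
    with_ngrams: bool = False,
) -> List[Union[int, Tuple[str, int]]]:
    """Index every n-gram of `word` whose length lies in [min_n, max_n]."""
    if word is None:
        raise TypeError("word must not be None")
    if bracket:
        # Mark word boundaries so prefixes and suffixes hash differently
        # from word-internal n-grams.
        word = f"<{word}>"
    result: List[Union[int, Tuple[str, int]]] = []
    for start in range(len(word)):
        for n in range(min_n, max_n + 1):
            if start + n > len(word):
                break  # longer n-grams from this position would overrun
            ngram = word[start:start + n]
            # offset shifts bucket indices past e.g. the word vocabulary.
            idx = _hash(ngram) % n_buckets + offset
            result.append((ngram, idx) if with_ngrams else idx)
    return result


if __name__ == "__main__":
    for ngram, idx in subword_indices_sketch("hello", offset=10, with_ngrams=True):
        print(ngram, idx)
```

With the default range of 3 to 6, the bracketed string "<hello>" has seven characters and so yields 5 + 4 + 3 + 2 = 14 n-grams; passing an offset equal to the vocabulary size keeps subword indices from colliding with word indices in a shared embedding matrix.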