ExplicitIndexer
-
class finalfusion.subword.explicit_indexer.ExplicitIndexer(ngrams: List[str], min_n: int = 3, max_n: int = 6, ngram_index: Optional[Dict[str, int]] = None)
File: src/finalfusion/subword/explicit_indexer.pyx (starting at line 9)
An explicit indexer does not index n-grams through hashing; instead it defines an actual lookup table from n-grams to indices.
It can be constructed from a list of unique n-grams, in which case the i-th n-gram in the list is mapped to index i. Alternatively, a mapping can be passed via ngram_index, which allows multiple n-grams to map to the same index.
Single n-grams can be indexed directly through the __call__ method, and all n-grams in a string can be indexed in bulk through the subword_indices method.
By default, subword_indices returns a list of the indices belonging to the input string; with with_ngrams=True it returns tuples of the form (ngram, idx) instead.
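The two construction modes described above can be illustrated with a small pure-Python sketch. It does not use finalfusion itself; build_ngram_index is a hypothetical helper written only for this illustration, and the uniqueness check is an assumption about how a list of n-grams might be validated.

```python
from typing import Dict, List, Optional


def build_ngram_index(ngrams: List[str],
                      ngram_index: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Hypothetical helper: build an n-gram -> index lookup table."""
    if ngram_index is not None:
        # An explicit mapping may send several n-grams to the same index.
        return dict(ngram_index)
    if len(set(ngrams)) != len(ngrams):
        # Assumption: a plain list is only meaningful if its entries are unique.
        raise ValueError("ngrams must be unique")
    # Without a mapping, the i-th n-gram in the list gets index i.
    return {ngram: i for i, ngram in enumerate(ngrams)}


# List construction: position in the list becomes the index.
table = build_ngram_index(["<he", "hel", "ell", "llo", "lo>"])

# Mapping construction: two n-grams share index 0.
shared = build_ngram_index(["<he", "hel"], ngram_index={"<he": 0, "hel": 0})
```

In contrast to a hashing indexer, an n-gram that is absent from such a table simply has no index.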
-
max_n
The upper bound of the n-gram range.
- Returns
max_n – Upper bound of n-gram range
- Return type
int
-
min_n
The lower bound of the n-gram range.
- Returns
min_n – Lower bound of n-gram range
- Return type
int
-
ngram_index
Get the n-gram to index mapping.
Note: Mutating this mapping can make the indexer invalid.
-
ngrams
Get the list of n-grams.
Note: Mutating this list can make the indexer invalid.
- Returns
ngrams – The list of in-vocabulary n-grams.
- Return type
List[str]
-
subword_indices(self, unicode word, uint64_t offset=0, bracket=True, with_ngrams=False)
File: src/finalfusion/subword/explicit_indexer.pyx (starting at line 129)
Get the subword indices for a word.
- Parameters
word (str) – The string to extract n-grams from
offset (int) – The offset to add to the index, e.g. the length of the word-vocabulary.
bracket (bool) – Toggles bracketing the input string with < and >
with_ngrams (bool) – Toggles returning tuples of (ngram, idx)
- Returns
indices – List of n-gram indices, optionally as (str, int) tuples.
- Return type
- Raises
TypeError – If word is None.
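As a rough illustration of how the parameters above interact, the following pure-Python sketch mimics the documented behaviour without calling finalfusion. The function sketch_subword_indices and the example table are hypothetical. Two details are assumptions rather than documented facts: n-grams missing from the table are skipped, and n-grams are emitted by start position and then by length, which may differ from the order the real implementation uses.

```python
from typing import Dict, List, Tuple, Union


def sketch_subword_indices(word: str,
                           table: Dict[str, int],
                           min_n: int = 3,
                           max_n: int = 6,
                           offset: int = 0,
                           bracket: bool = True,
                           with_ngrams: bool = False
                           ) -> List[Union[int, Tuple[str, int]]]:
    """Hypothetical stand-in for subword_indices, for illustration only."""
    if word is None:
        raise TypeError("word must not be None")
    if bracket:
        # Mark word boundaries so prefixes and suffixes get distinct n-grams.
        word = "<" + word + ">"
    result: List[Union[int, Tuple[str, int]]] = []
    for start in range(len(word)):
        for n in range(min_n, max_n + 1):
            if start + n > len(word):
                break
            ngram = word[start:start + n]
            idx = table.get(ngram)
            if idx is None:
                continue  # assumption: unknown n-grams are skipped
            # The offset shifts indices, e.g. past the word vocabulary.
            idx += offset
            result.append((ngram, idx) if with_ngrams else idx)
    return result


table = {"<hi": 0, "hi>": 1, "<hi>": 2}
plain = sketch_subword_indices("hi", table)
shifted = sketch_subword_indices("hi", table, offset=10)
pairs = sketch_subword_indices("hi", table, with_ngrams=True)
unbracketed = sketch_subword_indices("hi", table, bracket=False)
```

With bracket=False the two-character input is shorter than min_n, so the sketch yields no n-grams at all, which shows why bracketing matters for short words.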
-