Finalfusion in Python
finalfusion is a Python package for reading, writing and using
finalfusion embeddings, but also supports other commonly used
embeddings like fastText, GloVe and word2vec.
The Python package supports the same types of embeddings as the finalfusion-rust crate:
- Vocabulary
  - No subwords
  - Subwords
- Embedding matrix
  - Array
  - Memory-mapped
  - Quantized
- Norms
- Metadata
This package extends the (de-)serialization capabilities of finalfusion chunks by
allowing single chunks to be loaded and written. For example, a Vocab can be loaded
from a finalfusion spec file without loading the Storage. Single chunks can also be
serialized to their own files through write(). This differs from finalfusion-rust:
loading stand-alone components is only supported by the Python package, and reading
such single-chunk files will fail with other tools from the finalfusion ecosystem.
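As a minimal sketch of the single-chunk workflow described above, the helper below loads only the vocabulary from an embeddings file and writes it to its own file. It assumes that load_vocab is importable from finalfusion.vocab and that the returned Vocab exposes write(); the helper name extract_vocab and the file names in the usage note are hypothetical.

```python
def extract_vocab(embeddings_path, vocab_path):
    """Load only the vocabulary chunk and write it to a stand-alone file."""
    # Imported lazily so this sketch can be imported even where the
    # finalfusion package is not installed.
    from finalfusion.vocab import load_vocab

    # Reads the vocabulary chunk only; the storage is not loaded.
    vocab = load_vocab(embeddings_path)
    # Writes a single-chunk file. Per the text above, such files are
    # only readable by the Python package.
    vocab.write(vocab_path)
    return vocab
```

A call such as extract_vocab("embeddings.fifu", "vocab.fifu") would then leave a vocabulary-only file next to the full embeddings.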
The package integrates well with numpy, since its Storage types can be
treated as numpy arrays.
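The following sketch illustrates what treating a storage as a numpy array can look like in practice. It assumes an array-backed storage and that load_finalfusion and the Embeddings.storage property are available as named here; storage_summary and summarize_file are hypothetical helpers, not part of the package.

```python
def storage_summary(storage):
    """Summarise an embedding matrix using ordinary ndarray operations."""
    n_rows, n_dims = storage.shape    # ndarray attribute
    centroid = storage.mean(axis=0)   # ndarray method: mean embedding
    return n_rows, n_dims, centroid


def summarize_file(path):
    """Load embeddings from `path` and summarise their storage."""
    # Imported lazily so this sketch can be imported even where the
    # finalfusion package is not installed.
    from finalfusion import load_finalfusion

    embeddings = load_finalfusion(path)
    # Assumption: the storage behaves like a 2-D numpy array.
    return storage_summary(embeddings.storage)
```

Because storage_summary relies only on array attributes and methods, it works unchanged on a plain numpy matrix.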
finalfusion comes with some scripts to convert between
embedding formats, run analogy and similarity queries, and turn bucket subword embeddings
into explicit subword embeddings.
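To make the difference between bucket and explicit subword embeddings concrete, here is a self-contained sketch of the idea rather than the package's implementation. Bucket vocabularies hash each character n-gram into a fixed number of buckets, while explicit vocabularies store an n-gram to row lookup table. The FNV-1a hash, the n-gram length defaults, and all function names are illustrative assumptions; the package's own indexers may differ.

```python
from typing import Dict, Iterable, List


def ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Character n-grams of the bracketed word, e.g. "<cat>"."""
    bracketed = f"<{word}>"
    return [
        bracketed[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(bracketed) - n + 1)
    ]


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a hash, used here only as a stand-in hash function."""
    h = 0x811C9DC5
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def bucket_index(ngram: str, n_buckets: int) -> int:
    """Bucket lookup: the row is computed from the hash, nothing is stored."""
    return fnv1a_32(ngram) % n_buckets


def to_explicit(words: Iterable[str], n_buckets: int,
                min_n: int = 3, max_n: int = 6) -> Dict[str, int]:
    """Build an explicit n-gram -> row table from a bucket scheme.

    Only buckets that are actually hit get a row. N-grams that collide
    in the same bucket share a row (an assumption of this sketch).
    """
    bucket_to_row: Dict[int, int] = {}
    explicit: Dict[str, int] = {}
    for word in words:
        for ngram in ngrams(word, min_n, max_n):
            if ngram in explicit:
                continue
            bucket = bucket_index(ngram, n_buckets)
            # Assign the next free row the first time a bucket is seen.
            row = bucket_to_row.setdefault(bucket, len(bucket_to_row))
            explicit[ngram] = row
    return explicit
```

The resulting table has one entry per n-gram seen in the vocabulary, and its row indices are contiguous, so an explicit subword matrix only needs rows for buckets that are in use.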
The package is implemented in Python with some Cython extensions; it is not based on bindings
to the finalfusion-rust crate.