Finalfusion in Python
finalfusion is a Python package for reading, writing and using finalfusion embeddings. It also supports other commonly used embeddings, such as fastText, GloVe and word2vec.
The Python package supports the same types of embeddings as the finalfusion-rust crate:
- Vocabulary
  - No subwords
  - Subwords
- Embedding matrix
  - Array
  - Memory-mapped
  - Quantized
- Norms
- Metadata
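To illustrate how a vocabulary with subwords differs from one without, here is a minimal sketch of a bucket-style subword vocabulary: a word that has no row of its own is looked up through hashed character n-grams. Everything in it is a hypothetical assumption for illustration: the `ToyBucketVocab` class, the n-gram range of 3 to 6, `zlib.crc32` as the hash function, averaging as the combination rule, and random values in place of trained vectors. finalfusion's own vocabularies use their own hashing and indexing schemes.

```python
import zlib
from typing import List, Optional

import numpy as np


def ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    # Bracket the word so that prefixes and suffixes yield distinct n-grams.
    bracketed = f"<{word}>"
    return [
        bracketed[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(bracketed) - n + 1)
    ]


class ToyBucketVocab:
    """Hypothetical subword vocab: known words own a row, n-grams share hashed bucket rows."""

    def __init__(self, words: List[str], n_buckets: int, dims: int, seed: int = 0):
        self.index = {w: i for i, w in enumerate(words)}
        self.n_buckets = n_buckets
        rng = np.random.default_rng(seed)
        # Rows [0, len(words)) are word vectors, the next n_buckets rows are bucket vectors.
        # Random values stand in for trained embeddings.
        self.matrix = rng.standard_normal((len(words) + n_buckets, dims)).astype(np.float32)

    def subword_rows(self, word: str) -> List[int]:
        # Hash every n-gram into one of n_buckets rows (crc32 is only a stand-in hash).
        offset = len(self.index)
        return [offset + zlib.crc32(g.encode("utf-8")) % self.n_buckets for g in ngrams(word)]

    def embedding(self, word: str) -> Optional[np.ndarray]:
        # Known words: direct row lookup, as in a vocabulary without subwords.
        if word in self.index:
            return self.matrix[self.index[word]]
        # Unknown words: average the bucket vectors of their n-grams.
        rows = self.subword_rows(word)
        return self.matrix[rows].mean(axis=0) if rows else None


vocab = ToyBucketVocab(["walk", "talk"], n_buckets=64, dims=4)
oov = vocab.embedding("walking")  # not in the word list, so it is built from subwords
```

A vocabulary without subwords corresponds to the first branch of `embedding` only: an unknown word simply has no vector.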
This package extends the (de-)serialization capabilities of finalfusion Chunks by allowing single chunks to be loaded and written. For example, a Vocab can be loaded from a finalfusion spec file without loading the Storage. Single chunks can also be serialized to their own files through write(). This goes beyond the functionality of finalfusion-rust: loading stand-alone components is only supported by the Python package, and other tools from the finalfusion ecosystem will fail to read such single-chunk files.
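The idea of loading one chunk without the others can be sketched with a toy container in which every chunk starts with an identifier and a payload length, so a reader can skip payloads it does not need. This is an assumption-laden illustration, not the finalfusion file format: the chunk identifiers, the 12-byte header layout and the `write_chunk`/`read_chunk` helpers are all hypothetical, and the real format has its own file header and chunk encodings.

```python
import io
import struct

# Hypothetical chunk identifiers for this toy container; they are NOT finalfusion's IDs.
VOCAB_CHUNK = 1
MATRIX_CHUNK = 2

# Chunk header: little-endian u32 identifier followed by u64 payload length (12 bytes).
HEADER = struct.Struct("<IQ")


def write_chunk(f, chunk_id: int, payload: bytes) -> None:
    # Write the header, then the raw payload bytes.
    f.write(HEADER.pack(chunk_id, len(payload)))
    f.write(payload)


def read_chunk(f, wanted_id: int) -> bytes:
    # Scan chunk headers and skip every payload that is not the requested chunk.
    while True:
        header = f.read(HEADER.size)
        if len(header) < HEADER.size:
            raise KeyError(f"chunk {wanted_id} not found")
        chunk_id, length = HEADER.unpack(header)
        if chunk_id == wanted_id:
            return f.read(length)
        f.seek(length, io.SEEK_CUR)  # skip the payload without reading it


words = ["der", "die", "das"]
vocab_payload = "\n".join(words).encode("utf-8")

# A full "file": a 3x2 float32 matrix chunk followed by a vocabulary chunk.
full = io.BytesIO()
write_chunk(full, MATRIX_CHUNK, struct.pack("<6f", *[float(i) for i in range(6)]))
write_chunk(full, VOCAB_CHUNK, vocab_payload)

# Load only the vocabulary; the matrix payload is skipped with seek(), never read.
full.seek(0)
loaded_words = read_chunk(full, VOCAB_CHUNK).decode("utf-8").split("\n")

# A stand-alone "file" that contains nothing but the vocabulary chunk.
vocab_only = io.BytesIO()
write_chunk(vocab_only, VOCAB_CHUNK, vocab_payload)
```

Because each chunk carries its own length, skipping the matrix costs a single seek regardless of its size.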
The package integrates with numpy, since its Storage types can be treated as numpy arrays.
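A minimal sketch of what "treated as numpy arrays" can mean in practice: if a storage class subclasses `np.ndarray`, numpy operators and functions apply to it directly. The `ToyStorage` class below is hypothetical and only illustrates the idea; it does not reproduce the package's own Storage types.

```python
import numpy as np


class ToyStorage(np.ndarray):
    """Hypothetical storage type: an embedding matrix that is itself a numpy array."""

    def __new__(cls, rows):
        matrix = np.asarray(rows, dtype=np.float32)
        if matrix.ndim != 2:
            raise ValueError("a storage must be a 2-D matrix")
        # view() reinterprets the same memory as ToyStorage without copying.
        return matrix.view(cls)


storage = ToyStorage([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0], dtype=np.float32)

scores = storage @ query                     # plain numpy matrix-vector product
row_norms = np.linalg.norm(storage, axis=1)  # numpy functions accept the storage as-is
```

Subclassing keeps the data in a single numpy buffer, so no conversion step is needed before handing the matrix to numpy-based code.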
finalfusion comes with scripts to convert between embedding formats, run analogy and similarity queries, and turn bucket subword embeddings into explicit subword embeddings.
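Similarity and analogy queries over embeddings are commonly based on cosine similarity. The following sketch shows that common approach, with 3CosAdd for analogies, on a hand-made toy matrix. The function names, the toy vectors and the exact scoring rule are assumptions; the finalfusion scripts may differ in details such as normalization or which words are excluded from the results.

```python
import numpy as np


def unit_rows(matrix: np.ndarray) -> np.ndarray:
    # Scale every embedding to unit length so dot products are cosine similarities.
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)


def most_similar(word, words, matrix, k=1):
    unit = unit_rows(matrix)
    idx = words.index(word)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # never return the query word itself
    best = np.argsort(-sims, kind="stable")[:k]
    return [(words[i], float(sims[i])) for i in best]


def analogy(a, b, c, words, matrix):
    # "a is to b as c is to ?": the answer maximizes cos(d, b - a + c) (3CosAdd).
    unit = unit_rows(matrix)
    ia, ib, ic = (words.index(w) for w in (a, b, c))
    target = unit[ib] - unit[ia] + unit[ic]
    sims = unit @ (target / np.linalg.norm(target))
    sims[[ia, ib, ic]] = -np.inf  # exclude the query words
    return words[int(np.argmax(sims))]


# Toy 3-d vectors with the axes (royalty, maleness, fruitiness).
words = ["king", "queen", "man", "woman", "apple"]
matrix = np.array(
    [[1, 1, 0], [1, -1, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1]], dtype=np.float64
)

answer = analogy("man", "king", "woman", words, matrix)
neighbours = most_similar("king", words, matrix)
```

With these vectors, `answer` is `"queen"` and the nearest neighbour of `"king"` is `"man"`.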
The package is implemented in Python with some Cython extensions; it is not based on bindings to the finalfusion-rust crate.