Finalfusion in Python

finalfusion is a Python package for reading, writing and using finalfusion embeddings. It also supports other commonly used embedding formats such as fastText, GloVe and word2vec.

The Python package supports the same types of embeddings as the finalfusion-rust crate:

  • Vocabulary

    • No subwords

    • Subwords

  • Embedding matrix

    • Array

    • Memory-mapped

    • Quantized

  • Norms

  • Metadata

This package extends the (de-)serialization capabilities of finalfusion chunks by allowing single chunks to be loaded and written. For example, a Vocab can be loaded from a finalfusion spec file without loading the Storage. Single chunks can also be serialized to their own files through write(). This differs from finalfusion-rust: loading stand-alone components is only supported by the Python package, so reading such single-chunk files will fail with other tools from the finalfusion ecosystem.
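A minimal sketch of the single-chunk workflow described above, assuming the package exposes load_vocab in finalfusion.vocab and that the returned vocabulary has a write() method. The helper name copy_vocab and both paths are hypothetical, chosen only for illustration.

```python
def copy_vocab(embeddings_path, vocab_path):
    """Load only the vocabulary chunk of a finalfusion file and write it to its own file."""
    # Imported inside the function so the sketch can be imported without the package installed.
    # Assumption: finalfusion.vocab provides load_vocab.
    from finalfusion.vocab import load_vocab

    # Only the vocabulary chunk is read; the embedding matrix (Storage) is not loaded.
    vocab = load_vocab(embeddings_path)

    # Serialize the single chunk to its own file. Per the text above, such a
    # file is only readable with the Python package.
    vocab.write(vocab_path)
    return vocab
```

Under these assumptions, a call such as copy_vocab("embeddings.fifu", "embeddings.fifu.voc") would produce a vocabulary-only file next to the full embeddings file.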

The package integrates well with numpy because its Storage types can be treated as numpy arrays.
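The following sketch illustrates what treating a Storage as a numpy array can look like. The helper mean_and_gram is hypothetical, and a plain numpy array stands in for the storage so that the example runs without an embeddings file. It assumes an array-backed storage; quantized storage may need to be reconstructed into an array first.

```python
import numpy as np


def mean_and_gram(storage):
    """Return the mean embedding and the matrix of pairwise dot products."""
    # np.asarray gives a plain ndarray view of array-like input.
    matrix = np.asarray(storage)
    mean = matrix.mean(axis=0)   # one value per embedding dimension
    gram = matrix @ matrix.T     # dot product between every pair of rows
    return mean, gram


# Stand-in for an embedding matrix with three words and two dimensions.
demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype=np.float32)
mean, gram = mean_and_gram(demo)
print(mean.shape, gram.shape)
```

With real embeddings, the same helper would be called on the storage of a loaded Embeddings object instead of on demo.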

finalfusion comes with scripts to convert between embedding formats, run analogy and similarity queries, and turn bucket subword embeddings into explicit subword embeddings.

The package is implemented in Python with some Cython extensions; it is not based on bindings to the finalfusion-rust crate.
