*.sf.parquet); the embedded BM25 and vector index regions sit ahead
of a standard Parquet footer and are referenced by inf.* file-metadata keys that a
conformant Parquet reader simply ignores. So anything that reads Parquet can read your
data: no export step, no Infino in the read path, no lock-in.
A single table can shard into several superfiles, so read them as a set (glob the
*.sf.parquet files under the table’s directory).
DuckDB
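A minimal sketch of the glob read from Python, assuming the `duckdb` and `pandas` packages are installed. The helper names `superfile_query` and `read_table` and the path `data/my_table` are hypothetical placeholders for illustration, not part of Infino.

```python
from pathlib import Path


def superfile_query(table_dir: str, select: str = "*") -> str:
    """Build a DuckDB query that reads every superfile of one table as a set."""
    # Glob all shards; DuckDB's read_parquet accepts a glob pattern directly.
    pattern = (Path(table_dir) / "*.sf.parquet").as_posix()
    # Double any single quotes so the path stays a valid SQL string literal.
    pattern = pattern.replace("'", "''")
    return f"SELECT {select} FROM read_parquet('{pattern}')"


def read_table(table_dir: str):
    """Run the query and return the rows as a pandas DataFrame."""
    import duckdb  # imported lazily so superfile_query works without duckdb

    return duckdb.sql(superfile_query(table_dir)).df()
```

Calling `read_table("data/my_table")` would return the rows as written across all shards; pass a different `select` to `superfile_query` (for example `count(*)`) for aggregate queries.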
pandas / pyarrow
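A sketch of the same read with pyarrow, assuming `pyarrow` and `pandas` are installed. The function names and the idea of passing a table directory are illustrative assumptions; only the `*.sf.parquet` naming comes from the text above.

```python
from pathlib import Path


def list_superfiles(table_dir: str) -> list[str]:
    """Return every *.sf.parquet shard under table_dir, in a stable order."""
    return sorted(str(p) for p in Path(table_dir).glob("*.sf.parquet"))


def read_table(table_dir: str):
    """Load all shards of one table into a single pandas DataFrame."""
    import pyarrow.dataset as ds  # lazy import: listing works without pyarrow

    files = list_superfiles(table_dir)
    if not files:
        raise FileNotFoundError(f"no *.sf.parquet files under {table_dir}")
    # Treat the shards as one logical dataset rather than reading one file.
    return ds.dataset(files, format="parquet").to_table().to_pandas()
```

Reading the shards as one dataset keeps the "read them as a set" rule in one place, so callers never pick up a single superfile by accident.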
What a Parquet reader sees
A standard reader gets the _id column and your scalar / text columns, and ignores
the index regions. One thing to know: the vector column is consumed into the embedded
index, not stored as a Parquet column, so it won’t appear in the read-back (you’ll see
_id, source, body, but not embedding).
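To check this on one of your own files, a sketch like the following lists the columns a standard reader sees and the `inf.*` file-metadata keys it ignores. It assumes `pyarrow` is installed; the helper names are hypothetical, and the exact key names under the `inf.` prefix are not specified here, so the code only filters by prefix.

```python
def infino_keys(file_metadata):
    """Pick out the inf.* entries from Parquet key-value file metadata."""
    # pyarrow exposes file metadata as a dict of bytes -> bytes, or None.
    return {
        key.decode(): value
        for key, value in (file_metadata or {}).items()
        if key.startswith(b"inf.")
    }


def describe_superfile(path: str):
    """Return (column names, sorted inf.* metadata keys) for one superfile."""
    import pyarrow.parquet as pq  # lazy import: infino_keys works without it

    columns = pq.read_schema(path).names
    keys = sorted(infino_keys(pq.read_metadata(path).metadata))
    return columns, keys
```

Given the description above, the returned column list should contain `_id` and the scalar / text columns but no `embedding`.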
A raw Parquet read returns the rows as written to the superfiles. For Infino’s live
view (ranked search, and tables with update / delete applied), query through
Infino (query_sql and the search API). Use direct Parquet reads for analytics,
ETL/export, and interop with the wider data ecosystem.
Limitations
- Tombstones aren’t applied. A raw Parquet read returns rows as written; update/delete effects show only through Infino.
- The vector column isn’t a Parquet column. It’s consumed into the embedded index, so a standard reader won’t see it.
- Rewriting drops the indexes. Reading a superfile and rewriting it through a generic Parquet writer keeps the columns but loses searchability until re-indexed.
