Infino stores your data as Apache Parquet and builds its search indexes directly into those files. You can search it with full-text, vector, or SQL queries, or with a hybrid that fuses full-text and vector results, and everything runs inside your application.
       write rows   ──►  ┌─────────────── supertable ───────────────┐
  (text · vectors ·      │  manifest: snapshot reads, atomic commits │
       scalars)          │                                           │
                         │    superfile ── superfile ── superfile    │
                         │      each = one Apache Parquet file with   │
                         │      BM25 + vector (IVF + RaBitQ) indexes  │
                         └──────────────────┬────────────────────────┘
                                            │  on S3 / Azure / local disk

              query the same rows from your own process:
      full-text (BM25) · vector kNN · SQL · hybrid (BM25 + vector, fused with RRF)

The two layers

  • Superfile: a single Apache Parquet file with BM25 and vector (IVF + RaBitQ) indexes embedded in it. It is immutable once written and remains a valid Parquet file, so anything that reads Parquet can read your data.
  • Supertable: many superfiles composed into one queryable table, with a manifest that provides snapshot-isolated reads and atomic commits. Writes are append-only, and updates and deletes are handled without rewriting your data.
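The page doesn't say how updates and deletes avoid rewriting data. One common approach in table formats built on immutable files is to record deleted row positions alongside the files, filter them out at read time, and treat an update as a delete plus an append. The toy sketch below assumes that approach; `Superfile`, `Supertable`, and their methods are hypothetical names, not Infino's API, and rows live in memory rather than in Parquet.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Superfile:
    """Hypothetical stand-in for an immutable superfile: its rows never change."""
    name: str
    rows: tuple


@dataclass
class Supertable:
    """Toy supertable: immutable superfiles plus per-file delete markers (assumed design)."""
    files: list = field(default_factory=list)
    deleted: dict = field(default_factory=dict)  # file name -> set of deleted row positions

    def append(self, name, rows):
        # Appends add a new superfile; existing files are never modified.
        self.files.append(Superfile(name, tuple(rows)))

    def delete(self, predicate):
        # Deletes only record row positions; no superfile is rewritten.
        for f in self.files:
            marks = self.deleted.setdefault(f.name, set())
            for pos, row in enumerate(f.rows):
                if predicate(row):
                    marks.add(pos)

    def update(self, predicate, change):
        # Update = copy the live matching rows with the change applied,
        # mark the old versions deleted, then append the new versions.
        new_rows = [change(r) for f in self.files for pos, r in enumerate(f.rows)
                    if pos not in self.deleted.get(f.name, set()) and predicate(r)]
        self.delete(predicate)
        self.append(f"sf-{len(self.files)}", new_rows)

    def scan(self):
        # Readers skip marked positions, so deleted rows are invisible.
        for f in self.files:
            marks = self.deleted.get(f.name, set())
            for pos, row in enumerate(f.rows):
                if pos not in marks:
                    yield row


t = Supertable()
t.append("sf-0", [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}])
t.delete(lambda r: r["id"] == 1)
t.update(lambda r: r["id"] == 2, lambda r: {**r, "text": "B"})
print(list(t.scan()))  # [{'id': 2, 'text': 'B'}]
```

The original `sf-0` still holds both of its rows after the update; only the delete markers and the new `sf-1` changed, which is what keeps every superfile immutable.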

One copy, four query modes

Index your rows once, then retrieve them however the question needs:
  • Full-text (BM25): keyword search.
  • Vector (kNN): semantic search over your embeddings (bring your own).
  • SQL: filter, aggregate, and join over the same rows.
  • Hybrid: BM25 and vector fused in a single query with reciprocal-rank fusion (RRF).
All four run over one copy of the data, inside your application.
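Reciprocal-rank fusion combines rankings using only each document's position, so BM25 scores and vector distances never have to be put on a common scale. What follows is a minimal sketch of the standard formula, score(d) = Σ 1 / (k + rank(d)), summed over every ranking that contains d. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here, since the page doesn't state which value Infino uses; `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal-rank fusion over ranked lists of document ids (best first).

    Each document gets sum(1 / (k + rank)) across the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["d1", "d2", "d3"]
vector_hits = ["d3", "d1", "d4"]
print(rrf_fuse([bm25_hits, vector_hits]))  # ['d1', 'd3', 'd2', 'd4']
```

Here `d1` wins because it ranks near the top of both lists, while `d2` and `d4` each appear in only one list and fall behind.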

Object-storage-native retrieval

Object-storage-native retrieval is search that runs directly on data kept in object storage (Amazon S3, Azure Blob, or local disk), instead of in a database or search cluster that owns its own copy. The index and the data live as ordinary files on the object store, and queries read just the bytes they need. This matters because it breaks the usual coupling of compute and storage:
  • Storage is cheap and elastic. Data sits in object storage at object-storage prices, with no replication factor multiplying your footprint.
  • Compute is stateless. Any process can open the data and serve a query; there is no cluster to keep warm between queries.
  • One copy, open format. The files are standard Parquet, so the same bytes that serve search also serve analytics, with no second system to keep in sync.
This matters most for agent and RAG workloads, where an agent issues many retrievals per task: when each retrieval is cheap and the storage bill stays flat, latency and cost stay under control as retrieval volume grows.

How a query runs

Object storage has high first-byte latency, so Infino is built to read only what a query needs. A query runs in three steps, and only the last one fetches your data:
  1. Pin a snapshot. The query starts from a fixed view of the table, so concurrent writes can’t change its answer mid-flight.
  2. Prune from the manifest. Each file carries small summaries (value ranges, a keyword “is this term present?” filter, and vector centroids). The query reads just those summaries to skip files that can’t match, before fetching any file contents. The same summaries cover scalar, keyword, and vector signals, so a hybrid query prunes on all three together.
  3. Fetch only the bytes that matter. For the files that survive pruning, Infino pulls just the relevant byte ranges, such as a posting list or a handful of vector clusters, and caches them. A cold first touch pays the object-store round trip once; warm queries run from a local memory-mapped cache.
Because the index sits in the same Parquet file as the data it describes, resolving a match doesn’t need a round trip to a separate index service.
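The pruning step can be sketched as a filter over per-file summaries. This is a toy illustration under stated assumptions: `FileSummary` and `prune` are hypothetical names, a plain `frozenset` stands in for the term-presence filter (a real one, such as a Bloom filter, may report false positives but never false negatives), and the vector signal is modeled as "keep the `nprobe` files whose closest centroid is nearest the query", which is one plausible policy rather than Infino's documented one.

```python
import math
from dataclasses import dataclass


@dataclass
class FileSummary:
    """Hypothetical per-file summary: a value range, term presence, centroids."""
    name: str
    min_ts: int            # value range for one scalar column
    max_ts: int
    terms: frozenset       # stand-in for a term-presence filter (e.g. a Bloom filter)
    centroids: list        # vector centroids of the file's rows


def prune(summaries, ts_range, term, query_vec, nprobe):
    """Return names of files worth fetching, using only the summaries."""
    lo, hi = ts_range
    # Scalar + keyword: drop files whose range misses the filter or lack the term.
    survivors = [s for s in summaries
                 if s.max_ts >= lo and s.min_ts <= hi and term in s.terms]

    # Vector: rank the remaining files by their closest centroid, keep nprobe.
    def nearest(s):
        return min(math.dist(query_vec, c) for c in s.centroids)

    survivors.sort(key=nearest)
    return [s.name for s in survivors[:nprobe]]


summaries = [
    FileSummary("sf-0", 0, 99, frozenset({"parquet", "index"}), [(0.0, 0.0)]),
    FileSummary("sf-1", 100, 199, frozenset({"parquet"}), [(1.0, 1.0)]),
    FileSummary("sf-2", 150, 250, frozenset({"parquet", "s3"}), [(5.0, 5.0)]),
    FileSummary("sf-3", 120, 180, frozenset({"s3"}), [(1.0, 1.0)]),
]
print(prune(summaries, (120, 200), "parquet", (1.0, 1.2), nprobe=1))  # ['sf-1']
```

In this example `sf-0` is skipped on its value range and `sf-3` on the missing term, and the vector signal then orders `sf-1` ahead of `sf-2`, all without opening any file contents.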

Snapshots and freshness

Every query runs against a pinned snapshot of the table, and new data becomes visible all at once at the next commit. A commit stages the appended rows, builds them into new superfiles (each with its indexes), then publishes a successor manifest atomically. Nothing new is visible until that publish, and there is no half-applied state in between. A long-running query keeps reading its original snapshot even as later commits land, so its results stay consistent from start to finish.
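The commit-and-pin behaviour can be modeled with immutable manifest versions and a compare-and-swap on the pointer to the current one. In this hypothetical sketch, `Manifest`, `Table`, `pin`, `commit`, and `rows` are illustrative names, data stays in memory, and a lock stands in for whatever atomic primitive the real store uses (on an object store that might be a conditional write; the page doesn't say).

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Immutable manifest version: the superfiles that make up the table."""
    version: int
    files: tuple  # tuple of (superfile name, tuple of rows)


class Table:
    """Toy table whose current manifest is swapped atomically on commit."""

    def __init__(self):
        self._current = Manifest(0, ())
        self._lock = threading.Lock()

    def pin(self):
        # A query holds onto one Manifest object for its whole run.
        return self._current

    def commit(self, staged_rows):
        base = self._current
        # Build the new superfile and successor manifest first; nothing is visible yet.
        new_file = (f"sf-{base.version + 1}", tuple(staged_rows))
        successor = Manifest(base.version + 1, base.files + (new_file,))
        with self._lock:
            # Compare-and-swap: publish only if no other commit landed meanwhile.
            if self._current is not base:
                raise RuntimeError("concurrent commit; retry on the new base")
            self._current = successor
        return successor


def rows(snapshot):
    """Read every row visible in a pinned snapshot."""
    return [r for _, file_rows in snapshot.files for r in file_rows]


t = Table()
t.commit(["a", "b"])
snap = t.pin()            # a long-running query pins version 1
t.commit(["c"])           # a later commit publishes version 2
print(rows(snap))         # ['a', 'b']
print(rows(t.pin()))      # ['a', 'b', 'c']
```

Because a pinned snapshot is just a reference to an immutable manifest, later commits can't change what it sees, and new rows appear only when the pointer swap succeeds.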

Dig deeper

For the exhaustive, code-level internals (the superfile format, the supertable manifest, and the query layers), see Infino on DeepWiki. Performance numbers (for example a warm single-term BM25 query in the microsecond range on a 1M-document index) are in benches/README.md. For where Infino fits and where it doesn’t, see Tradeoffs.
Last modified on June 29, 2026