# Core concepts

> How Infino works. One copy of your data as Apache Parquet with built-in search indexes, queryable four ways (full-text, vector, SQL, hybrid) in-process.

Infino keeps your data as Apache Parquet and builds its search indexes right into those
files. You can query it with full-text search, vector search, SQL, or a hybrid that fuses
full-text and vector results, and it all runs inside your application.

```text
       write rows   ──►  ┌─────────────── supertable ───────────────┐
  (text · vectors ·      │  manifest: snapshot reads, atomic commits │
       scalars)          │                                           │
                         │    superfile ── superfile ── superfile    │
                         │      each = one Apache Parquet file with   │
                         │      BM25 + vector (IVF + RaBitQ) indexes  │
                         └──────────────────┬────────────────────────┘
                                            │  on S3 / Azure / local disk
                                            ▼
              query the same rows from your own process:
      full-text (BM25) · vector kNN · SQL · hybrid (BM25 + vector, fused with RRF)
```

## The two layers

* **Superfile**: a single Apache Parquet file with BM25 and vector (IVF + RaBitQ)
  indexes embedded in it. It is immutable once written and remains a valid Parquet
  file, so anything that reads Parquet can read your data.
* **Supertable**: many superfiles composed into one queryable table, with a manifest that
  provides snapshot-isolated reads and atomic commits. Writes are append-only, and
  `update` / `delete` are handled without rewriting your data.

## One copy, four query modes

Index your rows once, then retrieve them however the question needs:

* **Full-text (BM25)**: keyword search.
* **Vector (kNN)**: semantic search over your embeddings (bring your own).
* **SQL**: filter, aggregate, and join over the same rows.
* **Hybrid**: BM25 and vector fused in a single query with reciprocal-rank fusion (RRF).

All four run over one copy of the data, inside your application.
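
To make the hybrid mode concrete, here is a minimal sketch of reciprocal-rank fusion in plain Python. It is not Infino's API: the function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions chosen for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal-rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25_hits = ["doc3", "doc1", "doc7"]    # hypothetical keyword ranking
vector_hits = ["doc1", "doc9", "doc3"]  # hypothetical kNN ranking
fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)
```

RRF uses only rank positions, so BM25 scores and vector distances never need to be put on a common scale. Documents that rank well in both lists (`doc1` and `doc3` here) rise above documents that appear in only one.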

## Object-storage-native retrieval

Object-storage-native retrieval is search that runs directly on data kept in object
storage (Amazon S3 or Azure Blob, with local disk also supported), instead of in a
database or search cluster that owns its own copy. The index and the data live as
ordinary files on the object store, and queries read just the bytes they need.
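
As an illustration of "read just the bytes they need", the sketch below does a byte-range read against a local file that stands in for an object. The section names, the `layout` table, and the `read_range` helper are hypothetical and do not reflect the superfile format; on S3 the analogous operation is a GET request with a `Range` header.

```python
import os
import tempfile

def read_range(path, offset, length):
    """Read `length` bytes starting at `offset`, leaving the rest of the file unread."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Build a small stand-in "object": three sections laid out back to back.
sections = {"header": b"HDR!", "postings": b"term->[1,5,9]", "rows": b"x" * 1000}
layout = {}  # section name -> (offset, length), playing the role of a footer
blob = b""
for name, payload in sections.items():
    layout[name] = (len(blob), len(payload))
    blob += payload

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(blob)
    path = tmp.name

# Fetch only the postings section instead of the whole object.
offset, length = layout["postings"]
postings = read_range(path, offset, length)
os.remove(path)
print(len(postings), "of", len(blob), "bytes read")
```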

This matters because it breaks the usual coupling of compute and storage:

* **Storage is cheap and elastic.** Data sits in object storage at object-storage prices,
  with no replication factor multiplying your footprint.
* **Compute is stateless.** Any process can open the data and serve a query; there is no
  cluster to keep warm between queries.
* **One copy, open format.** The files are standard Parquet, so the same bytes that serve
  search also serve analytics, with no second system to keep in sync.

That is decisive for agent and RAG workloads, where an agent issues many retrievals per
task: when each retrieval is cheap and the storage bill is flat, latency and cost work in
your favour.

## How a query runs

Object storage has high first-byte latency, so Infino is built to read only what a query
needs. A query runs in three steps, and only the last one fetches file contents:

1. **Pin a snapshot.** The query starts from a fixed view of the table, so concurrent
   writes can't change its answer mid-flight.
2. **Prune from the manifest.** Each file carries small summaries (value ranges, a
   keyword "is this term present?" filter, and vector centroids). The query reads just
   those summaries to skip files that can't match, before fetching any file contents. The
   same summaries cover scalar, keyword, and vector signals, so a hybrid query prunes on
   all three together.
3. **Fetch only the bytes that survive.** For the files left, Infino pulls just the
   relevant byte ranges, such as a posting list or a handful of vector clusters, and
   caches them. A cold first touch pays the object-store round trip once; warm queries run
   from a local memory-mapped cache.
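
The pruning step above can be sketched as follows. This is a simplified model, not Infino's implementation: the `FileSummary` fields, the `year` column, the exact term set (standing in for a probabilistic "is this term present?" filter), and the fixed distance cutoff are all assumptions. IVF indexes typically probe the nearest few clusters rather than apply a cutoff.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileSummary:
    name: str
    year_min: int      # value range for a hypothetical scalar column
    year_max: int
    terms: frozenset   # stand-in for a probabilistic term filter
    centroid: tuple    # one representative vector for the file

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prune(summaries, year_range, term, query_vec, max_sq_dist):
    """Return names of files that could still match, using summaries only."""
    lo, hi = year_range
    survivors = []
    for s in summaries:
        if s.year_max < lo or s.year_min > hi:
            continue  # scalar range cannot overlap the filter
        if term not in s.terms:
            continue  # keyword is absent from this file
        if sq_dist(s.centroid, query_vec) > max_sq_dist:
            continue  # this file's vectors are too far from the query
        survivors.append(s.name)
    return survivors

manifest = [
    FileSummary("a.parquet", 2019, 2020, frozenset({"parquet", "index"}), (0.0, 0.0)),
    FileSummary("b.parquet", 2021, 2023, frozenset({"parquet", "vector"}), (1.0, 1.0)),
    FileSummary("c.parquet", 2022, 2024, frozenset({"sql"}), (1.0, 0.9)),
    FileSummary("d.parquet", 2022, 2024, frozenset({"parquet"}), (9.0, 9.0)),
]
to_fetch = prune(manifest, (2021, 2024), "parquet", (1.0, 1.0), max_sq_dist=4.0)
print(to_fetch)
```

Each of the three checks eliminates a different file here, so only one of the four files needs any byte-range fetches.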

Because the index sits in the same Parquet file as the data it describes, resolving a
match doesn't need a round trip to a separate index service.

## Snapshots and freshness

Every query runs against a pinned snapshot of the table, and new data becomes visible all
at once at the next commit. A **commit** stages the appended rows, builds them into new
superfiles (each with its indexes), then publishes a successor manifest atomically.
Nothing new is visible until that publish, and there is no half-applied state in between.
A long-running query keeps reading its original snapshot even as later commits land, so
its results stay consistent from start to finish.
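
A toy model of this behaviour, assuming the manifest can be represented as an immutable tuple of file names that is swapped in a single step. The `Table` class and its methods are hypothetical and are not Infino's API; the real commit also builds and uploads superfiles before publishing.

```python
import threading

class Table:
    """Toy table whose manifest is an immutable tuple of file names."""

    def __init__(self):
        self._manifest = ()
        self._lock = threading.Lock()  # serializes writers only

    def snapshot(self):
        # Readers grab the current tuple; it can never change under them.
        return self._manifest

    def commit(self, new_files):
        with self._lock:
            # Build the successor manifest first, then publish it with a
            # single reference swap so readers see all new files or none.
            successor = self._manifest + tuple(new_files)
            self._manifest = successor

table = Table()
table.commit(["f1.parquet"])
pinned = table.snapshot()                   # a long-running query pins this
table.commit(["f2.parquet", "f3.parquet"])  # a later commit lands
latest = table.snapshot()
print(pinned, latest)
```

The pinned snapshot still lists only the first file after the second commit, while a new reader sees all three at once.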

## Dig deeper

For the exhaustive, code-level internals (the superfile format, the supertable manifest,
and the query layers), see [Infino on DeepWiki](https://deepwiki.com/infino-ai/infino).
Performance numbers (for example, a warm single-term BM25 query in the microsecond range
on a 1M-document index) are in
[`benches/README.md`](https://github.com/infino-ai/infino/blob/main/benches/README.md).
For where Infino fits and where it doesn't, see [Tradeoffs](/tradeoffs).