SiFR Format

A structured format for what a page actually is at runtime

Not a screenshot, not raw HTML, not an accessibility tree. SiFR is a compact structured representation of a live rendered page — the elements, their text and roles, where they sit, and how they relate — captured after the page has actually rendered.

The problem: three lossy ways to show a page to an LLM

LLMs consume web pages in three ways today (raw HTML, screenshots, and the accessibility tree), and each fails differently.

What SiFR is

SiFR captures the rendered page as a graph. Each node carries its text, role, selectors, bounding box, and computed style signals; relations between nodes record containment and spatial or structural clusters. A metadata section holds capture-level stats: node counts, salience distribution, and final payload size.

Current format version: v3 — captures declare format: "sifr-v3". Typical captures are in the tens-to-hundreds of KB range — small enough to hand to a model as context, structured enough for it to reason over and address individual elements.

// Shape of a SiFR capture (simplified: conveys structure, not verbatim output)
{
  "====METADATA====": {
    "format": "sifr-v3",
    "url": "https://example.com/page",
    "stats": { "totalNodes": 294, "salience": { "high": 41 }, "finalSize": 190464 }
  },
  "====NODES====": {
    "n17": {
      "text": "Add to cart",
      "role": "button",
      "selector": "#buy",
      "bbox": [412,220,96,36],
      "salience": "high"
    }
  },
  "====RELATIONS====": [
    { "type": "contains", "from": "n4", "to": "n17" }
  ]
}
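To make the graph concrete, here is a minimal TypeScript sketch that models only the fields in the simplified shape above and walks the "contains" relations from a node up to its containers. The type names (SifrCapture, SifrNode, SifrRelation) and the helpers parentIndex and ancestorsOf are hypothetical, not part of any published SiFR API, and the sketch assumes each node has at most one container.

```typescript
// Hypothetical types mirroring only the simplified capture shape shown above.
type BBox = [number, number, number, number]; // [x, y, w, h]

interface SifrNode {
  text: string;
  role: string;
  selector: string;
  bbox: BBox;
  salience: string;
}

interface SifrRelation {
  type: string; // "contains" in the example; other relation types are not modeled
  from: string; // container node id
  to: string; // contained node id
}

interface SifrCapture {
  "====METADATA====": {
    format: string;
    url: string;
    stats?: { totalNodes: number; salience: Record<string, number>; finalSize: number };
  };
  "====NODES====": Record<string, SifrNode>;
  "====RELATIONS====": SifrRelation[];
}

// Map each contained node id to its container, using "contains" relations only.
// Assumes one container per node: a later relation overwrites an earlier one.
function parentIndex(capture: SifrCapture): Map<string, string> {
  const parents = new Map<string, string>();
  for (const rel of capture["====RELATIONS===="]) {
    if (rel.type === "contains") parents.set(rel.to, rel.from);
  }
  return parents;
}

// Containers of `id`, nearest first; the `seen` set guards against cycles.
function ancestorsOf(capture: SifrCapture, id: string): string[] {
  const parents = parentIndex(capture);
  const chain: string[] = [];
  const seen = new Set<string>([id]);
  let current = parents.get(id);
  while (current !== undefined && !seen.has(current)) {
    chain.push(current);
    seen.add(current);
    current = parents.get(current);
  }
  return chain;
}
```

For the example capture above, ancestorsOf returns ["n4"] for "n17": the single node recorded as containing the button.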

Worked example

All three captures below are real SiFR artifacts. The first is a page everyone knows, the Wikipedia main page; the two after it show how the format holds on a much smaller page and a much heavier one.

A familiar page: the Wikipedia main page

Captured live after render, the full page was reduced to a structured payload that a model can read and address.

The seven high-salience elements are exactly what you would point a person to first — the search box and the masthead. Each keeps a real selector and its on-screen bounding box:

// ====SUMMARY====.elements: the 7 elements ranked "high" salience.
// Verbatim selectors, text, and on-screen [x,y,w,h] boxes (style fields elided).
[
  { "id": "#searchInput", "tag": "input", "text": "", "bbox": [266,17,405,32] }, // placeholder: Search Wikipedia
  { "id": "#mwBg", "tag": "a", "text": "Wikipedia", "bbox": [662,157,112,29] },
  { "id": "#mwBw", "tag": "a", "text": "free", "bbox": [531,198,26,17] },
  { "id": "#mwCA", "tag": "a", "text": "anyone can edit", "bbox": [684,198,107,17] },
  { "id": "#mwCw", "tag": "a", "text": "257,609", "bbox": [493,223,49,15] },
  { "id": "#mwDg", "tag": "a", "text": "7,204,473", "bbox": [638,223,61,15] },
  { "id": "#mwEA", "tag": "a", "text": "English", "bbox": [763,223,45,15] }
]

That is the Wikipedia masthead, reconstructed from the capture alone — Wikipedia, the free encyclopedia that anyone can edit, 7,204,473 articles in English — every fragment addressable by a real selector the model can then act on.
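The reconstruction above can be sketched in a few lines, assuming only the text and bbox fields shown: group elements into visual rows by their top edge, then read each row left to right. The helpers readingLines and clickPoint and the 8-unit row tolerance are illustrative choices, not part of SiFR.

```typescript
// One entry from ====SUMMARY====.elements, as shown above (style fields elided).
interface SummaryElement {
  id: string; // selector, e.g. "#searchInput"
  tag: string;
  text: string;
  bbox: [number, number, number, number]; // on-screen [x, y, w, h]
}

// Group non-empty elements into visual rows (top edges within `tolerance` of the
// row's first element), then join each row's text from left to right.
function readingLines(elements: SummaryElement[], tolerance = 8): string[] {
  const visible = elements
    .filter((el) => el.text.trim() !== "")
    .sort((a, b) => a.bbox[1] - b.bbox[1] || a.bbox[0] - b.bbox[0]);

  const rows: SummaryElement[][] = [];
  for (const el of visible) {
    const row = rows[rows.length - 1];
    if (row !== undefined && Math.abs(el.bbox[1] - row[0].bbox[1]) <= tolerance) {
      row.push(el);
    } else {
      rows.push([el]);
    }
  }
  return rows.map((row) =>
    row
      .sort((a, b) => a.bbox[0] - b.bbox[0])
      .map((el) => el.text)
      .join(" ")
  );
}

// Center of an element's box, a natural point to click once it has been located.
function clickPoint(el: SummaryElement): [number, number] {
  const [x, y, w, h] = el.bbox;
  return [x + w / 2, y + h / 2];
}
```

Run over the seven elements above, readingLines returns ["Wikipedia", "free anyone can edit", "257,609 7,204,473 English"], and clickPoint for #searchInput gives [468.5, 33], the center of the search box.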

A compact page

A captured county-council services page (CCEA): a small page the model has never seen.

A heavy page

The Elastic Public Roadmap board on GitHub: a busy, interactive board at the heavy end of the range.

Payload scales with page complexity, and the capture size cap bounds it — so even a heavy page stays within a few hundred KB. The format holds across the range: the same node, role, selector, and bounding-box structure, and the same clustering, whatever the page.
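This page does not say how the size cap is enforced. Purely as an assumption, one plausible strategy is to keep nodes in salience order until the serialized payload would exceed a byte budget. The salience levels "medium" and "low", the PrunableNode shape, and pruneToBudget are hypothetical; only "high" appears in the captures above.

```typescript
// HYPOTHETICAL size-cap strategy for illustration; not SiFR's documented algorithm.
type Salience = "high" | "medium" | "low"; // "medium" and "low" are assumed levels

interface PrunableNode {
  id: string;
  text: string;
  salience: Salience;
}

const RANK: Record<Salience, number> = { high: 0, medium: 1, low: 2 };

// Keep nodes in salience order (stable within a level) until the next one would
// push the JSON-serialized array past maxBytes, measured in UTF-8 bytes.
function pruneToBudget(nodes: PrunableNode[], maxBytes: number): PrunableNode[] {
  const encoder = new TextEncoder();
  const ordered = [...nodes].sort((a, b) => RANK[a.salience] - RANK[b.salience]);
  const kept: PrunableNode[] = [];
  let size = 2; // the surrounding "[" and "]"
  for (const node of ordered) {
    const separator = kept.length > 0 ? 1 : 0; // "," between array items
    const cost = encoder.encode(JSON.stringify(node)).length + separator;
    if (size + cost > maxBytes) break;
    kept.push(node);
    size += cost;
  }
  return kept;
}
```

Stopping at the first node that does not fit, rather than skipping ahead to smaller ones, guarantees that a lower-salience node is never kept while a higher-salience one is dropped.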

How it compares

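|                       | Raw HTML    | Screenshot            | A11y tree | SiFR               |
|-----------------------|-------------|-----------------------|-----------|--------------------|
| Size                  | Very large  | Large (vision tokens) | Small     | Small              |
| Element addressing    | Yes (noisy) | No                    | Partial   | Yes                |
| Visual / layout info  | No          | Yes                   | No        | Yes (bbox + style) |
| Post-JS runtime state | Partial     | Yes                   | Partial   | Yes                |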

SiFR and E2LLM MCP

SiFR is the payload; E2LLM MCP is the transport. A capture is what the sifr_capture tool returns — the structured representation the model then reads and reasons over. The extension and the MCP relay both speak SiFR.
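On the consuming side, a client has to pull the capture out of the MCP tool result. In the MCP specification, tool results carry a content array of typed items; the sketch below assumes, without confirmation from this page, that sifr_capture places the capture as JSON in a text item, and it checks the format declaration described earlier. readSifrCapture is a hypothetical helper, and the result type is a minimal local stand-in rather than the official SDK type.

```typescript
// Minimal stand-in for an MCP tool-call result: a `content` array of typed items
// (only "text" items are read here) plus an optional `isError` flag.
interface ToolResultContent {
  type: string;
  text?: string;
}

interface CallToolResult {
  content: ToolResultContent[];
  isError?: boolean;
}

// ASSUMPTION: sifr_capture returns the capture as JSON in a "text" content item.
function readSifrCapture(result: CallToolResult): Record<string, unknown> {
  if (result.isError) throw new Error("sifr_capture reported an error");

  const item = result.content.find((c) => c.type === "text" && typeof c.text === "string");
  if (item === undefined || item.text === undefined) {
    throw new Error("tool result has no text content");
  }

  const capture = JSON.parse(item.text) as Record<string, unknown>;
  const meta = capture["====METADATA===="] as { format?: string } | undefined;
  // Captures declare their format version in metadata; reject anything but v3.
  if (meta?.format !== "sifr-v3") {
    throw new Error(`unexpected SiFR format: ${String(meta?.format)}`);
  }
  return capture;
}
```

Rejecting unknown versions outright is a conservative choice; a client that tolerates other versions would branch on the declared format instead.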
