SiFR Format

A structured format for what a page actually is at runtime

Not a screenshot, not raw HTML, not an accessibility tree. SiFR is a compact structured representation of a live rendered page — the elements, their text and roles, where they sit, and how they relate — captured after the page has actually rendered.

The problem: three lossy ways to show a page to an LLM

LLMs consume web pages three ways today, and each fails differently.

Raw HTML is bloated and full of non-semantic noise — wrappers, inline scripts, framework attributes — most of which has nothing to do with what the user sees.
Screenshots burn vision tokens and lose element identity: the model can see a button, but it cannot address it — there is no selector to click.
Accessibility trees drop visual context and layout, and are inconsistently populated across sites and frameworks.

What SiFR is

SiFR captures the rendered page as a graph. Each node carries its text, role, selectors, bounding box, and computed style signals; relations between nodes capture containment and spatial or structural clusters. A metadata block records statistics about the capture: node counts, salience distribution, and final payload size.

Current format version: v3. Captures declare "format": "sifr-v3". Typical captures run from tens to hundreds of KB: small enough to hand to a model as context, structured enough for it to reason over and address individual elements.

// Shape of a SiFR capture (simplified — conveys structure, not verbatim output)
{
  "====METADATA====": {
    "format": "sifr-v3", "url": "https://example.com/page",
    "stats": { "totalNodes": 294, "salience": { "high": 41 }, "finalSize": 190464 }
  },
  "====NODES====": {
    "n17": { "text": "Add to cart", "role": "button",
             "selector": "#buy", "bbox": [412,220,96,36], "salience": "high" }
  },
  "====RELATIONS====": [ { "type": "contains", "from": "n4", "to": "n17" } ]
}
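To make the shape above concrete, here is a minimal Python sketch of how a consumer might read such a capture: it filters nodes by salience and follows a "contains" relation back to the parent. It assumes the simplified key names shown above, which are not guaranteed to match verbatim SiFR output. Node n4 (its role, selector, and box) and the helper names high_salience and parent_of are invented for illustration.

```python
import json

# Hypothetical capture mirroring the simplified shape above. Real SiFR output
# may use different keys or nesting; node "n4" is invented for illustration.
CAPTURE_JSON = """
{
  "====METADATA====": {
    "format": "sifr-v3", "url": "https://example.com/page",
    "stats": {"totalNodes": 294, "salience": {"high": 41}, "finalSize": 190464}
  },
  "====NODES====": {
    "n4":  {"text": "", "role": "form", "selector": "#order",
            "bbox": [400, 200, 320, 120], "salience": "low"},
    "n17": {"text": "Add to cart", "role": "button", "selector": "#buy",
            "bbox": [412, 220, 96, 36], "salience": "high"}
  },
  "====RELATIONS====": [{"type": "contains", "from": "n4", "to": "n17"}]
}
"""


def high_salience(capture):
    """Return the nodes whose salience is 'high', keyed by node id."""
    nodes = capture["====NODES===="]
    return {nid: n for nid, n in nodes.items() if n.get("salience") == "high"}


def parent_of(capture, node_id):
    """Return the id of the node that contains node_id, or None if there is none."""
    for rel in capture["====RELATIONS===="]:
        if rel["type"] == "contains" and rel["to"] == node_id:
            return rel["from"]
    return None


capture = json.loads(CAPTURE_JSON)
for nid, node in high_salience(capture).items():
    print(nid, node["role"], node["selector"], "inside", parent_of(capture, nid))
```

The point of the sketch is that every step is a plain dictionary lookup: the model, or code acting for it, can go from "what matters" to "which selector to act on" without parsing HTML.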

Worked example

All three captures below are real SiFR artifacts. The first is a page everyone knows, the Wikipedia main page; the two after it show how the format holds at the compact and heavy ends of the range.

A familiar page: the Wikipedia main page

Captured live, after render. SiFR reduced the full rendered page to a structured payload a model can read and address:

900 nodes captured from the rendered page
16 spatial clusters grouping related elements
201 KB total SiFR payload — well under the 400 KB cap
salience ranked 7 high, 26 medium, 300 low — the model gets the whole page but knows what matters

The seven high-salience elements are exactly what you would point a person to first — the search box and the masthead. Each keeps a real selector and its on-screen bounding box:

// ====SUMMARY====.elements — the 7 elements ranked "high" salience.
// Verbatim selectors, text, and on-screen [x,y,w,h] boxes (style fields elided).
[
  { "id": "#searchInput", "tag": "input", "text": "",                "bbox": [266,17,405,32] },  // placeholder: Search Wikipedia
  { "id": "#mwBg",       "tag": "a",     "text": "Wikipedia",       "bbox": [662,157,112,29] },
  { "id": "#mwBw",       "tag": "a",     "text": "free",            "bbox": [531,198,26,17] },
  { "id": "#mwCA",       "tag": "a",     "text": "anyone can edit", "bbox": [684,198,107,17] },
  { "id": "#mwCw",       "tag": "a",     "text": "257,609",         "bbox": [493,223,49,15] },
  { "id": "#mwDg",       "tag": "a",     "text": "7,204,473",       "bbox": [638,223,61,15] },
  { "id": "#mwEA",       "tag": "a",     "text": "English",         "bbox": [763,223,45,15] }
]

That is the Wikipedia masthead, reconstructed from the capture alone — Wikipedia, the free encyclopedia that anyone can edit, 7,204,473 articles in English — every fragment addressable by a real selector the model can then act on.
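The reconstruction above can be sketched in a few lines of Python. The element data is copied from the listing; the two helpers, center and reading_order, are hypothetical names and not part of SiFR. The sketch assumes that elements on the same visual line share the same y value, which holds for this capture but is not guaranteed in general.

```python
# The seven high-salience elements from the listing above (style fields elided).
ELEMENTS = [
    {"id": "#searchInput", "tag": "input", "text": "", "bbox": [266, 17, 405, 32]},
    {"id": "#mwBg", "tag": "a", "text": "Wikipedia", "bbox": [662, 157, 112, 29]},
    {"id": "#mwBw", "tag": "a", "text": "free", "bbox": [531, 198, 26, 17]},
    {"id": "#mwCA", "tag": "a", "text": "anyone can edit", "bbox": [684, 198, 107, 17]},
    {"id": "#mwCw", "tag": "a", "text": "257,609", "bbox": [493, 223, 49, 15]},
    {"id": "#mwDg", "tag": "a", "text": "7,204,473", "bbox": [638, 223, 61, 15]},
    {"id": "#mwEA", "tag": "a", "text": "English", "bbox": [763, 223, 45, 15]},
]


def center(bbox):
    """Centre point of an [x, y, w, h] box, e.g. as a click target."""
    x, y, w, h = bbox
    return (x + w / 2, y + h / 2)


def reading_order(elements):
    """Sort top-to-bottom, then left-to-right, using each box's top-left corner."""
    return sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))


# Text fragments in reading order, skipping the empty search input.
fragments = [e["text"] for e in reading_order(ELEMENTS) if e["text"]]
print(fragments)

# Where a click on the search box would land.
print(center(ELEMENTS[0]["bbox"]))
```

Sorting on the exact y coordinate is the simplest possible line grouping; a page with slightly misaligned baselines would need a tolerance band instead.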

A compact page

A captured county-council services page (CCEA) — a small page the model has never seen:

294 nodes captured from the rendered page
22 clusters grouping related elements spatially and structurally
186 KB total SiFR payload

A heavy page

The public Elastic Public Roadmap board on GitHub — a busy, interactive board at the heavy end of the range:

1,480 nodes captured from the rendered board
11 clusters (spatial and structural)
a few hundred KB of SiFR, held under the capture size cap

Payload scales with page complexity, and the capture size cap bounds it — so even a heavy page stays within a few hundred KB. The format holds across the range: the same node, role, selector, and bounding-box structure, and the same clustering, whatever the page.
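A consumer that wants to check a capture against the cap before handing it to a model could do so as sketched below. This is an assumption-laden illustration: it takes the 400 KB figure quoted for the Wikipedia capture as the cap, treats 1 KB as 1024 bytes, and measures size as compact UTF-8 JSON, none of which is confirmed as how SiFR itself computes finalSize. The names payload_bytes and within_cap are hypothetical.

```python
import json

KB = 1024                   # assumption: sizes are reported in binary kilobytes
SIZE_CAP_BYTES = 400 * KB   # assumption: the 400 KB cap quoted earlier applies


def payload_bytes(capture):
    """Size of a capture when serialized as compact UTF-8 JSON."""
    text = json.dumps(capture, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def within_cap(size_bytes, cap_bytes=SIZE_CAP_BYTES):
    """True if a payload of size_bytes fits under the cap."""
    return size_bytes <= cap_bytes


# The finalSize from the simplified shape earlier, expressed in KB.
print(190464 / KB)

# The 201 KB Wikipedia payload fits under the assumed cap.
print(within_cap(201 * KB))
```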

How it compares

Property	Raw HTML	Screenshot	A11y tree	SiFR
Size	Very large	Large (vision tokens)	Small	Small
Element addressing	Yes (noisy)	No	Partial	Yes
Visual / layout info	No	Yes	No	Yes (bbox + style)
Post-JS runtime state	Partial	Yes	Partial	Yes

SiFR and E2LLM MCP

SiFR is the payload; E2LLM MCP is the transport. A capture is what the sifr_capture tool returns — the structured representation the model then reads and reasons over. The extension and the MCP relay both speak SiFR.

Structured browser perception — the category, and why structural perception matters
Runtime Snapshots — the article series behind the format