Rules & query syntax
Create, read, update, and delete the queries attached to a tap — and the full query language they're written in.
A rule is a query attached to a tap, with an optional tag label. A page is delivered to the tap's
stream if it matches any rule on the tap. All rule endpoints authenticate with a tap token
(prefixed fh_).
Rule object
| Field | Type | Description |
|---|---|---|
| id | string | Rule identifier |
| value | string | The query (required) |
| tag | string | Optional label, max 255 chars |
| nsfw | boolean | Include adult content. Default false |
| quality | boolean | Apply quality filters. Default true |
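For client code, the rule object can be modeled as a small typed structure. The sketch below is one way to do that in Python; the class name `Rule` and the `to_payload` helper are illustrative and not part of the Firehose API. It applies the defaults from the table and checks the 255-character tag limit before anything is sent.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TAG_LENGTH = 255  # tag limit from the rule object table


@dataclass
class Rule:
    """Client-side model of a rule (hypothetical helper, not an official SDK type)."""

    value: str                  # the query (required)
    tag: Optional[str] = None   # optional label
    nsfw: bool = False          # default per the table
    quality: bool = True        # default per the table
    id: Optional[str] = None    # assigned by the server

    def __post_init__(self) -> None:
        if not self.value:
            raise ValueError("value is required")
        if self.tag is not None and len(self.tag) > MAX_TAG_LENGTH:
            raise ValueError(f"tag exceeds {MAX_TAG_LENGTH} characters")

    def to_payload(self) -> dict:
        """Return the fields a create request would carry (id is omitted)."""
        payload = {"value": self.value, "nsfw": self.nsfw, "quality": self.quality}
        if self.tag is not None:
            payload["tag"] = self.tag
        return payload
```

Validating locally is a design choice made for this sketch; the server remains the authority on what it accepts.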
List rules
curl -s https://api.firehose.com/v1/rules \
  -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN"

Example response:
{
  "data": [
    { "id": "1", "value": "tesla", "tag": "brand-mentions" },
    { "id": "2", "value": "\"site explorer\"", "tag": "product" }
  ],
  "meta": { "count": 2 }
}

Create a rule
curl -s -X POST https://api.firehose.com/v1/rules \
  -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "tesla OR \"electric vehicle\"", "tag": "ev"}'

Returns 201 with the created rule.
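The same request can be built with Python's standard library. This is a minimal sketch: `build_rule_request` is a hypothetical helper, the token is a placeholder, and the URL and headers are copied from the curl examples on this page. It constructs the request without sending it, so the body and headers can be inspected first.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.firehose.com/v1/rules"  # from the curl examples


def build_rule_request(token: str, method: str = "GET",
                       rule_id: Optional[str] = None,
                       body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the rules endpoint."""
    url = BASE_URL if rule_id is None else f"{BASE_URL}/{rule_id}"
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # json.dumps handles the quoting that the curl example does by hand.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


req = build_rule_request(
    "fh_example_token",  # placeholder, not a real token
    method="POST",
    body={"value": 'tesla OR "electric vehicle"', "tag": "ev"},
)
# Passing req to urllib.request.urlopen would send it.
```

Passing `rule_id` together with `method="PUT"` or `method="DELETE"` would target a single rule in the same way.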
Update a rule
Partial updates are supported: send only the fields you want to change.
curl -s -X PUT https://api.firehose.com/v1/rules/1 \
  -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tag": "new-tag", "nsfw": true}'

Delete a rule
curl -s -X DELETE https://api.firehose.com/v1/rules/1 \
  -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN"

Returns 204 with no content.
Query syntax
A rule's value is written in Firehose query syntax, which is Lucene-compatible. Queries are
evaluated against indexed fields extracted from each crawled page.
Indexed fields
| Field | Type | Case | Description |
|---|---|---|---|
| added | text | insensitive | Default field. Text from inserted diff chunks |
| removed | text | insensitive | Text from deleted diff chunks |
| added_anchor | text | insensitive | Anchor text from inserted links |
| removed_anchor | text | insensitive | Anchor text from deleted links |
| title | text | insensitive | Page title |
| url | keyword | sensitive | Full URL as one exact token |
| domain | keyword | sensitive | Domain extracted from the URL |
| publish_time | keyword | sensitive | ISO-8601 local datetime |
| page_category | keyword | sensitive | ML category label, e.g. /News |
| page_type | keyword | sensitive | ML type label, e.g. /Article/How_to |
| language | keyword | sensitive | ISO 639-1 code, e.g. en, fr, zh-cn |
| recent | filter | n/a | Recency filter (see below) |
Text fields are tokenized and lowercased (case-insensitive). Keyword fields are stored as a single exact, case-sensitive token. Null/empty fields are absent and never match. Multi-valued fields match if any value matches.
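To make the text/keyword distinction concrete, here is a toy Python model of the matching rules described above. It is an illustration only and not how Firehose is implemented: the tokenizer is a simple regex split, and the function names are made up for this sketch.

```python
import re
from typing import List, Optional, Union

FieldValue = Optional[Union[str, List[str]]]


def _values(value: FieldValue) -> List[str]:
    """Normalize a field to a list; null or empty fields yield no values."""
    if value is None:
        return []
    items = [value] if isinstance(value, str) else value
    return [v for v in items if v]


def text_matches(value: FieldValue, term: str) -> bool:
    """Text field: tokenized and lowercased, so matching ignores case."""
    needle = term.lower()
    return any(needle in re.findall(r"\w+", v.lower()) for v in _values(value))


def keyword_matches(value: FieldValue, term: str) -> bool:
    """Keyword field: the whole value is one exact, case-sensitive token."""
    return any(v == term for v in _values(value))
```

Under this model, `text_matches("Tesla Reports Earnings", "tesla")` is true, while `keyword_matches("TechCrunch.com", "techcrunch.com")` is false because keyword comparison is exact.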
Terms and phrases
tesla                    # "tesla" anywhere in added content (default field)
title:tesla              # "tesla" in the title
"quick brown fox"        # exact phrase in content
title:"breaking news"    # exact phrase in title

Boolean operators
java AND programming
title:tesla OR added:"electric vehicle"
NOT malware
title:tesla AND added:earnings
removed:"old feature" # term appeared in deleted content

URL and domain filtering
url and domain are exact, case-sensitive tokens. You can match them three ways: exact, wildcard
(*, ?), and regex (/pattern/). Forward slashes are special and must be escaped with \.
url:"https://example.com/news/article-1" # exact
domain:techcrunch.com # exact domain
url:*\/category\/* # wildcard: contains /category/
url:/.*\/page\/[0-9]+.*/ # regex: pagination URLs

Excluding junk URLs is the most common pattern:
title:tesla AND language:"en"
  AND NOT url:/.*\/page\/[0-9]+.*/
  AND NOT url:*\/category\/*
  AND NOT url:*\/tag\/*

JSON double-escaping. In a JSON string, the escape \/ decodes to a plain /, so the backslash
never reaches the query parser. To send a literal backslash before a slash, write \\/ in the
JSON body. For example, the query url:*\/abs\/* must be sent as "url:*\\/abs\\/*".
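The two escaping layers are easy to confuse, so it can help to let code handle both. In the Python sketch below, `escape_slashes` and `exclude_urls` are hypothetical helpers that build the query text with a single backslash before each slash; `json.dumps` then adds the second layer when the request body is serialized.

```python
import json
from typing import Iterable


def escape_slashes(pattern: str) -> str:
    """Escape forward slashes for a url: term. Pass an unescaped pattern."""
    return pattern.replace("/", "\\/")


def exclude_urls(base_query: str, wildcard_patterns: Iterable[str]) -> str:
    """Append one 'AND NOT url:<pattern>' clause per wildcard pattern."""
    clauses = [f"AND NOT url:{escape_slashes(p)}" for p in wildcard_patterns]
    return " ".join([base_query, *clauses])


query = exclude_urls("title:tesla", ["*/category/*", "*/tag/*"])
body = json.dumps({"value": query})

print(query)  # query text: one backslash before each slash
print(body)   # JSON text: each of those backslashes is doubled
```

Building the body with a JSON serializer, rather than by string concatenation, means the query only ever needs to be written in its single-backslash form.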
Filtering on url narrows which crawled pages match — it does not tell Firehose to crawl that
URL. A tap only ever sees pages the crawler visits, on the crawler's own schedule, so a change to a
specific page won't surface until (and unless) the crawler re-crawls it. To monitor a specific page
for changes on a cadence you control, use URL Watch instead.
Date ranges on publish_time
Colons in timestamps must be escaped with \\:
publish_time:[2025-01-01T00\\:00\\:00 TO 2025-12-31T23\\:59\\:59] # inclusive
publish_time:{2025-01-01T00\\:00\\:00 TO 2025-12-31T23\\:59\\:59} # exclusive

recent - recency filter
A query-level filter (not an indexed field). Format: a positive integer followed by h, d, or mo.
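A client can check this format before creating a rule. The regular expression below is a sketch based only on the description above; whether the server accepts leading zeros or other spellings is not stated here, so the pattern is deliberately strict. `is_valid_recent` is a hypothetical helper name.

```python
import re

# Positive integer (no leading zero) followed by h, d, or mo.
_RECENT_RE = re.compile(r"[1-9][0-9]*(?:h|d|mo)")


def is_valid_recent(value: str) -> bool:
    """Return True if value looks like a recent: argument such as '24h'."""
    return _RECENT_RE.fullmatch(value) is not None
```

With this check, `is_valid_recent("24h")` passes, while `"0d"`, `"7"`, and `"7m"` are rejected.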
recent:1h # published in the last hour
recent:7d # last 7 days
title:tesla AND recent:24h # tesla in title, last 24 hours

nsfw - adult content
A boolean on the rule object, not in the query. false (default) excludes adult content;
true includes it.
{ "value": "title:tesla", "nsfw": true }

quality - quality filter
A boolean on the rule object (default true). When on, the rule only matches pages published
in the last 7 days and skips pagination pages, tag/category index pages, and URLs with query
parameters, which removes low-value and duplicate pages.
{ "value": "domain:\"example.com\"", "quality": false }

Category and type values
page_category and page_type accept a large fixed vocabulary (25 top-level categories with
700+ subcategories, and 110+ page types). The complete list lives in the canonical
/skill.md reference.
A few examples:
page_category:"/News"
page_category:"/Sports/Winter_Sports/Skiing_and_Snowboarding"
page_type:"/Article/How_to"
page_type:"/Document/White_Paper"

Next steps
Streaming (SSE)
Open the connection and receive matches from your rules.
Match payload
Every field on a delivered document.
Filters & domain lists
Save a fragment once and reuse it across many rules.