# Rules & query syntax

> Create, read, update, and delete the queries attached to a tap — and the full query language they're written in.

A rule is a query attached to a tap, with an optional `tag` label. A page is delivered to the tap's
stream if it matches **any** rule on the tap. All rule endpoints authenticate with a **tap token**
(prefixed `fh_`).

## Rule object

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Rule identifier |
| `value` | string | The query (required) |
| `tag` | string | Optional label, max 255 chars |
| `nsfw` | boolean | Include adult content. Default `false` |
| `quality` | boolean | Apply quality filters. Default `true` |

## List rules

```bash
curl -s https://api.firehose.com/v1/rules \
  -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN"
```

```json
{
  "data": [
    { "id": "1", "value": "tesla", "tag": "brand-mentions" },
    { "id": "2", "value": "\"site explorer\"", "tag": "product" }
  ],
  "meta": { "count": 2 }
}
```

## Create a rule

```bash
curl -s -X POST https://api.firehose.com/v1/rules \
  -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "tesla OR \"electric vehicle\"", "tag": "ev"}'
```

Returns `201` with the created rule.
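Writing the JSON body by hand gets error-prone once a query contains double quotes or backslashes. The sketch below is one way to build it in plain bash. `json_escape`, `rule_json`, and `create_rule` are hypothetical helper names, not part of any Firehose tooling, and the escaper only handles `\` and `"`: it assumes the query contains no newlines or other control characters.

```bash
# json_escape STRING: double every backslash, then escape double quotes.
# Assumption: STRING has no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# rule_json VALUE [TAG]: print a create-rule body; TAG is optional.
rule_json() {
  local value tag
  value=$(json_escape "$1")
  if [ -n "${2:-}" ]; then
    tag=$(json_escape "$2")
    printf '{"value": "%s", "tag": "%s"}' "$value" "$tag"
  else
    printf '{"value": "%s"}' "$value"
  fi
}

# create_rule VALUE [TAG]: POST the body from stdin and print the response.
create_rule() {
  rule_json "$@" | curl -s -X POST https://api.firehose.com/v1/rules \
    -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN" \
    -H "Content-Type: application/json" \
    --data-binary @-
}
```

For example, `create_rule 'tesla OR "electric vehicle"' ev` sends the same body as the curl command above. Because backslashes are doubled, a query like `url:*\/abs\/*` is also sent in the escaped form JSON requires.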

## Update a rule

Partial updates are supported: send only the fields you want to change, and omitted fields keep
their current values.

```bash
curl -s -X PUT https://api.firehose.com/v1/rules/1 \
  -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tag": "new-tag", "nsfw": true}'
```
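Since partial updates are supported, an update body only needs the fields being changed. A minimal sketch, using hypothetical helpers `update_body` (emits only the options you pass) and `update_rule` (wraps the `PUT` call above). It does no JSON escaping, so it assumes `value` and `tag` strings without double quotes or backslashes.

```bash
# update_body [--value Q] [--tag T] [--nsfw true|false] [--quality true|false]
# Prints a JSON object containing only the fields that were passed.
update_body() {
  local fields=() out="" f
  while [ $# -gt 0 ]; do
    if [ $# -lt 2 ]; then
      echo "update_body: missing value for $1" >&2
      return 1
    fi
    case "$1" in
      --value) fields+=("\"value\": \"$2\"") ;;
      --tag)   fields+=("\"tag\": \"$2\"") ;;
      --nsfw|--quality)
        case "$2" in
          true|false) fields+=("\"${1#--}\": $2") ;;  # JSON boolean, unquoted
          *) echo "update_body: $1 expects true or false" >&2; return 1 ;;
        esac ;;
      *) echo "update_body: unknown option $1" >&2; return 1 ;;
    esac
    shift 2
  done
  # Join the collected fields with ", ".
  for f in "${fields[@]}"; do
    out+="${out:+, }$f"
  done
  printf '{%s}' "$out"
}

# update_rule ID [options...]: PUT a partial update for rule ID.
update_rule() {
  local id=$1 body
  shift
  body=$(update_body "$@") || return 1
  curl -s -X PUT "https://api.firehose.com/v1/rules/$id" \
    -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

`update_rule 1 --tag new-tag --nsfw true` sends the same body as the example above; invalid booleans are rejected before any request is made.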

## Delete a rule

```bash
curl -s -X DELETE https://api.firehose.com/v1/rules/1 \
  -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN"
```

Returns `204` with no content.
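Cleanup scripts usually need to fail loudly when a delete does not succeed. A minimal sketch of a hypothetical `delete_rule` helper that treats any status other than the documented `204` as an error. It relies only on curl's standard `-o` and `-w '%{http_code}'` options.

```bash
# delete_rule ID: succeed only if the API answers 204 No Content.
delete_rule() {
  local status
  # -o /dev/null discards the body; -w prints just the HTTP status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
    "https://api.firehose.com/v1/rules/$1" \
    -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN") || return 1
  if [ "$status" != 204 ]; then
    echo "delete_rule $1: unexpected HTTP $status" >&2
    return 1
  fi
}
```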

---

## Query syntax

A rule's `value` is written in **Firehose query syntax**, which is **Lucene-compatible**. Queries are
evaluated against indexed fields extracted from each crawled page.

### Indexed fields

| Field | Type | Case | Description |
| --- | --- | --- | --- |
| `added` | text | insensitive | **Default field.** Text from inserted diff chunks |
| `removed` | text | insensitive | Text from deleted diff chunks |
| `added_anchor` | text | insensitive | Anchor text from inserted links |
| `removed_anchor` | text | insensitive | Anchor text from deleted links |
| `title` | text | insensitive | Page title |
| `url` | keyword | sensitive | Full URL as one exact token |
| `domain` | keyword | sensitive | Domain extracted from the URL |
| `publish_time` | keyword | sensitive | ISO-8601 local datetime |
| `page_category` | keyword | sensitive | ML category label, e.g. `/News` |
| `page_type` | keyword | sensitive | ML type label, e.g. `/Article/How_to` |
| `language` | keyword | sensitive | Language code: ISO 639-1 (e.g. `en`, `fr`) or with a region (e.g. `zh-cn`) |
| `recent` | filter | — | Recency filter (see below) |

**Text** fields are tokenized and lowercased (case-insensitive). **Keyword** fields are stored as a
single exact, case-sensitive token. Null/empty fields are absent and never match. Multi-valued
fields match if **any** value matches.
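To make the difference between the two field types concrete, here is a toy model in bash (4+, for `${var,,}`). It is **not** Firehose's analyzer: the real tokenizer is not documented on this page, so splitting on non-alphanumeric characters is an assumption. `text_match`, `keyword_match`, and `any_match` are hypothetical names.

```bash
# text_match TERM VALUE: lowercase both sides, split VALUE on non-alphanumeric
# characters (toy tokenizer), and succeed if any token equals TERM.
text_match() {
  local term=${1,,} tok
  for tok in $(printf '%s' "${2,,}" | tr -c '[:alnum:]' ' '); do
    [ "$tok" = "$term" ] && return 0
  done
  return 1
}

# keyword_match TERM VALUE: VALUE is one exact, case-sensitive token.
# An empty VALUE counts as absent and never matches.
keyword_match() {
  [ -n "$2" ] && [ "$1" = "$2" ]
}

# any_match MATCHER TERM VALUE...: a multi-valued field matches if any value does.
any_match() {
  local matcher=$1 term=$2 v
  shift 2
  for v in "$@"; do
    "$matcher" "$term" "$v" && return 0
  done
  return 1
}
```

In this model `text_match tesla 'Tesla Q3 earnings'` succeeds, while `keyword_match example.com Example.com` fails because keyword fields keep their original case.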

### Terms and phrases

```text
tesla                        # "tesla" anywhere in added content (default field)
title:tesla                  # "tesla" in the title
"quick brown fox"            # exact phrase in added content (default field)
title:"breaking news"        # exact phrase in title
```

### Boolean operators

```text
java AND programming
title:tesla OR added:"electric vehicle"
NOT malware
title:tesla AND added:earnings
removed:"old feature"        # phrase appeared in deleted content
```
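Long `OR` lists are easier to generate than to type. A sketch of a hypothetical `any_of` helper that ORs quoted values for a single field. The surrounding parentheses rely on standard Lucene grouping, which this page does not show explicitly, and the values are assumed to contain no double quotes or backslashes.

```bash
# any_of FIELD VALUE...: print (FIELD:"v1" OR FIELD:"v2" ...).
any_of() {
  local field=$1 out="" v
  shift
  if [ $# -eq 0 ]; then
    echo "any_of: need at least one value" >&2
    return 1
  fi
  for v in "$@"; do
    out+="${out:+ OR }$field:\"$v\""
  done
  printf '(%s)' "$out"
}
```

For example, `"title:tesla AND $(any_of language en fr)"` expands to `title:tesla AND (language:"en" OR language:"fr")`.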

### URL and domain filtering

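`url` and `domain` are exact, case-sensitive tokens. You can match them three ways: exactly, with
wildcards (`*`, `?`), or with a regex (`/pattern/`). Wildcards and regexes must match the whole
token, which is why the patterns below begin and end with `*` or `.*`. Outside double quotes,
forward slashes are special (they delimit regexes) and must be escaped as `\/`.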

```text
url:"https://example.com/news/article-1"   # exact
domain:techcrunch.com                       # exact domain
url:*\/category\/*                          # wildcard: contains /category/
url:/.*\/page\/[0-9]+.*/                     # regex: pagination URLs
```
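Before saving a URL pattern, it can help to try it against a few sample URLs locally. A rough bash stand-in using hypothetical helpers `url_regex_match` and `url_wildcard_match`. Bash uses POSIX extended regexes rather than Lucene's regex dialect, so write a plain `/` instead of `\/` when testing, and expect the two dialects to differ on less common constructs. Both helpers anchor the pattern to the whole URL, mirroring whole-token matching.

```bash
# url_regex_match URL ERE: succeed only if ERE matches the entire URL.
url_regex_match() {
  local re="^($2)\$"
  [[ $1 =~ $re ]]
}

# url_wildcard_match URL GLOB: * matches any run of characters, ? one character.
url_wildcard_match() {
  # $2 is deliberately unquoted so bash treats it as a glob pattern.
  [[ $1 == $2 ]]
}
```

For instance, `url_regex_match https://example.com/blog/page/2 '.*/page/[0-9]+.*'` succeeds, but the unwrapped pattern `'page/[0-9]+'` does not, because it must cover the whole URL.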

The most common use is excluding junk URLs such as pagination, category, and tag pages:

```text
title:tesla AND language:"en"
  AND NOT url:/.*\/page\/[0-9]+.*/
  AND NOT url:*\/category\/*
  AND NOT url:*\/tag\/*
```

<Callout type="warning">
  **JSON double-escaping.** JSON treats `\/` as an escaped `/`, so a single backslash in a request
  body disappears before the query is parsed. To keep the query's `\/`, write `\\/` in JSON. For
  example, the query `url:*\/abs\/*` must be sent as `"url:*\\/abs\\/*"`.
</Callout>

<Callout type="warning">
  Filtering on `url` narrows which **crawled** pages match — it does not tell Firehose to crawl that
  URL. A tap only ever sees pages the crawler visits, on the crawler's own schedule, so a change to a
  specific page won't surface until (and unless) the crawler re-crawls it. To monitor a specific page
  for changes on a cadence you control, use [URL Watch](/url-watch/overview) instead.
</Callout>

### Date ranges on `publish_time`

Colons in timestamps must be escaped with `\\`:

```text
publish_time:[2025-01-01T00\\:00\\:00 TO 2025-12-31T23\\:59\\:59]   # inclusive
publish_time:{2025-01-01T00\\:00\\:00 TO 2025-12-31T23\\:59\\:59}   # exclusive
```
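If you generate ranges in a script, a small helper keeps the escaping consistent. This sketch (hypothetical `escape_ts` and `publish_range`) reproduces exactly the `\\:` form used in the examples above. It does not validate the timestamps.

```bash
# escape_ts TS: replace every ":" with "\\:" as in the examples above.
escape_ts() {
  printf '%s' "$1" | sed 's/:/\\\\:/g'
}

# publish_range FROM TO [exclusive]: build a publish_time range clause.
# Inclusive [..] by default; pass "exclusive" for {..}.
publish_range() {
  local from to
  from=$(escape_ts "$1")
  to=$(escape_ts "$2")
  if [ "${3:-inclusive}" = exclusive ]; then
    printf 'publish_time:{%s TO %s}' "$from" "$to"
  else
    printf 'publish_time:[%s TO %s]' "$from" "$to"
  fi
}
```

`publish_range 2025-01-01T00:00:00 2025-12-31T23:59:59` prints the inclusive example above.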

### `recent` — recency filter

A query-level filter, not an indexed field. Its value is a positive integer followed by `h` (hours),
`d` (days), or `mo` (months).

```text
recent:1h                      # published in the last hour
recent:7d                      # last 7 days
title:tesla AND recent:24h     # tesla in title, last 24 hours
```
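A sketch of checking a window before it goes into a rule, using hypothetical helpers `is_recent_window` and `with_recent`. It rejects leading zeros such as `07d`, which may be stricter than the format above requires. `with_recent` wraps the base query in parentheses (standard Lucene grouping, assumed to be supported) so that an `OR` inside it cannot change what `AND recent:` applies to.

```bash
# is_recent_window VALUE: positive integer (no leading zero) + h, d, or mo.
is_recent_window() {
  local re='^[1-9][0-9]*(h|d|mo)$'
  [[ $1 =~ $re ]]
}

# with_recent QUERY WINDOW: print (QUERY) AND recent:WINDOW.
with_recent() {
  if ! is_recent_window "$2"; then
    echo "with_recent: invalid window '$2'" >&2
    return 1
  fi
  printf '(%s) AND recent:%s' "$1" "$2"
}
```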

### `nsfw` — adult content

A boolean **on the rule object**, not in the query. `false` (default) excludes adult content;
`true` includes it.

```json
{ "value": "title:tesla", "nsfw": true }
```

### `quality` — quality filter

A boolean **on the rule object** (default `true`). When enabled, matches are limited to pages
published in the last 7 days, and pagination URLs, tag and category index pages, and URLs with query
parameters are excluded. This removes low-value and duplicate pages. Set it to `false` to turn these
filters off:

```json
{ "value": "domain:\"example.com\"", "quality": false }
```

### Category and type values

`page_category` and `page_type` accept a large fixed vocabulary (25 top-level categories with
700+ subcategories, and 110+ page types). The complete list lives in the canonical
[`/skill.md`](https://docs.firehose.com/skill.md) reference.
A few examples:

```text
page_category:"/News"
page_category:"/Sports/Winter_Sports/Skiing_and_Snowboarding"
page_type:"/Article/How_to"
page_type:"/Document/White_Paper"
```

## Next steps

<CardGrid>
  <Card title="Streaming (SSE)" href="/stream/streaming">
    Open the connection and receive matches from your rules.
  </Card>
  <Card title="Match payload" href="/stream/match-payload">
    Every field on a delivered document.
  </Card>
  <Card title="Filters & domain lists" href="/dashboard/filters">
    Save a fragment once and reuse it across many rules.
  </Card>
</CardGrid>
