
Why am I not getting matches?

The usual reasons a stream or feed stays empty — crawl coverage, url: filters, the quality filter, the recency window, a too-strict query, or no credit.


An empty stream almost always comes down to one of a few causes. Work down this list in order — they're roughly most to least common.

The crawler hasn't reached matching pages yet

Firehose matches pages as the crawler visits them; it doesn't poll or fetch on demand. A brand-new rule only starts matching pages crawled after it's active, and a narrow topic may simply not have been crawled yet. This is normal, not a failure.

Fix: give it time. There's nothing from before the rule went active to replay, so come back later and use the since parameter (e.g. ?since=24h) to see what matched while you were away. If you suspect the rule is too narrow, widen it by dropping extra clauses.
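To catch up on what a rule matched while you were away, the since value goes on the stream request's query string. This page doesn't give the stream endpoint's path, so the sketch below reads it from a hypothetical FIREHOSE_STREAM_URL variable, and it assumes the stream accepts the same FIREHOSE_TAP_TOKEN bearer token as the rules API. It only connects when that variable is set.

```bash
#!/usr/bin/env bash
# Sketch: replay recent matches with the `since` parameter.
# Assumption: FIREHOSE_STREAM_URL is a hypothetical variable holding your
# stream endpoint; the token reuse below is also an assumption.

# with_since URL WINDOW -> prints URL with since=WINDOW appended,
# using '&' if the URL already has a query string, '?' otherwise.
with_since() {
  local url="$1" window="$2"
  if [[ "$url" == *\?* ]]; then
    printf '%s&since=%s\n' "$url" "$window"
  else
    printf '%s?since=%s\n' "$url" "$window"
  fi
}

# Only connect when an endpoint has been configured.
if [[ -n "${FIREHOSE_STREAM_URL:-}" ]]; then
  # -N disables output buffering so matches print as they arrive.
  curl -s -N "$(with_since "$FIREHOSE_STREAM_URL" 24h)" \
    -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN"
fi
```

Swap 24h for however long you've been away; the helper only builds the URL, so it works the same whether or not the endpoint already carries other query parameters.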

A url: rule filters pages; it doesn't fetch them

This is the single most common misunderstanding. A url: clause filters which crawled pages match — it does not tell Firehose to crawl that URL. If the crawler hasn't re-crawled the page, a change to it never reaches your stream, no matter how precise the url: filter is.
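One way to picture the distinction is to treat a plain list of crawled URLs as a hypothetical stand-in for the crawl. Because url is a keyword field that matches an exact token, the url: clause behaves like an exact-match filter over that list, and a page that was never crawled can't come out the other end, however precisely you name it. The url_filter helper is illustrative only.

```bash
# Sketch: a url: clause filters pages the crawler already visited.
# The crawled-URL list fed on stdin is a hypothetical stand-in for the crawl.
# url_filter TARGET < crawled_urls -> prints TARGET only if it was crawled.
url_filter() {
  # -F: fixed string (no regex), -x: whole-line match, like an exact token.
  grep -Fx -- "$1"
}
```

For example, `printf '%s\n' https://example.com/blog/post-1 | url_filter https://example.com/pricing` prints nothing, because the pricing page isn't in the crawled set.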

Fix: to watch specific known URLs on a cadence you control, use URL Watch.

The quality filter is dropping your results

Rules have quality on by default. It limits results to pages published in the last 7 days and drops pagination, tag/category index pages, and URLs with query parameters. On an older or index-heavy source, that can remove everything.
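Restating those drops as a shell predicate can help when reasoning about why a particular source goes quiet. The 7-day and query-parameter checks follow the description above; the /page/N, /tag/ and /category/ patterns are hypothetical heuristics, since the exact rules Firehose uses to spot pagination and index pages aren't documented here.

```bash
# Sketch of the quality filter's documented drops. The pagination and
# tag/category patterns are hypothetical, not Firehose's actual rules.
# quality_keep URL PUBLISHED_EPOCH NOW_EPOCH -> exit 0 if kept, 1 if dropped.
quality_keep() {
  local url="$1" published="$2" now="$3"
  local max_age=$((7 * 24 * 60 * 60))   # 7 days in seconds
  local page_re='/page/[0-9]+'          # hypothetical pagination pattern

  # Published more than 7 days ago.
  (( now - published > max_age )) && return 1
  # Any query parameters.
  [[ "$url" == *\?* ]] && return 1
  # Pagination, e.g. /blog/page/2 (hypothetical heuristic).
  [[ "$url" =~ $page_re ]] && return 1
  # Tag/category index pages (hypothetical heuristic).
  [[ "$url" == */tag/* || "$url" == */category/* ]] && return 1
  return 0
}
```

Run an older, index-heavy source's URLs through it and most will come back dropped, which is the situation the fix below addresses.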

Fix: set quality: false on the rule to receive everything that matches. See Rules & query syntax.

curl -s -X PUT https://api.firehose.com/v1/rules/1 \
  -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"quality": false}'

The recency window is too narrow

recent:1h on a low-volume topic can match nothing for long stretches.

Fix: widen the window (recent:24h, recent:7d) while you're confirming the rule works, then tighten it once matches are flowing.
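To see why a narrow window goes quiet, this sketch converts a recent: value to seconds and checks whether a page timestamp falls inside it. It handles only the h and d suffixes used on this page, and it assumes recent: compares against a single per-page timestamp; the helper names and that assumption are illustrative.

```bash
# Sketch: reason about recent: windows. Only h and d suffixes are handled,
# and comparing against one per-page timestamp is an assumption.
# window_seconds WINDOW -> prints the window length in seconds.
window_seconds() {
  local w="$1"
  local n="${w%[hd]}"   # strip a trailing h or d
  if [[ ! "$n" =~ ^[0-9]+$ ]]; then
    echo "unsupported window: $w" >&2
    return 1
  fi
  case "$w" in
    *h) echo $((10#$n * 3600)) ;;   # 10# avoids octal on leading zeros
    *d) echo $((10#$n * 86400)) ;;
    *)  echo "unsupported window: $w" >&2; return 1 ;;
  esac
}

# in_window WINDOW TIMESTAMP NOW -> exit 0 if TIMESTAMP is inside the window.
in_window() {
  local secs ts="$2" now="$3"
  secs=$(window_seconds "$1") || return 1
  (( now - ts <= secs ))
}
```

A page from two hours ago falls outside recent:1h but inside recent:24h, which is why widening the window is a quick way to confirm a rule works at all.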

The query is stricter than you think

A few query behaviors silently exclude pages (see Rules & query syntax):

  • Keyword fields are case-sensitive. url, domain, page_category, page_type, and language match an exact token — language:"EN" won't match en. Text fields like title and added are lowercased, so they aren't case-sensitive.
  • added is the default field. A bare term like tesla matches text from inserted content, not the whole page. Use title:tesla to match titles.
  • An exclude-mode domain/URL list, or org-wide excluded domains, may be removing your sources. Check what's attached to the tap.
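The first two behaviors can be approximated locally to sanity-check a query before blaming coverage. The helpers below are hypothetical simplifications: keyword fields compare the whole value exactly, text fields are lowercased and matched word by word (the real tokenizer may split differently, for example around punctuation), and a term without a field: prefix falls back to added. They use ${var,,}, so they need bash 4 or later.

```bash
# Hypothetical approximations of the query behaviors described above.

# keyword_match FIELD_VALUE QUERY_VALUE: exact, case-sensitive comparison.
keyword_match() {
  [[ "$1" == "$2" ]]   # quoted RHS: literal comparison, no globbing
}

# text_match FIELD_TEXT QUERY_TERM: lowercase both, then look for the
# term among whitespace-separated words (a simplified tokenizer).
text_match() {
  local -a words
  local term="${2,,}" word
  read -ra words <<< "${1,,}"
  for word in "${words[@]}"; do
    [[ "$word" == "$term" ]] && return 0
  done
  return 1
}

# field_for TERM: the field a query term is matched against.
field_for() {
  case "$1" in
    *:*) echo "${1%%:*}" ;;   # explicit field, e.g. title:tesla
    *)   echo "added" ;;      # bare terms search the added field
  esac
}
```

With these, `keyword_match en EN` fails just as language:"EN" won't match en, while `text_match "Tesla Recalls Model Y" tesla` succeeds because text fields are lowercased.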

The organization is out of credit

If the prepaid balance is exhausted, new stream connections are rejected with HTTP 402 (Payment Required) and no matches are delivered.

Fix: top up. See How billing works.
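A quick way to confirm a credit problem is to open a connection and look at the status code. As above, FIREHOSE_STREAM_URL is a hypothetical variable for the stream endpoint and reusing FIREHOSE_TAP_TOKEN for the stream is an assumption. The probe gives up after 5 seconds so a healthy, open stream doesn't block the script.

```bash
# Sketch: probe a stream connection and interpret the status code.
# FIREHOSE_STREAM_URL is a hypothetical variable for your stream endpoint.
explain_status() {
  case "$1" in
    402) echo "402: organization is out of credit; top up to resume delivery" ;;
    000) echo "no HTTP response (network error or timeout before headers)" ;;
    *)   echo "HTTP $1" ;;
  esac
}

if [[ -n "${FIREHOSE_STREAM_URL:-}" ]]; then
  # --max-time stops the probe so an open stream doesn't hang the script;
  # -w still prints the status code when curl exits on the timeout.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    -H "Authorization: Bearer $FIREHOSE_TAP_TOKEN" "$FIREHOSE_STREAM_URL")
  explain_status "$code"
fi
```

A 200 here means the connection itself is fine, so an empty stream points back to one of the earlier causes.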

Next steps