If you have created custom rules for your zone in the Cloudflare WAF, you can easily see the IP addresses and URI paths that have triggered those rules by selecting the rule in the Cloudflare dashboard. A key limitation, however, is retention: Cloudflare keeps this data for only 24 hours on free plans, 72 hours on the Pro and Business plans, and 30 days on an Enterprise plan.
Many SecOps professionals would like the ability to retain and analyze this data for a longer period, to quantitatively determine trends such as how often specific countries or ASNs trigger their firewall rules and what types of attacks they might be attempting (brute force, XSS, SQL injection, etc.). And if your cybersecurity team reports abusive activity to the respective attackers’ network abuse contacts, retained data that illustrates a pattern of abuse can help substantiate your reports.
Fortunately, Cloudflare has a robust GraphQL API which supports programmatically retrieving the firewall events for your custom rules. It returns the data in a structured JSON format, which can be easily parsed by a Node.js function and stored in a data warehouse such as Google BigQuery. Because the function only needs to run at scheduled intervals (e.g. once every hour) to pull the data from Cloudflare, and because BigQuery permanently stores the data, the most elegant solution is to deploy the code as a Google Cloud Run function.

Create a Custom Rule in the Cloudflare WAF
This example assumes that you already have a custom rule in place in the Cloudflare WAF. For instance, you might have a rule named “Restricted Areas” which blocks all IPs except your authorized IPs from accessing the admin areas of your web application. In conjunction with authenticated origin pulls, which cause your origin server to reject any requests that do not come from the Cloudflare reverse proxy, this is effective at blocking malicious attempts to brute force your login pages or exploit web application vulnerabilities.
For a WordPress website, the firewall rule expression might be something like:
(http.request.uri.path contains "/wp-login" and ip.src ne 152.67.XXX.XXX and ip.src ne 104.196.XXX.XXX) or (http.request.uri.path contains "/wp-admin" and not http.request.uri.path contains "/wp-admin/admin-post.php" and not http.request.uri.path contains "/wp-admin/admin-ajax.php" and ip.src ne 152.67.XXX.XXX and ip.src ne 104.196.XXX.XXX)
Select the “Block” action from the dropdown, then click the “Deploy” button to propagate the rule.
Record the rule ID and zone ID
Select the rule from the Custom Rules list by clicking on the name of the rule you just created. Take note of the string after the last forward slash in the URL: this is the rule ID. Also record the zone ID, which is displayed in the API section of the zone’s Overview page in the Cloudflare dashboard. You will require both later to query the GraphQL API for the firewall events which triggered this rule.
https://dash.cloudflare.com/2a9cf95XXXXXXXXXXXXXXXXXXXXXXXXX/example.com/security/waf/custom-rules/7db4d36XXXXXXXXXXXXXXXXXXXXXXXXX
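If you would rather look these IDs up programmatically, here is a minimal sketch (a hypothetical findRuleIds helper; it assumes the API token created in the next step and uses axios, as the function later in this article does). It resolves the zone ID from the domain name, then lists the rules of the zone’s custom-rules ruleset through the Rulesets API phase entry point:

const axios = require('axios');

// Hypothetical helper: resolves the zone ID for a domain, then lists the
// rules in the zone's custom-rules ruleset so the rule ID can be copied.
async function findRuleIds(domain) {
  const headers = { Authorization: `Bearer ${process.env.CLOUDFLARE_API_KEY}` };

  // Look up the zone ID by domain name
  const zones = await axios.get(
    `https://api.cloudflare.com/client/v4/zones?name=${domain}`,
    { headers }
  );
  const zoneId = zones.data.result[0].id;
  console.log(`zoneTag: ${zoneId}`);

  // Fetch the entry point ruleset of the custom firewall phase
  const ruleset = await axios.get(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/rulesets/phases/http_request_firewall_custom/entrypoint`,
    { headers }
  );
  for (const rule of ruleset.data.result.rules) {
    console.log(`ruleId: ${rule.id} (${rule.description})`);
  }
}

findRuleIds('example.com').catch(console.error);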
Generate a Cloudflare API token with appropriate permissions
From the Overview page of the Cloudflare dashboard for the zone, navigate to User API Tokens and create a Cloudflare API token. The token requires the following permissions: Zone WAF:Edit, Zone WAF:Read, Logs:Read, Firewall Services:Edit, Firewall Services:Read, and Analytics:Read.
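Once the token is created, you can confirm it works with a quick sketch against Cloudflare’s token verification endpoint (GET /client/v4/user/tokens/verify):

const axios = require('axios');

// Minimal check that the token is valid and active.
async function verifyToken() {
  const response = await axios.get(
    'https://api.cloudflare.com/client/v4/user/tokens/verify',
    { headers: { Authorization: `Bearer ${process.env.CLOUDFLARE_API_KEY}` } }
  );
  console.log(response.data.result.status); // prints "active" for a valid token
}

verifyToken().catch(console.error);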
Create a BigQuery dataset, table, and schema for the firewall events
From the Google Cloud console, navigate to the BigQuery service and create a new dataset (e.g. waf_events_cloudflare). The dataset should reside in the same region (e.g. us-west1) as the region where you plan to deploy the Cloud Function.
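If you prefer DDL to the console, the dataset can also be created with a query; a sketch assuming the dataset name and region above:

CREATE SCHEMA waf_events_cloudflare
OPTIONS (location = 'us-west1');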
In the dataset, you will also need to create a table (e.g. firewall_events) with a schema to house the incoming firewall event data. Here is an example SQL query which creates the schema for the fields we want to retain (action, clientAsn, clientCountryName, clientIP, clientRequestPath, clientRequestQuery, datetime, ruleId, source, userAgent):
CREATE TABLE waf_events_cloudflare.firewall_events (
  data STRUCT<
    errors STRING,
    data STRUCT<
      viewer STRUCT<
        zones ARRAY<STRUCT<
          firewallEventsAdaptive ARRAY<STRUCT<
            action STRING,
            clientAsn STRING,
            clientCountryName STRING,
            clientIP STRING,
            clientRequestPath STRING,
            clientRequestQuery STRING,
            datetime STRING,
            ruleId STRING,
            source STRING,
            userAgent STRING
          >>
        >>
      >
    >
  >
);
Create a Pub/Sub topic with BigQuery as the destination
From the Pub/Sub service of the Google Cloud console, create a topic (e.g. waf-events-cloudflare) with a subscription that uses the “Write to BigQuery” delivery type. With the BigQuery subscriptions feature, it is not necessary to use Dataflow to stream data from Pub/Sub to BigQuery; Dataflow incurs an hourly charge regardless of the volume of data being streamed through it.
Dataflow is only recommended between the data source and BigQuery when the data requires additional transformation as part of an ETL process. That is not the case here, as the Cloudflare firewall event data is already formatted properly for analysis and reporting.
When creating the Pub/Sub subscription, enable the option to use the schema of the destination table, which we created in the earlier step.
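Before deploying the function, you may want to confirm that the subscription writes to the table. A small sketch, assuming the topic name waf-events-cloudflare used later in this article, publishes a minimal message shaped like the table schema:

const { PubSub } = require('@google-cloud/pubsub');

// Smoke test: publish a minimal message matching the firewall_events schema
// so the BigQuery subscription has something to write.
async function smokeTest() {
  const topic = new PubSub().topic('waf-events-cloudflare');
  const payload = { data: { errors: null, data: { viewer: { zones: [] } } } };
  const messageId = await topic.publishMessage({
    data: Buffer.from(JSON.stringify(payload), 'utf8'),
  });
  console.log(`Published test message ${messageId}`);
}

smokeTest().catch(console.error);

After a minute or so, a row should appear in the firewall_events table.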
Create a Cloud Function that calls the Cloudflare GraphQL API
Now you are ready to create the Cloud Function which calls the GraphQL API to retrieve the firewall events for the past 60 minutes, parses the JSON response, and publishes a message to the Pub/Sub topic. In the Google Cloud console, switch to the Cloud Functions service to create a new function.
The function should be created with a Node.js runtime (e.g. Node.js 20). It is not resource intensive, so you can allocate the minimum resources (0.167 CPU and 256 MB RAM). Set the minimum number of instances to 0 and the maximum to 1: because this is a batch process, the function should never run more than one instance at a time. You can specify a sensible maximum execution time (e.g. 30s), even though the API call normally completes within seconds.
Here are the environment variables to specify when creating the function. The API key is the access token which you created in a prior step, and the Cloudflare email is the email address of the account that owns the token.
- CLOUDFLARE_API_KEY – the Cloudflare access token
- CLOUDFLARE_EMAIL – the email address of the Cloudflare account
The beauty of using Cloud Functions is that you are only billed for the time the function’s container is actually running. You can expect this function to cost a few dollars, or even mere pennies, per month.
Here is the example source code of the Node.js project for the function:
index.js
These values need to be modified in the code.
- topicName – the name of your Google Cloud Pub/Sub topic
- zoneTag – the Cloudflare zone ID corresponding to your domain
- ruleId – the Cloudflare rule ID of your custom firewall rule
const { PubSub } = require('@google-cloud/pubsub');
const axios = require('axios');

exports.cloudFunction = async (req, res) => {
  const pubSubClient = new PubSub();
  const topicName = 'waf-events-cloudflare';

  // Query window: the past 60 minutes, as ISO 8601 timestamps without milliseconds
  const now = new Date().toISOString().replace(/\.\d{3}Z$/, 'Z');
  const ago = new Date(Date.now() - 60 * 60 * 1000).toISOString().replace(/\.\d{3}Z$/, 'Z');

  const data = {
    query: `
      query ListFirewallEvents($zoneTag: string, $filter: FirewallEventsAdaptiveFilter_InputObject) {
        viewer {
          zones(filter: { zoneTag: $zoneTag }) {
            firewallEventsAdaptive(
              filter: $filter
              limit: 10000
              orderBy: [datetime_DESC]
            ) {
              action
              clientAsn
              clientCountryName
              clientIP
              clientRequestPath
              clientRequestQuery
              datetime
              ruleId
              source
              userAgent
            }
          }
        }
      }
    `,
    variables: {
      zoneTag: '1046f22XXXXXXXXXXXXXXXXXXXXXXXXXX',
      filter: {
        datetime_geq: ago,
        datetime_leq: now,
        ruleId: 'f92ed9fXXXXXXXXXXXXXXXXXXXXXXXXXX',
      },
    },
  };

  const config = {
    method: 'post',
    url: 'https://api.cloudflare.com/client/v4/graphql',
    headers: {
      'Content-Type': 'application/json',
      'X-Auth-Email': process.env.CLOUDFLARE_EMAIL,
      Authorization: `Bearer ${process.env.CLOUDFLARE_API_KEY}`,
    },
    data,
  };

  try {
    const response = await axios(config);

    // Wrap the GraphQL response in a top-level "data" field to match the
    // BigQuery table schema, then publish it to the Pub/Sub topic.
    const message = JSON.stringify({ data: response.data });
    console.log(message);
    const topic = pubSubClient.topic(topicName);
    await topic.publishMessage({ data: Buffer.from(message, 'utf8') });
    console.log(`Published message to topic ${topicName}`);

    res.status(200).send('OK');
  } catch (error) {
    console.error(error);
    res.status(500).send('Failed to retrieve or publish firewall events');
  }
};
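For reference, the function wraps the GraphQL response in a top-level data field so that the message matches the BigQuery table schema created earlier. A hypothetical payload with a single event (all values illustrative) looks like:

{
  "data": {
    "errors": null,
    "data": {
      "viewer": {
        "zones": [
          {
            "firewallEventsAdaptive": [
              {
                "action": "block",
                "clientAsn": "64496",
                "clientCountryName": "US",
                "clientIP": "203.0.113.7",
                "clientRequestPath": "/wp-login.php",
                "clientRequestQuery": "",
                "datetime": "2024-01-01T00:00:00Z",
                "ruleId": "f92ed9fXXXXXXXXXXXXXXXXXXXXXXXXXX",
                "source": "firewallCustom",
                "userAgent": "Mozilla/5.0"
              }
            ]
          }
        ]
      }
    }
  }
}

This nesting is why the reporting query at the end of this article UNNESTs zones and firewallEventsAdaptive.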
package.json
{ "dependencies": { "@google-cloud/functions-framework": "^3.0.0", "graphql-request": "^4.2.3", "axios": "^0.21.1", "@google-cloud/pubsub": "^4.4.0" } }
Schedule a job in Cloud Scheduler to invoke the Cloud Function
From the IAM service of Google Cloud, create a service account for Cloud Scheduler (e.g. named “Cloud Scheduler”) and configure it with the Cloud Functions Invoker, Cloud Functions Viewer, Cloud Run Invoker, Cloud Run Viewer, and Cloud Scheduler Admin roles.
Then, switch to the Cloud Scheduler service in the console and create an hourly job using crontab-style notation (0 */1 * * *).
- Execution target type: HTTP
- URL: https://us-west1-gcp-projectid-123456.cloudfunctions.net/cloud-function-name
- HTTP method: POST
- Auth header: Add OIDC token
- Service account: Cloud Scheduler
- Audience: identical to the URL
The other options, such as retry and backoff, can be left at their defaults.
Click the “Create” button to save the job. If you configured everything correctly, you should see the raw firewall event data populated into the firewall_events table of the waf_events_cloudflare BigQuery dataset every hour. You can verify with the following example SQL query:
SELECT
  adaptive.action,
  adaptive.clientAsn,
  adaptive.clientCountryName,
  adaptive.clientIP,
  adaptive.clientRequestPath,
  adaptive.clientRequestQuery,
  adaptive.datetime,
  adaptive.ruleId,
  adaptive.source,
  adaptive.userAgent
FROM waf_events_cloudflare.firewall_events,
  UNNEST(data.data.viewer.zones) AS zones,
  UNNEST(zones.firewallEventsAdaptive) AS adaptive
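As events accumulate, you can quantify the trends mentioned at the start of this article, for example which countries and ASNs trigger the rule most often. A sketch using the same UNNEST pattern:

SELECT
  adaptive.clientCountryName AS country,
  adaptive.clientAsn AS asn,
  COUNT(*) AS blocked_requests
FROM waf_events_cloudflare.firewall_events,
  UNNEST(data.data.viewer.zones) AS zones,
  UNNEST(zones.firewallEventsAdaptive) AS adaptive
GROUP BY country, asn
ORDER BY blocked_requests DESC
LIMIT 20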
Under the default on-demand pricing, Google BigQuery offers the first 1 TB of data processed each month as part of its free tier, so you should see minimal charges from BigQuery for generating reports from your Cloudflare firewall events alone. Charges apply for storing the data ($0.023 per GiB per month), but the first 10 GiB per month is free.
Using BigQuery to warehouse your Cloudflare firewall event data is an excellent way to get into the powerful capabilities of the Google Cloud “Big Data” ecosystem while propelling your SecOps capabilities to a higher level of sophistication. The ability to retain a virtually unlimited duration of firewall events using an external data warehouse is a game changer in responding to emerging threats to your web applications.
Go further – automate your Cloudflare firewall rules with Cloud Functions
It is even possible to go further and programmatically create new rules based on logged firewall events: automatically create IP access rules, or edit custom rules via the Cloudflare Rulesets API to block entire ASNs. Our SecOps consultants have implemented custom solutions where incoming threats are automatically analyzed against an IP and ASN database to determine whether the source is an ISP or a datacenter (e.g. hosting provider, VPN, proxy). The Cloud Function then decides whether to block the IP address or the entire ASN based on the classification of the network, and applies the change via the Cloudflare API.
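As a simplified illustration of the blocking step, here is a hedged sketch (a hypothetical blockIp helper using Cloudflare’s IP Access Rules endpoint; the zone ID matches the function above and the IP address is a placeholder):

const axios = require('axios');

// Hypothetical helper: blocks a single IP address in the zone via
// Cloudflare's IP Access Rules endpoint. In a real deployment, the IP
// would come from classified firewall_events data.
async function blockIp(zoneId, ip, note) {
  const response = await axios.post(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/firewall/access_rules/rules`,
    {
      mode: 'block',
      configuration: { target: 'ip', value: ip },
      notes: note,
    },
    { headers: { Authorization: `Bearer ${process.env.CLOUDFLARE_API_KEY}` } }
  );
  console.log(`Created IP access rule ${response.data.result.id}`);
}

blockIp('1046f22XXXXXXXXXXXXXXXXXXXXXXXXXX', '203.0.113.7', 'Automated block: datacenter ASN').catch(console.error);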
This automation of the Cloudflare WAF using Cloud Functions and an IP/ASN database has drastically reduced the attack surface of our clients’ web applications while preserving access for legitimate human users and friendly “bots” (e.g. search engine crawlers). If you are a Cloudflare customer, get in touch with a SecOps security reporting and automation expert who can assist with building custom BigQuery reports for your firewall events and automating your firewall configuration to stop threats day and night, giving you the peace of mind to focus on running your business, even without a dedicated SOC or cybersecurity team.