Processing Content Security Policy Logs
> I set up a Content Security Policy and then set it to report-only, so I can check it's not inadvertently breaking some part or function of the site.
Header always set Content-Security-Policy-Report-Only "default-src ....
When browsers log a contravention, they send a JSON request to the endpoint you specify. I'm logging it to a file as DateTime, remote IP, JSON, http_user_agent, which makes it a bit more awkward to get the JSON out of the report but gives me enough information to work out whether I have a few repeat offenders or issues in a few browsers. The JSON isn't ordered consistently, so you can't easily scan through the file visually; sometimes the blocked-uri is at the start and sometimes at the end, for example. Also, because the report includes the contravened directive, each entry can get to be a big block of text.
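As a rough sketch of how a line in that layout could be tidied up in Python: the function below finds the first "{" on the line, lets json.JSONDecoder.raw_decode parse just the report from there, and returns a few fields in a fixed order. The sample line, the function name tidy_row and the choice of fields are my assumptions for illustration, not taken from the real log.

```python
import json

# Hypothetical field order for illustration; pick whichever report
# keys you care about.
FIELDS = ("blocked-uri", "violated-directive", "document-uri")

_decoder = json.JSONDecoder()


def tidy_row(log_line):
    """Return the chosen report fields, in a fixed order, from one log line.

    Assumes the line looks like: DateTime, remote IP, JSON, user agent,
    and that the first "{" on the line starts the JSON report.
    """
    start = log_line.index("{")
    # raw_decode parses one JSON value starting at `start` and ignores
    # whatever follows it (here, the user agent column).
    data, _end = _decoder.raw_decode(log_line, start)
    report = data.get("csp-report", {})
    return [report.get(name, "") for name in FIELDS]


# Made-up sample line, not taken from a real log.
sample = (
    '2019-01-01 10:00:00, 192.0.2.1, '
    '{"csp-report": {"violated-directive": "font-src", '
    '"original-policy": "default-src ...", '
    '"blocked-uri": "https://cdn.joinhoney.com", '
    '"document-uri": "https://example.com/"}}, '
    'Mozilla/5.0 (X11; Linux x86_64)'
)
print(", ".join(tidy_row(sample)))
```

Because the fields come out in the same order on every row and the long policy text is left out, the result is easier to scan by eye than the raw log.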
So I'm doing something like
# Split on ", " (comma + space) from the first row onwards; assumes the JSON itself contains no ", "
awk -F', ' '{print $3}' csp.log > json-column.log
to get the JSON out into a separate file with just the JSON column on each row. Then I'm doing a bit of Python to process the JSON in each row and extract the information I need for a simple summary:
import json

fname = "json-column.log"
with open(fname) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        report = json.loads(line)["csp-report"]
        # One "blocked-uri, violated-directive" pair per row
        print(report["blocked-uri"], report["violated-directive"], sep=", ")
Output to a file, that gives me two columns, blocked-uri and violated-directive, e.g.:
https://cdn.joinhoney.com, font-src
https://natureofgames.com, frame-src
so you can gauge how big a problem is by the number of rows when it's sorted.
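The counting could also be done in Python rather than by sorting the file and eyeballing it. A minimal sketch, assuming the same layout as json-column.log (one JSON report per row); the function name summarise and the sample rows are made up for illustration.

```python
import json
from collections import Counter


def summarise(rows):
    """Count how often each (blocked-uri, violated-directive) pair occurs."""
    counts = Counter()
    for row in rows:
        row = row.strip()
        if not row:
            continue  # ignore blank rows
        report = json.loads(row).get("csp-report", {})
        pair = (report.get("blocked-uri", ""), report.get("violated-directive", ""))
        counts[pair] += 1
    return counts


# Made-up rows standing in for json-column.log.
rows = [
    '{"csp-report": {"blocked-uri": "https://cdn.joinhoney.com", "violated-directive": "font-src"}}',
    '{"csp-report": {"blocked-uri": "https://natureofgames.com", "violated-directive": "frame-src"}}',
    '{"csp-report": {"blocked-uri": "https://cdn.joinhoney.com", "violated-directive": "font-src"}}',
]

# most_common() lists the biggest offenders first.
for (uri, directive), n in summarise(rows).most_common():
    print(n, uri, directive, sep=", ")
```

With a real file, passing the open file object works the same way, e.g. summarise(open("json-column.log")).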
Understanding the Log
Some of the results are a bit difficult to understand. You think "er, what's that?", look through the site code and find the blocked-uri doesn't exist on the site at all. The Honey example above is, I think, someone with a plugin in their browser that is adding code to the page; there are quite a few rows like that.
More confusingly, there are also people with what seems to be malware in their browser redirecting links to other endpoints. I guess blocking those URLs is doing the user of the browser a favour.
The more confusing part is the odd stuff that you can't work out so easily, and then you're googling to find out what comes from some URL like http://example.com/js/example.js. If there is no viewable page there, I'm assuming they are blockable and not worrying about them too much.
Other entries are a bit more difficult to understand because the report JSON doesn't give you much of a clue when the blocked URL is data, a blob or an extension:
data, img-src
blob, script-src
chrome-extension, font-src
I know the site I'm looking at doesn't use data or blobs, so I'm not that worried about blocking those; I'm just not sure at the moment whether something is converting URLs on the fly. When I look at the site in the browsers reporting the issues, I don't see those violations in the browser console. There seems to be some debate about whether CSP should affect extensions: https://en.wikipedia.org/wiki/Content_Security_Policy#Browser_add-ons_and_extensions_exemption
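To keep the scheme-only entries apart from the ones that name a real host, the blocked-uri values could be bucketed before counting. This is only a sketch: classify is a hypothetical helper, and the set of scheme-only values is just the three seen in the log above.

```python
from urllib.parse import urlsplit

# Values the report shows with no host to look up. These are the ones
# mentioned above; extend the set as new ones turn up.
SCHEME_ONLY = {"data", "blob", "chrome-extension"}


def classify(blocked_uri):
    """Return a rough bucket for a blocked-uri value (hypothetical helper)."""
    value = blocked_uri.strip().lower()
    if value in SCHEME_ONLY:
        return value  # the report gave only the bare scheme
    parts = urlsplit(value)
    if parts.scheme in SCHEME_ONLY:
        return parts.scheme  # e.g. a full chrome-extension:// URL
    if parts.hostname:
        return "host:" + parts.hostname
    return "other"


for uri in ("data", "blob", "chrome-extension", "https://cdn.joinhoney.com"):
    print(uri, "->", classify(uri))
```

Rows in the data, blob and chrome-extension buckets can then be set aside, leaving the host rows to be checked against the site code.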
I guess people will shout when things stop working.
/ Adam