security.txt usage :: August 2024 census

security.txt

Have you heard about security.txt?

It’s a standard (RFC 9116) that lets websites point to their security policy, list security contacts, publish their GPG key, and share other important information useful to anyone (but mostly security researchers), in a text file placed at a known location (or should I say a “well-known” location 🤪).

You can see ours here: https://www.deltablot.com/.well-known/security.txt

It is similar to robots.txt, humans.txt or ads.txt.

Worldwide usage

TL;DR: 20%

Following a discussion on the HAProxy mailing list, I decided to look at the actual availability of this file on many websites. To do that, I first needed a list of websites, which was easy to find thanks to Majestic Million, which conveniently offers its list as a .CSV download without pestering you to register an account or anything. Cheers to them!

The list, as the name implies, contains 1 million websites, roughly ordered by popularity. The following numbers are compiled from a sample of the first 5,000 websites.

Methodology

This Python code does the job of collecting the content of the file, but only if it’s an actual text file, as some websites will either redirect to a login page (instagram.com) or serve some 404-like page with a 200 status code.

#!/usr/bin/env python
import pandas as pd
import requests

CONNECT_TIMEOUT = 3
READ_TIMEOUT = 5
status_codes = []
contents = []

# load list of websites. This file can be obtained from:
# https://downloads.majestic.com/majestic_million.csv
df = pd.read_csv('majestic_million.csv')
# let's cut it down a bit, 1 million websites would take days to process
df = df.head(5000)
num_rows = df.shape[0]

for index, row in df.iterrows():
    url = f"https://{row['Domain']}/.well-known/security.txt"
    print(f'[{index}/{num_rows}] Processing: {url}')
    try:
        response = requests.get(url, timeout=(CONNECT_TIMEOUT, READ_TIMEOUT))
        status_codes.append(response.status_code)
        # some websites will reply with code 200 but an HTML page, we don't want to store that,
        # so filter on the Content-Type header (prefix match, so that
        # "text/plain; charset=utf-8" is accepted too)
        content_type = response.headers.get('Content-Type', '').lower()
        if response.status_code == 200 and content_type.startswith('text/plain'):
            contents.append(response.text)
        else:
            contents.append(None)
    except requests.RequestException:
        # DNS failure, TLS error, timeout...: record the domain as unreachable
        status_codes.append(None)
        contents.append(None)

# store results in a dataframe
result_df = pd.DataFrame({
    'Domain': df['Domain'],
    'Status Code': status_codes,
    'Content': contents
})

# Now save to csv and we're done!
result_df.to_csv('result_df.csv', index=False)

Good enough.

Running it on 5,000 URLs took about an hour and 15 minutes.
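
Most of that hour and 15 minutes is spent waiting on the network, one site after another. One possible way to shorten it (not what was used for the numbers in this post) is to run the requests on a thread pool. In this minimal sketch, crawl, fetch and the max_workers=32 default are illustrative names and values: fetch stands for a function you would write yourself by wrapping the body of the loop above, so that it takes one domain, catches its own network errors, and returns a (status_code, content) tuple.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(domains, fetch, max_workers=32):
    """Call fetch(domain) for every domain on a pool of threads.

    fetch is a hypothetical helper: it is expected to return a
    (status_code, content) tuple and to handle its own exceptions.
    Results come back in the same order as the input domains.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though requests overlap in time
        return list(pool.map(fetch, domains))
```

With such a function, status_codes, contents = zip(*crawl(df['Domain'], fetch)) would give back the two columns that the script stores in the dataframe. Threads fit here because the work is I/O-bound; the right pool size depends on your connection and on how polite you want to be.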

Results

Status code distribution

Here is a pie chart of the status code distribution:

[Figure: status code distribution of security.txt]

First takeaway: about 20% of websites serve this file. Not great, not terrible.
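
If you want to recompute the shares behind such a chart from result_df.csv, here is a minimal sketch using only the standard library. The function name status_distribution and the "no response" label are my own choices for illustration, and the sketch assumes the column names written by the script above.

```python
import csv
from collections import Counter


def status_distribution(csv_file):
    """Return {status: percent} from an open result_df.csv, largest share first."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        raw = row["Status Code"]
        # pandas leaves failed requests as an empty cell and may write
        # the codes as floats ("200.0"), hence the float() step
        counts[int(float(raw)) if raw else "no response"] += 1
    total = sum(counts.values())
    return {status: round(100 * n / total, 1) for status, n in counts.most_common()}
```

Call it with open('result_df.csv', newline='') so that the multi-line Content cells are read back correctly. Keep in mind that a 200 here only means the server answered 200, which includes the login pages and fake 404 pages mentioned earlier.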

TLD distribution

Let’s look at the TLD (domain name extension) distribution, to see where it is most popular:

[Figure: TLD distribution of security.txt]

Okay, let’s remove the .com to see more clearly:

[Figure: TLD distribution of security.txt, excluding .com]

The one thing we can get from this (and from another analysis not shown here) is that .de domain names have significantly more security.txt files than the other TLDs.
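
To compare TLDs fairly, raw counts are not enough, since .com dominates the list; what matters is how many sites of each TLD serve the file out of how many were probed. The sketch below is one possible way to compute that and is not necessarily the analysis used for the charts. The names tld_of and tld_adoption are illustrative, and the TLD extraction is deliberately naive (it takes whatever follows the last dot).

```python
from collections import Counter


def tld_of(domain):
    """Naive TLD: whatever follows the last dot ('bbc.co.uk' gives 'uk')."""
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def tld_adoption(domains_with_status):
    """Return {tld: (sites_with_file, sites_total)} for (domain, status) pairs."""
    total, served = Counter(), Counter()
    for domain, status in domains_with_status:
        tld = tld_of(domain)
        total[tld] += 1
        if status == 200:  # assumption: a 200 answer counts as "has the file"
            served[tld] += 1
    return {tld: (served[tld], total[tld]) for tld in total}
```

Dividing the first number by the second gives an adoption rate per TLD, which is only meaningful for TLDs with enough sites in the sample.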

Miscellaneous observations

The French cybersecurity agency ANSSI (cyber.gouv.fr) serves an error page with status code 667 (!!), which is funny because they actively scan for and collect this information, and promote its use on their website.

There is some cool ASCII art:

  • University of Waterloo (Canada)
#                         ---------------------
#                         < uwaterloo dot ca  >
#                         < security dot txt  >
#                         ---------------------
#                                   ___        \
#                               ,-""   `.       |
#                             ,'  _   e )`-._  /
#                            /  ,' `-._<.===-'
#                           /  /
#                          /  ;
#              _          /   ;
# (`._    _.-"" ""--..__,'    |
# <_  `-""                     \
#  <`-                          :
#   (__   <__.                  ;
#     `-.   '-.__.      _.'    /
#        \      `-.__,-'    _,'
#         `._    ,    /__,-'
#            ""._\__,'< <____
#                 | |  `----.`.
#                 | |        \ `.
#                 ; |___      \-``
#                 \   --<
#                  `.`.<
# hjw                `-'
# https://ascii.co.uk/art/goose
  • Dreamhost.com
#         .;''-.
#      .' |    `._
#     /`  ;       `'.
#   .'     \         \
#  ,'\|    `|         |
#  | -'_     \ `'.__,J
# ;'   `.     `'.__.'
# |      `"-.___ ,'
# '-,           /
# |.-`-.______-|
# }      __.--'L
# ;   _,-  _.-"`\         ___
# `7-;"   '  _,,--._  ,-'`__ `.
#  |/      ,'-     .7'.-"--.7 |        _.-'
#  ;     ,'      .' .'  .-. \/       .'
#   ;   /       / .'.-     ` |__   .'
#    \ |      .' /  |    \_)-   `'/   _.-'``
#     _,.--../ .'     \_) '`_      \'`
#   '`f-'``'.`\;;'    ''`  '-`      |
#      \`.__. ;;;,   )              /
#       `-._,|;;;,, /\            ,'
#        / /<_;;;;'   `-._    _,-'
#       | '- /;;;;;,      `t'` \.  You've poked and cajoled
#       `'-'`_.|,';;;,      '._/|  Found security gold!
#       ,_.-'  \ |;;;;;    `-._/   We thank you most confidently
#             / `;\ |;;;,  `"      For disclosing responsibly
#           .'     `'`\;;, /
#          '           ;;;'|
#              .--.    ;.:`\    _.--,
#             |    `'./;' _ '_.'     |
#              \_     `"7f `)       /
#              |`   _.-'`t-'`"-.,__.'
#              `'-'`/;;  | |   \ mx
#                  ;;;  ,' |    `
#                      /   '
#
  • Grafana.com doesn’t follow the standard format, which is weird because if you’re going to serve it at the standardized location, why not use the standard formatting?
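
For reference, the standard format is a list of "Name: value" lines, with "#" comments, where Contact must appear at least once and Expires exactly once (per RFC 9116). Here is a rough sketch of how a collected file could be checked against those two requirements. The function names parse_fields and problems are made up for this example, and the check is far from a full validator: it makes no claim about what any particular site gets wrong.

```python
def parse_fields(text):
    """Collect 'Name: value' lines into {lowercased name: [values]}.

    Comments, blank lines and the PGP armor of signed files are skipped.
    """
    fields = {}
    state = "body"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("-----BEGIN PGP SIGNED MESSAGE"):
            state = "armor-headers"  # "Hash: ..." lines follow, up to a blank line
        elif line.startswith("-----BEGIN PGP SIGNATURE"):
            state = "signature"
        elif line.startswith("-----END PGP SIGNATURE"):
            state = "body"
        elif state == "armor-headers":
            if not line:
                state = "body"
        elif state == "body" and line and not line.startswith("#") and ":" in line:
            name, value = line.split(":", 1)
            fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


def problems(text):
    """List which of the two basic requirements the file misses."""
    fields = parse_fields(text)
    issues = []
    if not fields.get("contact"):
        issues.append("missing Contact")
    if len(fields.get("expires", [])) != 1:
        issues.append("Expires must appear exactly once")
    return issues
```

An empty list from problems() means the file passes this basic check; free-form text without any recognizable field gets both issues reported.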

Conclusion

Well, I’ll stop here because what I wanted was this 20% figure, but if you wish to go further and calculate the distribution of GPG-signed files, expiration dates, or something else, just grab the code and start crawling!
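
As a starting point for that kind of follow-up, here is a small sketch that counts signed files and expired files among the collected contents. Everything in it (is_signed, expires_at, summarize, the keys of the returned dict) is a hypothetical helper written for this example, and it assumes Expires values are ISO 8601 / RFC 3339 timestamps, which is what RFC 9116 asks for.

```python
import re
from datetime import datetime, timezone

EXPIRES_RE = re.compile(r"^expires:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def is_signed(text):
    """True if the file is wrapped in an OpenPGP cleartext signature."""
    return "-----BEGIN PGP SIGNED MESSAGE-----" in text


def expires_at(text):
    """Return the Expires field as an aware datetime, or None if absent/unparseable."""
    match = EXPIRES_RE.search(text)
    if not match:
        return None
    try:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on
        parsed = datetime.fromisoformat(match.group(1).replace("Z", "+00:00"))
    except ValueError:
        return None
    # assumption: a timestamp without an offset is treated as UTC
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def summarize(contents, now=None):
    """Count signed and expired files; non-string entries (None, NaN) are ignored."""
    now = now or datetime.now(timezone.utc)
    files = [c for c in contents if isinstance(c, str)]
    dates = [expires_at(c) for c in files]
    return {
        "files": len(files),
        "signed": sum(is_signed(c) for c in files),
        "with_expires": sum(d is not None for d in dates),
        "expired": sum(d is not None and d < now for d in dates),
    }
```

Feeding it the Content column of result_df.csv would give the four counts in one pass. Note that it only detects the presence of a signature; it does not verify it.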

Willy Tarreau, author of HAProxy, suggested looking at the distribution of open source projects vs commercial sites. The hypothesis is that open source projects don’t really care about this because they already have a clearly defined policy for reporting security issues and don’t need it, whereas commercial entities will find it more useful. I don’t have a way to discriminate the list based on “opensourceness”, so I can’t draw a conclusion. But OpenBSD, FreeBSD, Ubuntu and Arch Linux don’t have it, whereas Apache and Red Hat do. The Red Hat one is pretty complete, too. It is also present on the GAFAM websites and many “big” American websites.

In any case, I find it a useful endpoint, and if you wish to add it to your website, you can add these two lines to your HAProxy config:

acl securitytxt-acl path_beg /.well-known/security.txt
http-request return status 200 content-type text/plain string "Contact: mailto:security@company.com\n[RestOfTheFile]\n" if securitytxt-acl