IPFS website cheat-sheet

This website is generated using Nikola and hosted on IPFS, with the web server on the krvtz.net domain acting merely as a proxy to an IPFS node. After some struggle, I decided to share a few hints on how to actually implement such a setup in a way that will not result in cryptic error messages being displayed to your clients.

This article is intended as a hands-on tutorial. I will go from very basic but inefficient IPFS techniques to more sophisticated ones that are much more usable in production but have a steeper learning curve, so please read it to the end before carving anything in stone.

The use-case assumptions: I edit this page on my Pinebook Pro laptop as a Markdown file, then run Nikola to compile the static website into the output directory; I have the go-ipfs command-line utility installed, and I also run a couple of IPFS nodes on hosted servers.

Initially, I initialised IPFS data on my account:

$ ipfs init 
initializing IPFS node at /home/kravietz/.ipfs
generating 2048-bit RSA keypair...done
peer identity: QmWXcrZwXMLz9AiD5D22kuDrG4gX2zNwgcFpLra2sFoHoD
to get started, enter:

    ipfs cat /ipfs/QmQPeNsJPyVWPFDVHb77w8G42Fvo15z4bG2X8D2GhfbSXc/readme

This is done just once and creates the default keypair that determines your node's CID address (QmWX…). Everything in IPFS is addressed using CIDs, most prominently the files hosted on it.

The next step is to run the IPFS daemon — and keep it running in a separate terminal:

$ ipfs daemon --routing=dhtclient --enable-namesys-pubsub
Initializing daemon...
go-ipfs version: 0.5.1-8431e2e87
Repo version: 9
System version: arm64/linux
Golang version: go1.14.3
Swarm listening on /ip4/127.0.0.1/tcp/4001
Swarm listening on /ip4/192.168.1.100/tcp/4001
Swarm listening on /ip6/201:4b79:5bac:765c:9859:4cff:969e:ab58/tcp/4001
Swarm listening on /ip6/2a02:390:79ef:0:47a0:5d02:c48:b1ef/tcp/4001
Swarm listening on /ip6/300:29bc:337f:77be:62e4:e02:85d:f479/tcp/4001
Swarm listening on /ip6/::1/tcp/4001
Swarm listening on /p2p-circuit
Swarm announcing /ip4/127.0.0.1/tcp/4001
Swarm announcing /ip4/192.168.1.100/tcp/4001
Swarm announcing /ip6/201:4b79:5bac:765c:9859:4cff:969e:ab58/tcp/4001
Swarm announcing /ip6/2a02:390:79ef:0:47a0:5d02:c48:b1ef/tcp/4001
Swarm announcing /ip6/300:29bc:337f:77be:62e4:e02:85d:f479/tcp/4001
Swarm announcing /ip6/::1/tcp/4001
API server listening on /ip4/127.0.0.1/tcp/5001
WebUI: http://127.0.0.1:5001/webui
Gateway (readonly) server listening on /ip4/127.0.0.1/tcp/8080
Daemon is ready

Note the flags; they are important:

  • The --enable-namesys-pubsub flag enables a new, much faster IPNS resolution mechanism. You want that always enabled, on both laptop and server nodes.
  • The --routing=dhtclient flag prevents your node from routing all the world's IPFS traffic. You want that on your laptop only.

Once the daemon is running, I can publish the website to IPFS:

$ ipfs add -r output/
added Qmb4JF3eYygqKnv8GjtBdvJehASymwnrQEoF8H4GK8v6AK output/2020/index.html
added QmZNzEaoU3cTvWumndUwFZTSrps9FtP3RxsTLru7xbY8Tk output/assets/css/baguetteBox.min.css
added QmcgeG1oqzuPGvvG7S9C3ML7qGSAtLTNqaGE5N1WjPCoAw output/robots.txt
…
added QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa output

Pay attention to the CID in the last line (…KZSa): it is the directory address of your website, which is the most useful in the next steps. As long as your node is running, all the world's IPFS nodes can retrieve your website's pages in two ways. The most popular is through the directory CID (…KZSa) plus the file name, just as you would with a regular filesystem:

$ ipfs cat /ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa/robots.txt
Sitemap: https://krvtz.net/sitemapindex.xml

User-Agent: *
Host: krvtz.net

But you can just as well retrieve the robots.txt file through its own CID (…PCoAw); as you can see, the contents are exactly the same:

$ ipfs cat /ipfs/QmcgeG1oqzuPGvvG7S9C3ML7qGSAtLTNqaGE5N1WjPCoAw
Sitemap: https://krvtz.net/sitemapindex.xml

User-Agent: *
Host: krvtz.net

And this works almost immediately, literally worldwide (Pinata is just one of many public IPFS gateways):

$ curl https://gateway.pinata.cloud/ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa/robots.txt
Sitemap: https://krvtz.net/sitemapindex.xml

User-Agent: *
Host: krvtz.net

Now, time to bridge IPFS to the HTTP web:

  • I run an IPFS node on the krvtz.net server and use an Nginx reverse proxy to bridge the content from IPFS to the web
  • I configured a DNSlink TXT record in DNS for the krvtz.net domain to hint to IPFS-enabled browsers that the content is also available on IPFS

The first part is a simple server section in the Nginx config (read NGINX Reverse Proxy for details):

upstream ipfs-gateway {
    server 127.0.0.1:8080;
    keepalive 8;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name krvtz.net;

    location / {
      proxy_pass http://ipfs-gateway/ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa/;
    }
}

This basically tells Nginx: if a request for https://krvtz.net/robots.txt arrives, fetch it from http://127.0.0.1:8080/ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa/robots.txt which happens to be a gateway handled by the local IPFS node. Pay attention to the trailing slash in proxy_pass or the URL will be constructed incorrectly.
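The trailing-slash rule can be modelled in a few lines of Python. This is a rough sketch of nginx's proxy_pass URI rewriting, not the real implementation: when proxy_pass contains a URI part, nginx replaces the portion of the request path matched by location with that URI, and everything after is appended verbatim.

```python
# Rough model of nginx proxy_pass path rewriting: the part of the
# request path matched by `location` is replaced by the URI part of
# `proxy_pass` (everything after the upstream name).
def proxy_rewrite(request_path: str, location: str, proxy_uri: str) -> str:
    return proxy_uri + request_path[len(location):]

# With the trailing slash the file name lands after the directory CID:
good = proxy_rewrite("/robots.txt", "/",
                     "/ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa/")
# Without it, the CID and the file name are glued together:
bad = proxy_rewrite("/robots.txt", "/",
                    "/ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa")
print(good)  # …KZSa/robots.txt — what the IPFS gateway expects
print(bad)   # …KZSarobots.txt — a nonexistent CID, hence cryptic errors
```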

This is the first very basic but fully workable IPFS-to-web architecture. Your HTTP server will serve the static website to regular web browsers, and as a bonus, IPFS-enabled browsers (IPFS Companion is available for many of them) will detect the CID and switch to native IPFS.

This architecture does work; it has just one disadvantage: updating your website is a pain in the back. When you change content and repeat ipfs add, you get a new CID, which you then need to update in the Nginx configuration. For frequent updates on many websites this is not an option.

Permanent IPFS addresses — introducing IPNS

The CID we saw above is just a hash of a file: when the file changes, its hash changes too, so you get a different CID and all the burden of updating it wherever it's referenced. Fortunately, we can also address IPFS files using an abstraction called IPNS:

  • we upload files to IPFS and get a variable directory CID that changes each time (ipfs add)
  • we publish the directory CID to an IPNS address that is static (ipfs name publish)

A side note: when you change and recompile your website, really only a few files will change: new or modified posts. All the old posts and static files (robots.txt, JavaScript, CSS etc.) will remain unchanged, and IPFS deals with these very efficiently thanks to deduplication. You can actually see it for yourself: note the CID of some static file (e.g. robots.txt) when you run ipfs add once, and then a second time. The CID of the file (assuming it was unchanged) will be the same each time; it's only the directory CID that changes if you've added or modified some files inside.
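The principle behind this can be sketched with a toy Merkle tree. This uses plain SHA-256 over file bytes and sorted child hashes, which is not IPFS's actual DAG encoding, just the underlying idea: an unchanged file keeps its hash, and only the directory hash moves.

```python
import hashlib

# Toy content addressing: a file's ID is the hash of its bytes, and a
# directory's ID is the hash of its children's (name, ID) pairs.
# NOT the real IPFS DAG format, just the principle behind deduplication.
def file_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def dir_id(children: dict) -> str:
    payload = "".join(f"{name}:{cid}" for name, cid in sorted(children.items()))
    return hashlib.sha256(payload.encode()).hexdigest()

site_v1 = {"robots.txt": file_id(b"User-Agent: *\n"),
           "index.html": file_id(b"<h1>v1</h1>")}
site_v2 = {"robots.txt": file_id(b"User-Agent: *\n"),  # unchanged
           "index.html": file_id(b"<h1>v2</h1>")}      # edited post

# The unchanged file keeps the same ID, so nodes never re-fetch it...
assert site_v1["robots.txt"] == site_v2["robots.txt"]
# ...but the directory ID changes, which is why you re-publish it:
assert dir_id(site_v1) != dir_id(site_v2)
```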

If we use the IPNS address (which looks similar except it has the /ipns prefix), we will always get the latest directory CID and thus the latest version of the website. The IPNS address is always the same because it's attached to a unique key pair stored at your node. You already have one RSA key pair; it was generated by the ipfs init command and it's called self:

$ ipfs key list -l
QmWXcrZwXMLz9AiD5D22kuDrG4gX2zNwgcFpLra2sFoHoD       self

Since an IPNS address is associated with a key, you basically need one key per website. Let's generate a new key called krvtz for my krvtz.net website (I use ed25519 instead of RSA, because I can):

$ ipfs key gen --type=ed25519 krvtz
12D3KooWNDWvM3SSFetYZGnFxx6XiFWd9RYxvF1TqwQPcQZXewdQ

Recall that at the end of the last ipfs add we got the directory CID of the website:

$ ipfs add -r output/
…
added QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa output

Now we publish the …KZSa CID to the IPNS associated with the krvtz key generated above:

$ ipfs name publish --key=krvtz /ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa
Published to 12D3KooWPawvRcVQ3jM1Xq59JKr4BGwSDGkDKZk5JqH2H2tAqUKS: /ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa

That …AqUKS CID is an IPNS address and has to be referenced using the /ipns prefix, but otherwise it works in the same way:

$ curl https://gateway.pinata.cloud/ipns/12D3KooWPawvRcVQ3jM1Xq59JKr4BGwSDGkDKZk5JqH2H2tAqUKS/robots.txt
Sitemap: https://krvtz.net/sitemapindex.xml

User-Agent: *
Host: krvtz.net

We can actually check what CID is behind an IPNS address:

$ ipfs name resolve /ipns/12D3KooWPawvRcVQ3jM1Xq59JKr4BGwSDGkDKZk5JqH2H2tAqUKS
/ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa

Now, you have certainly noticed that resolution is much slower with IPNS: it relies much more on searching across the peer-to-peer network, and old IPFS software implemented this very inefficiently. New code is much faster (remember the pubsub option above?), but some old nodes are still prevalent, slowing the search down. Over time it will get much faster as the old nodes are replaced.

Back to our Nginx configuration: the only change we need is to replace the /ipfs path with the IPNS address in the proxy_pass directive (remember the trailing slash):

proxy_pass http://ipfs-gateway/ipns/12D3KooWPawvRcVQ3jM1Xq59JKr4BGwSDGkDKZk5JqH2H2tAqUKS/;

My HTTP website will now serve the latest content uploaded into IPFS and subsequently published to the static IPNS address. And this is pretty much where I could stop this tutorial… had I not stumbled on yet another practical annoyance: IPFS and IPNS seed nodes.

24/7 IPFS pinning and IPNS lifetime

IPFS is a peer-to-peer, distributed filesystem, but unlike some other similar projects (e.g. I2P) it's not permanent storage by default. Other IPFS nodes will happily cache your content, but they are not obliged to store it forever and will frequently clean it up when they need space. Uninterrupted availability of your files in the network depends on at least one node permanently seeding them 24/7. In IPFS, the old BitTorrent concept of seeding is called "pinning".

My usage scenario described in the beginning contradicts this requirement: I publish from a laptop, and the laptop is not running IPFS 24/7. This is fortunately easily resolved: I run a 24/7 IPFS node on one of my servers and simply tell that node to pin the IPFS content uploaded from the laptop, thus storing a copy and serving it indefinitely. After I upload (ipfs add) from my laptop, I just SSH to one of the permanent nodes and tell it to pin the latest upload:

$ ipfs pin add /ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa
pinned QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa recursively

Problem solved: this directory CID will now be available to the whole network on a 24/7 basis. By the way, the laptop that ran ipfs add has it pinned already by default.

But what about IPNS? If you ipfs pin an IPNS address, it will just pin the regular CID behind it. An even bigger problem, which you will quickly notice after you start using IPFS in real life, is that if your IPNS publishing node is not available, the IPNS addresses will stop resolving after some time. This is because IPNS records have a lifetime, and after it expires they need to be resolved again; the only node on the network that can resolve them is the original node holding the private key associated with the specific IPNS address.

This can be solved in two ways:

  • Create IPNS records with a longer lifetime, e.g. ipfs name publish --lifetime=72h for 3 days. If your IPNS node connects to the network at least once every 3 days, other nodes can keep using the cached record and will attempt to re-resolve it only after your signing node is back on the network. The disadvantage is that if you publish a new CID, other nodes may continue to use the old one for up to the same period, thus serving stale content.
  • Upload (ipfs add) from your laptop, but publish (ipfs name publish) from a 24/7 node. This option seems to offer the best balance between update frequency, availability and convenience — and you need that 24/7 node anyway.
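To make the first option concrete, here is a back-of-the-envelope sketch (the publish date is hypothetical; only the 72h lifetime comes from the example above):

```python
from datetime import datetime, timedelta

lifetime = timedelta(hours=72)           # ipfs name publish --lifetime=72h
published = datetime(2020, 6, 1, 12, 0)  # hypothetical publish time
expires = published + lifetime

# Your signing node must come back online before `expires`, or the
# record stops resolving; and if you publish a new CID meanwhile,
# nodes holding the cached record may serve stale content until then.
print(expires)  # 2020-06-04 12:00:00
assert expires - published == timedelta(days=3)
```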

I use the latter scenario and do everything just as above, except for where each step runs (the server is the 24/7 node):

  1. (Just once) Generate key pair (ipfs key gen) on server
  2. Update website and upload (ipfs add) on laptop
  3. Pin the new CID (ipfs pin add) on server
  4. Publish the new CID (ipfs name publish) on server

This is the model I'm using right now, and I haven't come up with anything more sophisticated than that. The process can also be quite easily automated, as all static website generators allow plugins and scripts to run at various stages.
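As a starting point for such automation, here is a sketch of the only fiddly part: extracting the root directory CID from ipfs add -r output, since the last "added" line names the top-level directory. Everything here beyond the output format shown earlier is an assumption.

```python
# Extract the root CID from `ipfs add -r output/` output: each line has
# the form "added <CID> <path>", and the last such line names the
# top-level directory.
def root_cid(add_output: str) -> str:
    added = [line for line in add_output.strip().splitlines()
             if line.startswith("added ")]
    return added[-1].split()[1]

sample = """\
added Qmb4JF3eYygqKnv8GjtBdvJehASymwnrQEoF8H4GK8v6AK output/2020/index.html
added QmcgeG1oqzuPGvvG7S9C3ML7qGSAtLTNqaGE5N1WjPCoAw output/robots.txt
added QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa output
"""
cid = root_cid(sample)
print(cid)  # QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa
# That CID is what a deploy script would pass to `ipfs pin add` and
# `ipfs name publish` on the 24/7 node.
```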

DNS to IPFS: introducing DNSlink

There's one more useful trick for IPFS-enabled websites: the DNSlink record. It's a simple TXT DNS record, set for the whole domain or just a hostname, that points IPFS nodes to the relevant IPFS CID straight away. A DNSlink record can obviously also use IPNS:

$ dig txt _dnslink.krvtz.net +short
"dnslink=/ipns/12D3KooWPawvRcVQ3jM1Xq59JKr4BGwSDGkDKZk5JqH2H2tAqUKS"

As you can see, this is the same IPNS address as used in the examples above. With that set, you can now use IPNS with a DNS name, which is much easier to remember:

$ ipfs name resolve /ipns/krvtz.net
/ipfs/QmRPSiHswBZghPjF4jJiqXiuVk9m4ovrAbU9B2nPKAKZSa

Note that my DNSlink points to an IPNS address, yet ipfs name resolve returns the final low-level CID. This is because most ipfs commands work with the --recursive flag enabled by default, which resolves the whole chain down to the last CID. Let's check what the DNSlink for ipfs.io resolves to, as it's a good example of chaining IPNS for ease of maintenance while remaining completely transparent to end users (we will discuss the bafy… CID in the next section):

$ ipfs dns ipfs.io
/ipfs/bafybeib6x5a7uytkyq6nnfl42bmhcnjroxfzecfmjiurmztzyfliejr5um

And without recursive resolution:

$ ipfs dns --recursive=false ipfs.io
/ipns/website.ipfs.io
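Recursive resolution simply follows the chain of names until it hits an immutable /ipfs/ path. As a toy model (the link table mirrors the ipfs.io chain shown above; real resolution of course queries DNS and the DHT rather than a dict):

```python
# Toy resolver: follow /ipns/ links until an immutable /ipfs/ path is
# reached. Real IPNS/DNSlink resolution queries DNS and the IPFS DHT.
LINKS = {
    "/ipns/ipfs.io": "/ipns/website.ipfs.io",
    "/ipns/website.ipfs.io":
        "/ipfs/bafybeib6x5a7uytkyq6nnfl42bmhcnjroxfzecfmjiurmztzyfliejr5um",
}

def resolve(name: str, recursive: bool = True) -> str:
    target = LINKS[name]
    # With --recursive (the default) keep following mutable links:
    while recursive and target.startswith("/ipns/"):
        target = LINKS[target]
    return target

print(resolve("/ipns/ipfs.io"))                   # the final /ipfs/bafy… CID
print(resolve("/ipns/ipfs.io", recursive=False))  # /ipns/website.ipfs.io
```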

With DNSlink configured, this is what the proxy_pass line in my Nginx configuration looks like (as usual, remember the trailing slash):

proxy_pass http://ipfs-gateway/ipns/krvtz.net/;

CIDv1

IPFS CIDs use a clever construct called multihash: a cryptographic hash carrying a prefix that describes the algorithm used, wrapped in an ASCII encoding. You can immediately spot CIDv0 hashes by the Qm… prefix and the new CIDv1 hashes by the bafy… prefix (these characteristic sequences are how their respective binary prefixes encode into ASCII), and there's a convenient command to decode them:

$ ipfs cid format -f=%P QmcgeG1oqzuPGvvG7S9C3ML7qGSAtLTNqaGE5N1WjPCoAw
cidv0-protobuf-sha2-256-32
$ ipfs cid format -f=%P bafybeigvek4xqsungqp5ggb4l22iqth3gw7npuvw437wmdaki2vy6qzyka
cidv1-protobuf-sha2-256-32

You can also convert between CIDv0 and CIDv1:

$ ipfs cid base32 QmcgeG1oqzuPGvvG7S9C3ML7qGSAtLTNqaGE5N1WjPCoAw
bafybeigvek4xqsungqp5ggb4l22iqth3gw7npuvw437wmdaki2vy6qzyka
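The conversion is purely mechanical and can be reproduced in a few lines of Python. This is a sketch based on the CID spec, not on anything in go-ipfs: base58btc-decode the CIDv0 string to get the raw multihash, prepend the CIDv1 version and dag-pb codec bytes (0x01 0x70), and re-encode as lowercase unpadded base32 with the "b" multibase prefix.

```python
import base64

# base58btc alphabet used by CIDv0 (note: no 0, O, I or l)
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58decode(s: str) -> bytes:
    num = 0
    for ch in s:
        num = num * 58 + B58.index(ch)
    return num.to_bytes((num.bit_length() + 7) // 8, "big")

def cidv0_to_v1(cid: str) -> str:
    multihash = b58decode(cid)        # 0x12 0x20 + 32-byte sha2-256 digest
    raw = b"\x01\x70" + multihash     # CIDv1 version byte + dag-pb codec
    b32 = base64.b32encode(raw).decode().lower().rstrip("=")
    return "b" + b32                  # "b" is the base32 multibase prefix

print(cidv0_to_v1("QmcgeG1oqzuPGvvG7S9C3ML7qGSAtLTNqaGE5N1WjPCoAw"))
# bafybeigvek4xqsungqp5ggb4l22iqth3gw7npuvw437wmdaki2vy6qzyka
```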

Otherwise, CIDv1 works just as before (yes, this is the same robots.txt file used as example before):

$ ipfs cat /ipfs/bafybeigvek4xqsungqp5ggb4l22iqth3gw7npuvw437wmdaki2vy6qzyka
Sitemap: https://krvtz.net/sitemapindex.xml

User-Agent: *
Host: krvtz.net

Why bother? Mostly because IPFS gateways running on your local node will automatically convert DNSlink-enabled websites into IPFS-native pseudo-hostnames containing the CID, such as http://bafybeigvek4xqsungqp5ggb4l22iqth3gw7npuvw437wmdaki2vy6qzyka.ipfs.localhost:8080/. But DNS hostnames are not case-sensitive, so the mixed-case base58 of CIDv0 cannot be used there; CIDv1 was designed to be lowercase-only, at the cost of being slightly longer.

Can we use CIDv1 from the very beginning? Sure, just add the --cid-version=1 flag (the other extra options are optimizations) and all CIDs will be version 1:

$ ipfs add --cid-version=1 --inline --fscache -r output/
…
added bafybeifz3kuxyrjdry2nx7rkcyfaplx6726b42skwa4grd26axmt4xekvu output

The bafy… CID is then passed to IPNS and DNSlink in the same way as described before.

Thanks

This article wouldn't be possible without support and hints from the #ipfs:matrix.org community on Matrix.

I also had an almost-finished article on a very similar setup using the DAT protocol, which is a functionally similar concept to IPFS, but then the Beaker Browser team released a complete rewrite of the protocol, so the publication won't make sense until I update it with the latest features.

I'm on Mastodon and Twitter, feel free to comment!