Future proof websites?

Warning: what follows, besides discussing 9/11, is also kind of a nerdy/geeky/technical discussion about how web pages link to each other, and an idea for how to make the links between pages – especially pages that may disappear someday – work better. Maybe.

Today is Patriot Day, a “national day of service and remembrance”. Because it’s also the 14th anniversary of 9/11, I ended up reviewing my collection of 9/11 “stuff” – something I started collecting on 9/11 and continued for a few days after those events. It helped to process things a little bit, I think.

Recently, Dave Winer has been discussing, among other things, the “future-safety” of the internet:

The concern is that the record we’re creating is fragile and ephemeral, so that to historians of the future, the period of innovation where we moved our intellectual presence from physical to electronic media will be a blank spot, with almost none of it persisting.

While reviewing my collection, I realized a possible reason his piece has been percolating in the back of my mind: this same collection. Why? Take a look – the images I’m hosting myself, since it’s not a big deal (bandwidth-wise or effort-wise). The links to other sites? That’s where it falls apart.

Some of the sites are still there – and one or two of the links I had still work. That’s awesome – someone thought ahead, or took the time when they redid the website, to make sure the old content was still accessible.

Other links go to sites that still work, but the “layout” of the website – its URLs, or URIs (Uniform Resource Identifiers) – has changed, and no one took the time to make sure the old content was still easily accessible. For a couple of those, I was able to find the article on the site at its new address, so I updated my link.

Then there are two other cases left to deal with: the website is gone, or the link I have uses a URL click-tracker service that no longer exists. In the case of the website being gone, I can use the Internet Archive (the “Wayback Machine”) to try to find the article and then figure out what to do. I could link to the archive’s version, but I decided to take that snapshot and copy it to my own server – I can’t necessarily rely on the archive being there forever, can I? Maybe, maybe not.
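A geeky aside: the Internet Archive has a simple availability API at archive.org/wayback/available, so this kind of lookup can even be scripted. A minimal sketch in Python – the example URL and the 9/11-era timestamp are just placeholders:

import json
import urllib.parse
import urllib.request

def closest_snapshot(url, timestamp="20010911"):
    # Ask the Wayback Machine for the snapshot closest to the given
    # date; returns the snapshot's URL, or None if nothing is archived.
    query = urllib.parse.urlencode({"url": url, "timestamp": timestamp})
    with urllib.request.urlopen(
            "https://archive.org/wayback/available?" + query) as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest else None

print(closest_snapshot("http://news.cnet.com/"))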

In the case of the URL tracker, well, that’s going to mean some work. I can try to find the article by title, but my search just now for “World reacts to calamity” returned lots of results, and none of them seem to be on the C|Net website – which is apparently either where I got the link in the first place or where the article was hosted. That’s not helpful at all.

So what can be done? For starters, encourage the discussion. I went to Winer’s site and posted a comment:

I’ve been mulling this all over, and then realized why today. On 9/11/01, I was collecting links to things relevant to what was going on, but I only had links to the pages. I went back to my collection today, and a lot of the stuff is gone – possibly forever? I went and used the Internet Archive where I could for some things just now, but a lot of the content seems to be lost – especially due to the click-tracking links used at the time. If only I knew then what I know now…

Dave was quick to reply:

Yes. Today is a very good day to be thinking about that. I should write a blog post. Thanks for pointing this out.

And then he wrote a quick little piece about it: A good day to think about web history. And he has the EXACT same problem: links from that day on his own site just aren’t working.

I’ve tried to sound the alarms. Every day we lose more of the history of the web. Every day is an opportunity to act to make sure we don’t lose more of it. And we should be putting systems into place to be more sure we don’t lose future history.

There’s a solution in there somewhere, that’s for sure. For one thing, you have Google, which indexes every page it’s allowed to. But that’s only part of the equation – finding the data. How? What are we going to look for? And, more importantly, where are we going to look? If a server goes offline, that data is gone unless it’s in the archive (which isn’t free) or someone decides to mirror it (also not free). And how do we make it easy to find? When you’re searching by title, for example, you might find similarly titled articles on multiple sites – then you have to sort the wheat from the chaff.

Is there a better way? Maybe. Off the top of my head, we need to do a little more on the backend. But what?

Mark the pages somehow with a UUID (Universally Unique Identifier). For example, it could be an SHA1 hash of data from the page – maybe the hostname as the first part, then the time, date, and article title:

Future proof websites?

That gets turned into: d820eab50a74ad6c0c08566b210454848a573dcf-29b6082b508b593c8de53988ef3d2b14b327664b. What do we do with that? Ideally, it’s auto-generated and then put into the META data of this web page. Then, when you link to my page, the browser pulls that out of the META data (if it’s available) and adds it to the link – so instead of:
<a href="http://agerstein.net/2015/09/11/future-proof-websites/">Future Proof Websites?</a>

you get:
<a href="http://agerstein.net/2015/09/11/future-proof-websites/" webprint="d820eab50a74ad6c0c08566b210454848a573dcf-29b6082b508b593c8de53988ef3d2b14b327664b">Future Proof Websites?</a>
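For the curious, here’s a minimal sketch, in Python, of how the ID itself might be generated. To be clear, the specifics – SHA1 of the hostname, a dash, then SHA1 of the date/time plus the title – are just my strawman from above, not any kind of standard, and the exact input formatting is my own guess:

import hashlib
from datetime import datetime, timezone

def webprint(hostname, published, title):
    # First half: a hash of just the hostname.
    host_part = hashlib.sha1(hostname.encode("utf-8")).hexdigest()
    # Second half: a hash of the publication date/time plus the title.
    page_part = hashlib.sha1(
        (published.isoformat() + " " + title).encode("utf-8")).hexdigest()
    return host_part + "-" + page_part

print(webprint("agerstein.net",
               datetime(2015, 9, 11, tzinfo=timezone.utc),
               "Future proof websites?"))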

If you copy/paste the link for an email, or to put on Facebook/Twitter/your blog/whatever, it copies that “webprint” into the link – and if the content goes down for some reason – maybe I die and my website goes away – then a search for the webprint would make it easier to find cached/mirrored copies of the data, since the ID would theoretically go along in the cache/mirror as part of the META data in the pages.
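On the consuming side, any tool that copies a link could fish the ID back out of the page’s META data. Another quick Python sketch, using the standard library’s HTML parser (the “webprint” META name is, again, purely hypothetical):

from html.parser import HTMLParser

class WebprintFinder(HTMLParser):
    # Remembers the content of a <meta name="webprint"> tag, if any.
    def __init__(self):
        super().__init__()
        self.webprint = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "webprint":
            self.webprint = attrs.get("content")

finder = WebprintFinder()
finder.feed('<meta name="webprint" content="d820eab50a74ad6c0c08566b210454848a573dcf-'
            '29b6082b508b593c8de53988ef3d2b14b327664b">')
print(finder.webprint)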

Clearly we would need to use something better/longer than what I have here, since it’s only 81 characters (two 40-character SHA1 hashes plus a dash). That seems like a lot, but consider: we’re in the process of running out of IPv4 addresses and moving everything over to IPv6 – and back when I got into the computer game, we didn’t think we’d run out of IPv4 addresses for quite some time.

By having the hostname be the first part of the hash, we reduce the odds of a clash – another site could, theoretically, produce the same second hash, but what are the odds it would also produce the same first hash? Vanishingly small, since both halves would have to collide – which won’t happen unless the site somehow stole your hostname.

All of this is moot, however, without some longevity built into the hosting. One of Winer’s bigger concerns is that sites like Facebook, Twitter, etc. seem to have different rules about what counts as a “post” – Tweets don’t have titles, nor do Facebook status updates. But could they? Should they? Differences like this mean it’s not easy to just move your data from one hosting solution to another. You can pull your content from Twitter, but you can’t exactly upload it to Facebook and have it work. You can pull your data from Facebook, but there’s so much info available to you – like what advertisements you’ve clicked – that I think you might suffer from overload trying to figure out what to move.

I agree that there should be a standard for this data – and that you, as an author/content provider/social media user, should be able to take your data from one service to another. And it should be easy – just download from one service, suspend your account there, then upload to another and keep going, deleting your prior account when you feel comfortable. But that’s not how things are set up right now. Silos, it would seem, are another part of the problem. There’s a way around that – host it yourself – but then we get to the rub: what if you die? What if the web server dies? How do you perpetuate your online self after you pass on?


A week at camp, day 5

Breakfast (pancakes and sausages) and then some free time for leaders – we played KanJam, a frisbee-based game. Then we had our last leaders’ meeting – so sad.

Lunch was pizza, then no siesta – right to afternoon merit badges, because at 2:30 was…


Each troop participates in about 10 games on the Athletics field, then 10 more at the waterfront. We teamed up with Troop 12, also out of Milford, since we each only had about 15 or so kids.

Team InterTrail (we are at International Campsite, they are at Trail) did well at the land games, placing in most of the events. Noah ran in the frisbee relay, helping to nab our second-place finish.

Sea games were a little more interesting for me – I signed up for the “Scoutmaster Splash,” which is basically a fancy way of saying belly flop. I went first, which apparently is not a good idea, since you may give other people ideas… Anyway, I’m still a little red, and I almost lost a contact during the leap. Oh, and I placed second.

Noah helped with the “Sandwich Relay” – using the three levels of swimming areas, the scouts take parts to make a sandwich and then assemble them – and someone gets to “eat” the results. I did it last year and it’s kinda gross, but not the worst thing in the world.

After dinner – lemon chicken with rice and broccoli – we were disappointed by dessert: ice cream sandwiches. Normally dessert is individual cherry pies, and the mini pie plates get used as kazoos to drive most of the adults a little crazy… The staff seemed a little surprised as well, up until they started trying to use sandwich wrappers as kazoos…

an ice cream sandwich…

After dinner we had some time before the closing campfire: skits (all of them “dad jokes,” aka groaners), songs, and a nice closing number by the staff.

At our campsite, the boys made s’mores and prepared for our trip home. 

A week at camp, day 4

Breakfast (ham and cheese omelettes) and training happened; then the leaders’ meeting and lunch (chicken nuggets and mac & cheese). At the leaders’ meeting I filled out the Scouter Merit Badge Challenge form – a way to keep the unruly adults from causing trouble.

After lunch/siesta, I followed Noah up to the shotgun range, where he finished qualifying for his merit badge. I was then invited to shoot at a few skeet.

taking aim…

Out of my six or so shots, I only hit one skeet, but it was my last one, so I was pretty happy about it. Noah got video of the pigeon exploding; I’ll upload it later when I have wifi available.

After shotgun, Noah went to Handicrafts to work on finishing a merit badge or two, and I hoofed it out to the rifle range to do a little shooting myself and to turn in Noah’s targets for evaluation – I would have made him do it, but he was trying to work on two projects and was pressed for time. I did pretty well shooting; Noah’s targets, though, didn’t quite make the cut. I explained that to him, and he dug out some more targets he’d shot this week and got the requirement signed off after dinner.

After dinner there was a mini lashing and map-reading session for my ILOS course – while I was doing that, two of the COPE leaders were taking the troop through some basic team-building exercises.

At the same time, our troop’s Top Chef was preparing apple crisp for dessert. I sampled it after dinner, and it did not disappoint: fully awesome.

apple crisp