Category Archives: internet

2016 in review, from Swarm’s point of view

So I got an email from Swarm the other day – they’re a geo-locative social media platform. It’s like when you “check-in” on Facebook, only a whole separate app. It’s fun, and silly, and I don’t have too many connections on it since I like to keep that type of stuff private.

Anyway, this was in the email:

So…. they seem to think that the places I check in most might be places I “love” to go…. Ew.

Future proof websites?

Warning: what follows, besides discussing 9/11, is also kind of a nerdy/geeky/technical discussion about how web pages link to each other and an idea for how to make the links between pages, especially pages that may disappear some day, work better. Maybe.

Today is Patriot Day, a “national day of service and remembrance”. Because it’s also the 14th anniversary of 9/11, I ended up reviewing my collection of 9/11 “stuff” – something I started on 9/11 and continued collecting for a few days after those events. It helped to process things a little bit, I think.

Recently, Dave Winer has been discussing, among other things, the “future-safety” of the internet:

The concern is that the record we’re creating is fragile and ephemeral, so that to historians of the future, the period of innovation where we moved our intellectual presence from physical to electronic media will be a blank spot, with almost none of it persisting.

While reviewing my collection, I realized a possible reason his piece has been percolating in the back of my mind was, in fact, this same collection. Why? Take a look – the images I’m hosting myself, since it’s not a big deal (bandwidth wise or effort wise). The links to other sites? That’s where it falls apart.

Some of the sites are still there – and one or two of the links I had still work. That’s awesome – someone thought ahead, or took the time, when they re-did the website, to make sure that the old content was still accessible.

Other links go to sites that still work, but the “layout” of the website – their URLs and/or URIs (Uniform Resource Identifiers) – has changed, and no one took the time to make sure the old content was still easily accessible. For a couple of those, I was able to find the article on the site at its new address, so I updated the link.

Then there are two other cases left to deal with: the website is gone, or the link that I have uses a URL click-tracker service that is no more. In the case of the website being gone, I can try to use the Internet Archive (or “Wayback Machine”) to try to find the article and then figure out what to do – I could link to the archive’s version, but I decided to take that snapshot and copy it to my own server – I can’t necessarily rely on the archive to be there forever, can I? Maybe, maybe not.

In the case of the URL tracker, well, that’s going to mean some work. I can try to see if the article is available by title, but my search just now for “World reacts to calamity” returned lots of results, none of which seem to be on the C|Net website – which is apparently either where I got the link in the first place, or where the article was hosted. That’s not helpful at all.

So what can be done? For starters, encourage the discussion. I went to Winer’s site and posted a comment:

I’ve been mulling this all over, and then realized why today. On 9/11/01, I was collecting links of things relevant to what was going on, but I only had links to the pages. I went back to my collection today, and a lot of the stuff is gone – possibly forever? I went and used the internet archive where I could for some things just now, but a lot of the content seems to be lost – especially due to click-tracking links used at the time. If only I knew then what I know now…..

Dave was quick to reply:

Yes. Today is a very good day to be thinking about that. I should write a blog post. Thanks for pointing this out.

And then he wrote a quick little piece about it: A good day to think about web history. And he has the EXACT same problem: links from that day on his own site just aren’t working.

I’ve tried to sound the alarms. Every day we lose more of the history of the web. Every day is an opportunity to act to make sure we don’t lose more of it. And we should be putting systems into place to be more sure we don’t lose future history.

There’s a solution in there somewhere, that’s for sure. For one thing, you have Google, which indexes every page ever if it’s allowed to. But that’s only part of the equation – finding the data. But how? What are we going to look for? And, more importantly, where are we going to look? If a server goes offline, that data is gone unless it’s in the archive (which isn’t free) or someone decides to mirror it (also not free). But how to make it easy to find? For some content, when you’re searching by title for example, you might find multiple sites with similarly titled articles – then you have to sort the wheat from the chaff.

Is there a better way? Maybe. Off the top of my head, we need to do a little more on the backend. But what?

Mark the pages somehow with a UUID (Universally Unique Identifier). For example, it could be an SHA1 hash of data from the page – maybe the hostname as the first part, then the time, date, and article title:
Future proof websites?

That gets turned into: d820eab50a74ad6c0c08566b210454848a573dcf-29b6082b508b593c8de53988ef3d2b14b327664b. What do we do with that? Ideally, it’s auto generated and then put into the META data of this web page. Then, when you link to my page, the browser pulls that out of the META data (if it’s available) and adds that to the link – so instead of:
<a href="">Future Proof Websites?</a>

you get:
<a href="" webprint="d820eab50a74ad6c0c08566b210454848a573dcf-29b6082b508b593c8de53988ef3d2b14b327664b">Future Proof Websites?</a>

If you copy/paste the link for an email, or to put on Facebook/Twitter/your blog/whatever, it copies that “webprint” into the link – and if the content goes down for some reason – maybe I die and my website goes away – then a search for the webprint would make it easier to find cached/mirrored copies of the data, since the ID would theoretically go along in the cache/mirror as part of the META data in the pages.
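Here’s a rough sketch of how that could be generated, just to make the idea concrete. Everything specific in it is my assumption for illustration: the hostname `blog.example.com`, the example URL, the timestamp, the `|` separator between date and title, and the `webprint` META name. It won’t reproduce the exact hash shown above, since that depends on exactly what text was fed in.

```python
import hashlib
from datetime import datetime
from html import escape


def sha1_hex(text: str) -> str:
    """Return the 40-character hex SHA-1 digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def make_webprint(hostname: str, published: datetime, title: str) -> str:
    """Build a two-part ID: hash of the hostname, then hash of date/time + title."""
    host_part = sha1_hex(hostname.lower())
    # The "|" separator is an arbitrary choice for this sketch.
    post_part = sha1_hex(f"{published.isoformat()}|{title}")
    return f"{host_part}-{post_part}"


def meta_tag(webprint: str) -> str:
    """META tag a page could carry (the name 'webprint' is hypothetical)."""
    return f'<meta name="webprint" content="{escape(webprint)}">'


def link_with_webprint(url: str, text: str, webprint: str) -> str:
    """Anchor tag that carries the webprint along with the URL."""
    return f'<a href="{escape(url)}" webprint="{escape(webprint)}">{escape(text)}</a>'


# Hypothetical values, for illustration only.
webprint = make_webprint(
    "blog.example.com", datetime(2015, 9, 11, 12, 0), "Future proof websites?"
)
print(meta_tag(webprint))
print(
    link_with_webprint(
        "https://blog.example.com/future-proof-websites",
        "Future Proof Websites?",
        webprint,
    )
)
```

Two posts on the same host share the first 40 characters, and the whole ID comes out at 81 characters (two SHA-1 hex digests plus the hyphen).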

Clearly we would need to use something better/longer than what I have here, since it’s only 81 characters long. That seems like a lot, but we’re in the process of running out of IPv4 addresses and moving everything over to IPv6 – and we didn’t think we’d run out of IPv4 addresses for quite some time back when I got into the computer game.

By having the hostname be the first part of the hash, we reduce the odds of a clash – you could, theoretically, have the same second hash as another site, but what are the odds that they would have the same first hash? Impossible unless the site stole your name somehow.

All of this is moot, however, without some longevity built into the hosting. One of Winer’s bigger concerns is that sites like Facebook/Twitter/etc seem to have different rules about what counts as a “post” – Tweets don’t have titles, nor do Facebook status updates. But could they? Should they? Things like this mean it’s not as easy to just move your data from one hosting solution to another. You can pull your content from Twitter, but you can’t exactly upload it to Facebook and have it work. You can pull your data from Facebook, but there seems to be so much info available to you – like what advertisements you’ve clicked – that I think you might suffer from overload trying to figure out what to move.

I agree that there should be a standard for this data – and you, as an author/content provider/social media user, should be able to take the data from one service to another. And it should be easy – like just download from one service, suspend your account there, then upload to another and keep going, deleting your prior account when you feel comfortable. But that’s not how things are set up right now. Silos, it would seem, are another part of the problem. But there’s a way around that – host it yourself. But then we get to the rub there: what if you die? What if the web server dies? How do you perpetuate your online self after you pass on?


GPS tagged photos: should you be panicking?

I recently saw a post on several friends’ Facebook pages, all going to the same website, with the same headline: “WARNING!!!! If you take photos with your cell phone“. Clicking the link brings you to a website with a warning about the dangers of posting photos from your cell phone on social media sites. The article has since been removed, but it’s available in Google’s cache if you really want to read it. It’s more of a dire warning to watch a video from the NBC affiliate KSHB in Kansas City, MO and spread the word.

My issue is that, while it’s true that the photos on your phone do include the data they are mentioning, it’s easy for ANYONE to find that information (it’s not limited to “hackers”), and most social websites (Facebook and Twitter at least) remove that data when the photos are shared.

Here, for example, is a photo I took today when getting off the highway. There are three copies: the one I emailed myself, the one I posted to Facebook and the one I posted to Twitter.
[Images: photo-exif, photo-facebook, photo-twitter]

If you take a minute to save the first one to your hard drive and open it with a program that can read the Exif (Exchangeable Image File Format) data, you can find my location (latitude and longitude) when I took the photo. On the Mac, just open the images in Preview, which comes with all Macs – if you have a PC, you can do a Google search to find something to read it. In Preview, do Command + I (or go to the “Tools” menu and choose “Show Inspector”) and the info window will appear. Select the second tab, then the GPS tab in the second row and you’ll get something like this:

I then copied and pasted the Latitude and Longitude:
Latitude: 41° 15′ 18″ N
Longitude: 73° 0′ 0″ W

You could trim that down and copy/paste it into Google Maps: 41° 15′ 18″ N,73° 0′ 0″ W. You’ll end up with a pin approximately where you were when you took the photo. I was getting off the highway.
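If you’d rather have plain decimal degrees (which mapping sites also accept), the conversion is simple arithmetic: degrees plus minutes/60 plus seconds/3600, negated for South or West. Here’s a small sketch of that; the function name `dms_to_decimal` is just something I made up for this example.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus N/S/E/W into signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative in decimal notation.
    return -value if hemisphere.upper() in ("S", "W") else value


# The coordinates from the photo above.
latitude = dms_to_decimal(41, 15, 18, "N")
longitude = dms_to_decimal(73, 0, 0, "W")
print(f"{latitude:.4f},{longitude:.4f}")  # prints 41.2550,-73.0000
```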

What about Facebook or Twitter? Here’s the info that the same image, uploaded to their servers, presents when saved locally:

In other words, when the photos were uploaded, they removed the GPS data.

Is this a perfect system? Not really. If you uploaded images before they started to remove the GPS data, it’s possible that the data is still there. For example, when I first heard about this, in February 2012, I did some searches to see what I could find.

I ended up writing an email to a specific “friend” I met on Twitter, and who happened to have some photos that had GPS data embedded:

First of all, I’m _not_ a stalker, simply a fan.

I was sent an email today, ostensibly for parents, but really, just for anyone who should be thinking before posting things online: YouTube link.
(the gist: if you take photos on your smart phone, when you post them online, the geo-tagged info might still be in your photos).

I first went to Facebook to see if some photos that had recently been posted by people I knew had geotag info – nothing. I think that Facebook actually strips out the EXIF data, which I guess is good.

Next I went to twitter and started looking for photos posted there by people I follow, and struck out again.

I expanded my search again, and found, after looking at a photo you Instagrammed, that one of the photos you posted on your twit pic contained the data [link provided but removed].

To get the data, I saved the file to my desktop, opened it in Preview on the Mac, then brought up the Inspector (under the tools menu).

It shows:
Latitude: 40° xx’ xx” N
Longitude: 73° xx’ xx” W

There’s also a handy “Locate” button, which opens a browser to: here

Which includes a street address:
621-699 W 40th St
Manhattan, NY 10018

Again, I’m _NOT_ stalking you. I just need to be clear on this. OK? You just happened to be the first person that I found who was sharing photos that had the geotag data in it.

Anyway, I went to one of the websites mentioned in the YouTube video and it turns out that it’s well known that TwitPic doesn’t scrub the data:

And instructions on how to disable it, so that even if the website doesn’t scrub the data, it won’t be there:

Anyway, just thought you should be aware – I’ll be sending out some similar letters to friends and family.

(again, really not a stalker)

I ended up never sending it – it felt too stalkerly – but the video I linked to then is the same video!

There are ways to turn the GPS data embedding off if you want: just do a web search for “(your phone type) and disable gps tagging”.

I still check my uploaded photos every month or so, just to make sure – better safe than sorry!

Will you be able to use your computer on Monday?

But tens of thousands of Americans may still lose their Internet service Monday unless they do a quick check of their computers for malware that could have taken over their machines more than a year ago.

via Malware may knock thousands off Internet on Monday – Yahoo! News.

Ok, so while technically accurate, the title of that piece is a little misleading. While the computers that are potentially affected are, in fact, infected with malware, it’s really the FBI that might knock those computers offline – but there’s a reason for it.

Various hacker groups, when they infect machines on the internet, set them up to be able to do things for them. These machines, or “bots”, are turned into a group of computers, or “botnet”, by their controllers. Sometimes they’re used to take a website down (via a “Denial of Service” or “DoS” attack), sometimes to send out all that spam you get. It’s up to the person(s) controlling the botnet.

In this case, the compromised computers had had their DNS servers changed to ones controlled by the hackers. For those that don’t know, the DNS (domain name system) is what turns a website’s name into an IP address that the computer can connect to – it’s like the white pages of the internet. By controlling the DNS servers that the compromised computers went to, they could change what websites would come up when a user went to any website, or even modify the pages that the users were going to. If you thought you were going to your bank’s website, they could make you go to their version of the website and capture your login information. Scary stuff.
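To picture what that means, here’s a toy model. It is not real DNS (which is a network protocol, not a lookup table), and the names and addresses are made up, using ranges reserved for documentation. The point is only that the same name gives a different answer depending on which “phone book” your computer has been told to use.

```python
# Toy model only: each "resolver" is a dictionary mapping names to IP addresses.
HONEST_RESOLVER = {
    "bank.example.com": "192.0.2.10",
    "news.example.com": "192.0.2.20",
}

# The rogue resolver answers honestly for most names, so nothing looks wrong,
# but sends the bank's name to a server the attackers control.
ROGUE_RESOLVER = {**HONEST_RESOLVER, "bank.example.com": "203.0.113.66"}


def resolve(name, resolver):
    """Look a name up in the given resolver; None if it is not listed."""
    return resolver.get(name)


def is_hijacked(name, configured, trusted):
    """True when the configured resolver disagrees with a trusted one."""
    return resolve(name, configured) != resolve(name, trusted)


for site in ("news.example.com", "bank.example.com"):
    status = "HIJACKED" if is_hijacked(site, ROGUE_RESOLVER, HONEST_RESOLVER) else "ok"
    print(site, "->", resolve(site, ROGUE_RESOLVER), status)
```

The user types the same address either way; only the answer coming back changes, which is why this is so hard to notice.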

If you think you might be infected – or just want more info – go to DNS Changer Working Group or FBI site about the malware to read more.

Google Android…. Toy?

So, I saw this and couldn’t quite explain it:
Order Google Android Phone Mascot 3-Inch Do it Yourself Figure
It’s pretty clear what you’re getting – an Android figure and some markers so you can do your own custom design on it, but…. Why?

I guess if you’re the creative type, it would be pretty amusing. And at $6.99, it’s not that big a deal if you do it “wrong” and have to start over, right?