5 terrible SEO ideas

Having looked at many small-business websites, I’ve compiled a list of 5 things that many of them are doing wrong with regard to SEO. I’m not saying that SEO isn’t important, but some techniques just don’t work. So, here goes… Read More »

Posted in tips | 17 Comments

5 ways to make using bash more productive

If you are using Linux or a Mac these days, then you likely have bash as your default shell. It generally comes with a few nice features (tab-completion, history etc.), but there are a few tips and tricks which will make it much nicer to use. Here’s a rundown of my favourite 5.
Read More »

Posted in linux | 8 Comments

Writing dynamic XML sitemaps using PHP

Since Google introduced sitemaps in 2005, they have grown to be accepted by the 4 main search engines: Google, Live Search, Yahoo and Ask.

As the official sitemaps page describes:

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

So, basically it’s an XML file that simply describes what pages you have, when they were modified and how important you think they are.

An example would be:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
                            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
        <url>
                <loc>http://bradshawenterprises.com/blog</loc>
                <lastmod>2008-09-07T10:21:52+00:00</lastmod>
                <changefreq>weekly</changefreq>
                <priority>1.0</priority>
        </url>
</urlset>

In this example, all fields are used, but you can get away with just the loc information. lastmod is a W3C Datetime (the same format as an Atom date), changefreq can be always, hourly, daily, weekly, monthly, yearly or never, and priority goes from 0.0 to 1.0. This, and the rest of the protocol, is described at the official webpage.
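To make those limits concrete, here’s a minimal sketch of a check for a single entry. The function name sitemap_entry_errors and the array keys are my own invention for illustration, not part of the sitemap protocol or any library.

```php
<?php
// Hypothetical helper: check one sitemap entry against the limits described above.
// Returns a list of problems; an empty list means the entry looks fine.
function sitemap_entry_errors(array $entry): array
{
    $errors = array();
    $allowedFreqs = array('always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never');

    // loc is the only required field.
    if (empty($entry['loc'])) {
        $errors[] = 'loc is required';
    }
    // changefreq, if present, must be one of the seven allowed words.
    if (isset($entry['changefreq']) && !in_array($entry['changefreq'], $allowedFreqs, true)) {
        $errors[] = 'changefreq is not an allowed value';
    }
    // priority, if present, must lie between 0.0 and 1.0 inclusive.
    if (isset($entry['priority']) && ($entry['priority'] < 0.0 || $entry['priority'] > 1.0)) {
        $errors[] = 'priority must be between 0.0 and 1.0';
    }
    return $errors;
}
```

Running each entry through something like this before writing it out is a cheap way to catch a mistyped changefreq.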

Publishing a sitemap lets the search engines examine deeper parts of your site that may not be linked to that well, as well as providing data on what’s new without them having to crawl the whole site.

Each search engine provides an interface to register your sitemap and check its status. The best of these in my experience is Google Webmaster Tools, though the others have something equivalent as well.

Dynamically generating a sitemap

This tutorial will go through reading URLs from a database rather than from the file system. This is because the key point here is describing things that have changed or are new. In my sitemaps I manually type in the static pages, and then dynamically write in the rest for simplicity and speed. Why read through numerous directories if we know that things haven’t changed?

So, we start off with a connection to a database:

include("assets/dbconnect.php");
$blogs = mysql_query("SELECT * FROM blog_posts ORDER BY timestamp DESC");

Here, I’m just using a stock database connection script and then really simply querying for the blog posts. I’m ordering them by timestamp so it’s easy to check it’s working, as the newest post will be first.
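The mysql_* functions were removed in PHP 7, so on a newer server the query above needs a different extension. Here’s a rough sketch of the same query using PDO. The function name fetch_blog_posts is made up, and I’m assuming you already have a PDO connection to pass in; the table and column names are the ones used in this tutorial.

```php
<?php
// Hypothetical helper: fetch the url and timestamp of every post, newest first.
// $db is assumed to be an already-connected PDO instance.
function fetch_blog_posts(PDO $db): array
{
    // Only the two columns the sitemap needs, rather than SELECT *.
    $stmt = $db->query('SELECT url, timestamp FROM blog_posts ORDER BY timestamp DESC');
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

This returns a plain array of rows, so the rest of the script can loop over it with foreach instead of calling a fetch function each time round.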

Next, we sort out a content type header and the XML prologue.

header ("Content-type: text/xml");
echo ("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");

The header just tells the user agent that this is some XML so that it knows how to process it. If your browser gets confused and wants to download it, just comment this out whilst testing. Most modern browsers won’t do this anyway.

I’m echoing out the prologue because, with short tags enabled, PHP would treat the <? in <?xml as the start of a PHP block.

Next, we set up the XML file and its namespaces:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

This describes the XML to the user agent so that it knows how to interpret the various fields in the file.

Now we get to the bulk of the file:

<?php // Emit one url entry per blog post. ?>
<?php while ($current_post = mysql_fetch_array($blogs)) { ?>
<url>
	<loc><?= htmlspecialchars($current_post['url']) ?></loc>
	<lastmod><?= gmdate(DATE_ATOM, $current_post['timestamp']) ?></lastmod>
</url>
<?php } ?>

This just loops through my blog posts and spits out the URL and a nicely formatted timestamp for each one. I’m using gmdate here because my server is in a different timezone.
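If you’d rather not mix PHP and raw XML, the same loop can be sketched with PHP’s XMLWriter class, which handles escaping for you. The function name build_sitemap is hypothetical, and I’m assuming each post is an array with a url string and a Unix timestamp, as in the loop above.

```php
<?php
// Hypothetical helper: build the whole sitemap as a string from an array of posts.
// Each post is assumed to have a 'url' string and a Unix 'timestamp'.
function build_sitemap(array $posts): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($posts as $post) {
        $xml->startElement('url');
        // XMLWriter escapes characters such as & in the URL.
        $xml->writeElement('loc', $post['url']);
        $xml->writeElement('lastmod', gmdate(DATE_ATOM, (int) $post['timestamp']));
        $xml->endElement(); // url
    }

    $xml->endElement(); // urlset
    $xml->endDocument();
    return $xml->outputMemory();
}
```

You’d still send the Content-type header first and then echo whatever this returns.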

Underneath this, I just hand type the remaining files:

<url>
	<loc>http://YOURDOMAIN/about.php</loc>
	<priority>0.5</priority>
</url>
<url>
	<loc>http://YOURDOMAIN/contact.php</loc>
	<priority>0.5</priority>
</url>

Right at the bottom, just place a closing tag:

</urlset>

This signifies the end of the file.

That’s all you need for your sitemap file. Place it in a file in the root of your domain and call it sitemap.php.

Let search engines know it exists

Either create a file in the root of your domain called robots.txt, or open the existing one. At the bottom just add a line that says:

Sitemap: http://YOURDOMAIN/sitemap.php

and save it. This lets search engines find the file.
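If you want to script that step, here’s a small sketch that adds the line only when it isn’t already there. The function name with_sitemap_line is made up for this example; it works on the file’s contents as a string, so reading and writing robots.txt is left to you.

```php
<?php
// Hypothetical helper: return robots.txt content with a Sitemap line appended,
// unless that exact line is already present.
function with_sitemap_line(string $robots, string $sitemapUrl): string
{
    $line = 'Sitemap: ' . $sitemapUrl;

    // Compare against each existing line, ignoring surrounding whitespace.
    $existing = array_map('trim', preg_split('/\R/', $robots));
    if (in_array($line, $existing, true)) {
        return $robots;
    }

    // Append at the bottom, keeping exactly one trailing newline.
    $robots = rtrim($robots);
    return ($robots === '' ? '' : $robots . "\n") . $line . "\n";
}
```

Because it checks first, running it twice leaves the file unchanged.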

This is all good so far: we now have a map that updates as the site updates without any real hassle. The next step is to make sure that search engines are told every time a new post is added. For this, you need to find the code where you save new posts to the database. I’m using curl here because it seems to be available everywhere.

Add this code as soon as you’ve checked that the entry has been saved properly.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "THEURLYOUNEEDTOPING");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);

Each search engine mentioned above has an address you can use here. Here’s a quick summary:

Google: http://www.google.com/webmasters/tools/ping?sitemap=
Yahoo: http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=
Ask: http://submissions.ask.com/ping?sitemap=
Live Search : http://webmaster.live.com/ping.aspx?siteMap=

All you do is add the full URL of your sitemap at the end, and use it in the code above. This will ensure that whenever you post anything, all the search engines are notified immediately.
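Since the sitemap address ends up inside another URL’s query string, it’s safest to encode it first. A minimal sketch, where build_ping_url is a name I’ve made up:

```php
<?php
// Hypothetical helper: join a ping address and a sitemap URL,
// encoding the sitemap URL so it is safe inside a query string.
function build_ping_url(string $endpoint, string $sitemapUrl): string
{
    return $endpoint . urlencode($sitemapUrl);
}
```

Without the encoding, a sitemap address that itself contains a ? or & could be misread by the server receiving the ping.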

I’d loop through these to do them all in one go like this:

$sitemap = "http://YOURDOMAIN/sitemap.php";

$pingurls = array(
	"http://www.google.com/webmasters/tools/ping?sitemap=",
	"http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=",
	"http://submissions.ask.com/ping?sitemap=",
	"http://webmaster.live.com/ping.aspx?siteMap="
);

// Ping each search engine in turn with the URL-encoded sitemap address.
foreach ($pingurls as $pingurl) {
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $pingurl . urlencode($sitemap));
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
	$output = curl_exec($ch);
	curl_close($ch);
}
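The loop above throws away the responses, so you never find out if a ping failed. Here’s a sketch of one way to check them. Both function names (curl_status and ping_all) are my own, the 10 second timeout is an arbitrary choice, and I’m assuming a 200 response means the ping was accepted.

```php
<?php
// Hypothetical fetcher: request a URL with curl and return the HTTP status code
// (0 if the request failed outright).
function curl_status(string $url): int
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10); // don't hold up saving a post
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Hypothetical helper: ping every endpoint and record which ones answered 200.
// $fetch is any callable that takes a URL and returns a status code;
// it defaults to curl_status, and can be swapped out for testing.
function ping_all(array $endpoints, string $sitemap, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'curl_status';
    $results = array();
    foreach ($endpoints as $endpoint) {
        $status = $fetch($endpoint . urlencode($sitemap));
        $results[$endpoint] = ($status === 200);
    }
    return $results;
}
```

Calling ping_all($pingurls, $sitemap) gives back an array keyed by ping address, so you can log any that came back false.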

An alternative is to use a site such as PingMyMap. They provide a URL for you to use, which they then use to ping the same search engines. The benefit here is that if the addresses change then it will still work. Your call really!

Any more ideas on how to implement this? Let me know below!

Posted in PHP | 10 Comments

3 Ways Google Applications can enhance teaching

Having recently started working at a school where email, calendars, documents etc. are hosted by Google Applications, I felt that it would be worthwhile incorporating these into my teaching and planning. This post describes the benefits I have found from this in the first few weeks, and will outline some plans that I have for the future. If you haven’t used Google Applications in a workplace before, then here’s a quick outline of what’s included in the free version.

  • Gmail (7+ GB of storage)
  • Google Calendar
  • Google Documents
  • Google Sites
  • Google Talk
  • A version of iGoogle called Start Page

Read More »

Posted in software

Benchmarking Chrome’s V8 Javascript engine

Having previously looked at the memory usage of Google’s Chrome, it’s time to analyse its much-mentioned JavaScript engine, V8. Presumably named after the engine of the same name, V8 compiles JavaScript to machine code rather than bytecode to achieve greater performance. In this respect, it’s similar to Firefox 3.1’s TraceMonkey, a JIT JavaScript compiler also developed to achieve much greater speeds when running JavaScript.

Read More »

Posted in software | 4 Comments

Google Chrome Memory Usage

Google Chrome Memory Usage, originally uploaded by Rich Bradshaw.

Browsing to about:memory in Google Chrome gives you this interesting display showing memory usage for itself as well as for any other browsers that are running.

According to the browser, this is the memory usage of the following browsers on blank pages. (Although Chrome is on the about:memory page, that’s a different process, as shown below, so that doesn’t mess things up.)

Read More »

Posted in software | 13 Comments

A Vision of Students Today


An interesting view into how students live today. I’m thinking that how much work and how many classes you have depends on which degree you study - 3 hours of class + 2 hours work a day is much less than I did, and I only graduated last year!

Also, no one brought laptops to lectures - we didn’t use Facebook all through lectures, as the material was much too complex to do that and still pass!

Posted in Uncategorized

5 mistakes new web developers often make

Having talked to some university students who had taken computer science/IT degrees, I was amazed by how little they seemed to know about making anything that’s secure or even remotely logical. The group I met with had primarily been taught PHP. Having looked at some sites they were designing, I realised there were 5 things that they had no idea they had done incorrectly. Here’s a rundown:

Read More »

Posted in tips, websites | 27 Comments

Web 2.0 … The Machine is Us/ing Us

Really interestingly made video, definitely worth watching!

Covers the history of the internet in a really creative way.

Posted in Uncategorized

Powerset: Find Factz, Get a T-shirt

The new semantic search tool for Wikipedia, Powerset, has just announced a competition challenging users to find interesting “Factz” using its search tool.

So far, I’ve found:

What eats humans?

We have the usual: zombies, monsters and sharks, but the list also includes Catholics, foxes, streets and pigs…

What do chickens like?

Only three things: potatoes, ham and tandoori. That’s good to know…

Posted in websites | 3 Comments