Have you ever wondered how Google can guess what you’re searching for after just a few letters? Were you ever surprised to find that the advertisement at the side of the screen always seemed to match whatever you happened to be thinking about? That’s no coincidence. It is a clever technique known as Latent Semantic Indexing (LSI), in development at Google and now progressing in leaps and bounds.
In short, LSI is the ability to spot patterns in words. People can remember dozens of ‘synonyms’, or words which mean basically the same thing. If in doubt, we can fish out a thesaurus or, yes, look it up on the internet. Even though a computer can tell us word alternatives, it doesn’t understand them in the same sense, because it doesn’t speak human languages. It just processes data rather than ‘reading’ in the way a person does so easily. This makes matching text by content far more complicated.
The analysis rests on a mathematical technique called Singular Value Decomposition, or SVD (simply put, a way of factoring a table of word counts so that the statistical patterns in which words appear together stand out). The Wikipedia article on LSI explains it in more depth. After finding that its AdSense campaign was pretty successful, Google acquired Applied Semantics to help improve the smooth running of its search engine. Before that, irrelevant pages would sometimes appear simply because they were stuffed with keywords.
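To make the SVD idea a little more concrete, here is a minimal sketch in plain Python. It is not Google's implementation: the four-page corpus, the function names and the choice of two latent dimensions are all assumptions made for illustration. It relies on the fact that, for a term-document matrix X with SVD X = U S V^T, the matrix X^T X has the right singular vectors as its eigenvectors and the squared singular values as its eigenvalues, so a simple power iteration is enough to place each page in a small 'latent' space. A real project would more likely call a library SVD routine.

```python
import math

# Hypothetical four-page corpus: two finance pages, two ballooning pages.
DOCS = [
    "inflation economy prices money",
    "economy money prices bank",
    "balloon hot air inflation",
    "balloon air flight",
]


def gram_matrix(texts):
    """Return X^T X for a binary term-document matrix X.

    Entry (i, j) is the number of distinct terms shared by pages i and j.
    """
    bags = [set(text.split()) for text in texts]
    return [[float(len(a & b)) for b in bags] for a in bags]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def top_eigenpairs(matrix, k, steps=200):
    """Find the k largest eigenpairs by power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _component in range(k):
        vec = [1.0] * n
        for _step in range(steps):
            nxt = mat_vec(work, vec)
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        # Rayleigh quotient of the unit vector gives the eigenvalue.
        value = sum(v * w for v, w in zip(vec, mat_vec(work, vec)))
        pairs.append((value, vec))
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def latent_coordinates(texts, k=2):
    """Coordinates of each page in the k-dimensional latent space (S_k V_k^T)."""
    pairs = top_eigenpairs(gram_matrix(texts), k)
    return [[math.sqrt(val) * vec[d] for val, vec in pairs]
            for d in range(len(texts))]


def cosine(a, b):
    """Cosine similarity between two coordinate lists."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


coords = latent_coordinates(DOCS)
finance_pair = cosine(coords[0], coords[1])   # two finance pages
mixed_pair = cosine(coords[0], coords[2])     # finance page vs ballooning page
print(round(finance_pair, 2), round(mixed_pair, 2))
```

With this toy data the two finance pages end up far closer to each other than the first finance page is to the ballooning page, even though both of those pages contain the word ‘inflation’.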
Any webmaster familiar with Search Engine Optimization will know how it used to work: the text of a website would be filled to the brim with the same words over and over again, in order to trick search engines into thinking that the site had more relevant content than it really did. This ‘keyword stuffing’ technique bumped up the page ranking of these sites so that they appeared on the first page of search results. So, whatever you typed, you’d be likely to get a boring article, so full of key phrases it was practically unreadable.
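One rough way to see keyword stuffing in numbers is to measure how much of a page's text is taken up by a single phrase. The sketch below is only an illustration: the function name `keyword_density` and both sample texts are made up, and no search engine publishes a threshold for what counts as ‘stuffed’.

```python
import re


def keyword_density(text, phrase):
    """Share of the words in `text` taken up by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase appears word for word.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)


# Hypothetical sample texts.
stuffed = "Cheap shoes here. Buy cheap shoes. Cheap shoes, cheap shoes, cheap shoes online."
natural = "Our store sells affordable footwear, from running trainers to leather boots."

print(keyword_density(stuffed, "cheap shoes"))
print(keyword_density(natural, "cheap shoes"))
```

In the stuffed sample the phrase accounts for most of the words on the page, while the natural sample never uses it at all.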
At the other end of the scale, companies would insert irrelevant words into their text in order to ensnare people who were looking for something completely unrelated. Sneaking in popular search words to get hits used to work (and still does on sites like YouTube), but with the semantic indexing project all this does is water everything down. It’s a bit like planting seeds over a wide field, where some of them are bound to get neglected: if a site owner scatters keywords over too many distant topics, Google won’t recognize the site as a specialist in any of them.
Google’s ability to search for related content is a breakthrough. It ignores repetition and strikes offending sites from the rankings. Making use of this is a simple matter of using a whole bunch of related terms which people commonly search for – a technique called Latent Semantic Indexing. It sounds complicated, but it is actually a fairly basic part of SEO and something which we do naturally.
For the best success, you will need to find the most searched-for terms. Using Google itself, any regular web author can put the tilde (~) in front of a search term to find results running along the same theme. To see which terms are likely to be most popular, head to a ‘high authority’ site, that is, a website renowned for its accurate information on whatever subject your site is about. Usually this is a big company, like Apple. Whichever terms they use, web architects should use.
Not only that, but Google can tell when two words are spelled the same but relate to different topics (called ‘polysemy’). It will not confuse inflation of the economy with inflation of a balloon. The spiders are able to crawl through the results and find out which words and phrases relate to each sense. An article about inflation in the sense of money is not likely to contain the words ‘hot air’. So if you search ‘inflation hot air’, you will not get a piece from the Wall Street Journal.
In that case, it’s better not to use any terms which might confuse the poor Google spiders. If you wrote: ‘this panic about inflation is a load of hot air’, your article might be mistaken for a ballooning article, and those who want to read about the economy won’t find your site with the usual search terms.
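The inflation example can be imitated with a very crude rule: look at the words that surround the ambiguous term and see which topic they overlap with most. The sketch below is a toy under stated assumptions, not how Google actually resolves word senses. The two vocabularies in `SENSE_CONTEXT` and the function name `guess_sense` are invented for illustration.

```python
import re

# Hypothetical context vocabularies for two senses of "inflation".
SENSE_CONTEXT = {
    "economy": {"prices", "money", "economy", "interest", "bank", "wages"},
    "balloon": {"hot", "air", "balloon", "helium", "pump", "flight"},
}


def guess_sense(text):
    """Pick the sense whose context words overlap most with the text.

    Returns None when no context word from either sense appears.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & vocab) for sense, vocab in SENSE_CONTEXT.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_sense("inflation hot air"))
print(guess_sense("this panic about inflation is a load of hot air"))
print(guess_sense("Inflation pushes prices up and erodes money in the bank"))
```

The second call shows the risk described above: a sentence meant as economic commentary lands in the ballooning sense because of the phrase ‘hot air’.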
Now that Google and other search engines analyze content more closely, the need for endless backlinks and anchor text has reduced. This should be a huge relief to any individual webmaster or start-up company with few links to show for itself. Now the new kids on the block get a piece of the pie, rather than having all the website visits stolen by big corporations who throw money at IT specialists.
Before, topping the results meant bumping up the page rank using links to other related sites (particularly ‘high authority’ ones) or internal links to other pages on the same website. Loaded with keywords, the links drew the spiders. They explored the site, checked how many reliable, functioning links there were and made assumptions about the content of the site based on how many other decent sites it referenced. All the underlines and colors made the page a hideous strain on the eyes to read, and rarely was the content any good: scraps of info laced with advertising opportunities.
These days, it’s actually better to reduce the number of links. Too many links going all over the place send your friendly internet spiders away from the most important places. Say you linked to an ICT firm and your main site from your blog. You probably want more focus on your main site, so that your main site can get an increased page rank, right? That won’t happen. Those links have equal value, so the spiders will branch off in all directions and get distracted.
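The ‘equal value’ point can be illustrated with the even-split rule from the classic published PageRank model, in which a page's score is divided equally among its outbound links. How search engines weigh links today is not public, so treat this purely as a sketch: the function name `share_per_link`, the link labels and the 0.85 damping factor (the value commonly quoted for the original model) are assumptions.

```python
def share_per_link(page_score, outbound_links, damping=0.85):
    """Score passed to each link target under an even-split PageRank-style model."""
    if not outbound_links:
        return {}
    share = damping * page_score / len(outbound_links)
    return {target: share for target in outbound_links}


# Hypothetical blog page linking to both the main site and an ICT firm.
print(share_per_link(1.0, ["main-site", "ict-firm"]))
# The same page linking only to the main site.
print(share_per_link(1.0, ["main-site"]))
```

Under this model, dropping the second link doubles what the main site receives from the blog page.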
The best way to set up a site that makes full use of LSI is to have numerous pages relating to a number of select topics within a theme, linked together in a straightforward way. If you make the search engine crawlers’ job easier, you will reap the benefits. Here is a basic picture of how the site should be mapped:
Every page contains only one link that directly leads to the next page. The last page leads to an external website. As you can see, it is divided into distinct sections. That way, the subjects covered aren’t competing with each other. In each topic, it’s best to pick a title and an angle that isn’t likely to have been used before. That way, you have no competitors for that article.
On a general subject, like nutrition, you might have three themes (minimum). Those loose themes are ‘landing pages’, an area for all the related topics to be pooled. The themes could be “Common Food Myths”, “The Uses of Supplements” and “The Truth about Exercise”. On these themes, there should be at least five separate pages linking into each other, each explaining an individual topic on the theme – e.g. on Food Myths, one article on meat, another on diets, etc. This layout will help both spiders and people navigate the site. A search of ‘meat myths’ will fetch up that one specific page that links to the rest of your site. This is better than putting all the good stuff on one page and risking looking irrelevant in the all-seeing eyes of Google.
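The layout described above (themes as landing pages, each holding a chain of pages where every page links to the next and the last one links out) can be written down as a small data structure. This is only a sketch: apart from ‘meat’ and ‘diets’, the page names are invented, the other two themes are left out for brevity, and `https://example.com/` stands in for whatever external site you would really link to.

```python
# Hypothetical page lists for one theme of a nutrition site.
SITE = {
    "Common Food Myths": ["meat", "diets", "sugar", "fats", "salt"],
}


def chain_links(pages, final_target):
    """Link each page to the next one; the last page links to final_target."""
    targets = pages[1:] + [final_target]
    return dict(zip(pages, targets))


for theme, pages in SITE.items():
    print(theme)
    for source, target in chain_links(pages, "https://example.com/").items():
        print("  ", source, "->", target)
```

Keeping one chain per theme means each page carries a single outgoing link, which matches the one-link-per-page rule described earlier.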
Smart web maintenance will involve checking for the presence of keywords related specifically to the subject. Take this blog: it has been tuned for semantic search engines to increase its chances of being picked up by the crawlers. Each paragraph that you have read will have contained a handful of words relating to LSI without resorting to repetition. Any writing on semantic indexing will contain basic related words like ‘webpage’ and ‘Google’, plus more specific terms like ‘synonym’. These are vital tools for a webmaster who wants to cover all the variables of SEO. Care has also been taken not to overlap with polysemic words or clash with other major sites. All it takes is some research and careful planning. High ranking blogs and articles go to the best prepared.
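A check of this kind is easy to automate. The sketch below counts how often each related term appears in a draft and lists the ones that are missing. The function name `term_coverage`, the sample draft and the term list are assumptions for illustration, and the check only handles single-word terms.

```python
import re


def term_coverage(text, related_terms):
    """Count each single-word related term in `text` and list the missing ones."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = {term: words.count(term.lower()) for term in related_terms}
    missing = [term for term, count in counts.items() if count == 0]
    return counts, missing


# Hypothetical draft and term list.
draft = "Google reads each webpage and weighs every synonym it finds."
counts, missing = term_coverage(draft, ["webpage", "google", "synonym", "thesaurus"])
print(counts)
print(missing)
```

A term that shows up in `missing` is a candidate to work into the text naturally, while a term with a very high count is a hint that the draft is drifting back towards repetition.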
A search-optimized piece of writing does not have to be over-technical or mind-numbingly dull. No longer do good search engines favour pages that endlessly repeat themselves. LSI is a system that, like most new tech, improves the way we use the internet, giving writers the freedom to say what needs saying without worrying about falling in the rankings. It’s a new era for web content and every web writer is chomping at the bit to stay on top. Understanding Latent Semantic Indexing will help you do exactly that. Also, applying LSI doesn’t completely overrule the value of backlinking, although with the launch of ‘Google Caffeine’ it appears that LSI’s importance is a bit higher than that of backlinking. LSI is just one of the components that you should consider if you would like to optimize your site or portal for search engines.