This chapter is part of the free e-book 'Basics of Search Engine Marketing', published in May 2005.

1.4.4 Indexing of content

Currently, search engines are incapable of indexing most of the content on the Internet. For instance, Google's database consists of just over 8 billion web pages. Since the World Wide Web is made up of many different types of elements and many billions of web documents, only a fraction of the Internet is indexed by the search engines. Because of the limitations of their technology, search engines ignore the contents of several types of files or index them improperly. Some search engines, including Google and Yahoo!, offer non-text search applications such as image or video search, but they cannot interpret the actual content of those files, i.e. see the image or watch the video. Instead, they rely only on the text associated with a file to determine whether it is relevant to a search. This text includes the filename, the alternative (alt) text, and the text content of the web pages that link to the file.
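To make the text-only approach more concrete, here is a minimal Python sketch, assuming a simplified model in which a crawler associates three text signals with each image on a page: the words in its filename, its alt text, and the visible text of the page containing it. The names ImageTextExtractor, image_signals and matches are hypothetical, and the "every query word must appear" rule is a deliberate oversimplification; the actual relevance methods used by Google or Yahoo! are not public.

```python
import re
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse


class ImageTextExtractor(HTMLParser):
    """Collects the text signals a crawler could attach to each <img>:
    filename words, alt text, and the page's visible text."""

    def __init__(self):
        super().__init__()
        self.images = []     # one dict per <img> tag found
        self.page_text = []  # visible text chunks of the page
        self._skip = 0       # depth inside <script>/<style>, whose text is not shown

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            attrs = dict(attrs)
            src = attrs.get("src") or ""
            # "/img/red-sports-car.jpg" -> "red-sports-car" -> ["red", "sports", "car"]
            stem = PurePosixPath(urlparse(src).path).stem
            self.images.append({
                "src": src,
                "filename_words": [w for w in re.split(r"[-_.\s]+", stem.lower()) if w],
                "alt": (attrs.get("alt") or "").strip(),
            })

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a script or style block.
        if not self._skip and data.strip():
            self.page_text.append(data.strip())


def image_signals(html):
    """Return one dict of text signals per image found in the HTML."""
    parser = ImageTextExtractor()
    parser.feed(html)
    parser.close()
    context = " ".join(parser.page_text)
    return [dict(img, page_text=context) for img in parser.images]


def matches(query, signal):
    """Hypothetical relevance test: True if every query word appears
    somewhere in the image's filename words, alt text or page text."""
    haystack = " ".join(signal["filename_words"] + [signal["alt"], signal["page_text"]])
    words = set(re.findall(r"\w+", haystack.lower()))
    return all(w in words for w in re.findall(r"\w+", query.lower()))


if __name__ == "__main__":
    page = ('<html><head><title>Classic cars</title></head><body>'
            '<img src="/img/red-sports-car.jpg" alt="A red 1960s sports car">'
            '<p>Photographed at a car show.</p></body></html>')
    for sig in image_signals(page):
        print(sig["filename_words"], sig["alt"])  # ['red', 'sports', 'car'] A red 1960s sports car
        print(matches("red car", sig), matches("blue car", sig))  # True False
```

Note that nothing in the sketch looks at the pixels of the image: a picture of a blue car saved as red-sports-car.jpg with the same alt text would be matched exactly the same way, which is the limitation described above.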

Furthermore, search engine bots cannot read some of the text content on the Web. This is because various web technologies are not compatible with the search engines. Examples of problematic elements include Macromedia Flash, JavaScript, and some structural features of websites, such as session IDs, dynamic content and content management systems. Because these elements may prevent search engine bots from reading a web page, and thus pose serious problems for search engine optimization, I will discuss them in detail in chapter 4.3.1.

As search engine technology advances, more of the content on the Web will become indexable. Some capabilities, such as indexing the actual contents of image and video files, are unlikely to arrive in the near future, but many of the indexing problems that search engines have today will probably be solved in the next few years.

