Search engines are currently incapable of indexing most of the content on the Internet. Google’s database, for instance, currently consists of just over 8 billion web pages. Considering that the World Wide Web comprises several different kinds of content and several billion documents, the search engines index only a fraction of it. Because their technology is limited, search engines ignore the contents of several types of files or index them improperly. Some search engines, including Google and Yahoo!, offer non-text search applications such as image or video search, but they cannot crawl the actual content of those files, i.e. they cannot see the image or video itself. Instead, they consider only the text associated with a file to determine whether it is relevant to a search. This text includes the filename, the alternative text, and the text content of the web pages that link to the file.
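To make the text signals listed above concrete, here is a minimal sketch in Python of how a crawler might gather them for each image on a page. It is only an illustration under stated assumptions, not a description of how any real search engine works: the names `ImageSignalParser`, `text_signals`, and `SAMPLE_PAGE` are hypothetical, the sample markup is invented, and the whole page text stands in for "the text content of the linking page".

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageSignalParser(HTMLParser):
    """Collects the text available for each <img> tag on a page.

    Hypothetical helper for illustration only; it never looks at pixels.
    """

    def __init__(self):
        super().__init__()
        self.images = []     # one dict per <img> tag found
        self.page_text = []  # non-empty text fragments of the page

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            src = attributes.get("src") or ""
            self.images.append({
                # Filename taken from the path part of the image URL.
                "filename": basename(urlparse(src).path),
                # Alternative text supplied by the page author, if any.
                "alt": attributes.get("alt") or "",
            })

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.page_text.append(text)


def text_signals(html):
    """Return filename, alt text, and page text for every image in html."""
    parser = ImageSignalParser()
    parser.feed(html)
    parser.close()
    page_text = " ".join(parser.page_text)
    return [dict(image, page_text=page_text) for image in parser.images]


# Invented sample page, used only to exercise the sketch.
SAMPLE_PAGE = """
<html><body>
<h1>Golden Gate Bridge at sunset</h1>
<img src="/photos/golden-gate-sunset.jpg" alt="Bridge silhouetted against an orange sky">
<p>Taken from Marin Headlands.</p>
</body></html>
"""

if __name__ == "__main__":
    for signals in text_signals(SAMPLE_PAGE):
        print(signals)
```

Everything the sketch collects is text. An image with a meaningless filename and no alternative text, on a page with little surrounding copy, would give such a crawler almost nothing to match against a query.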
As search technology advances, search engines will be able to index more of the content on the Web. Some capabilities, such as indexing the actual contents of video and image files, are unlikely to arrive in the near future, but many of the indexing problems that search engines have today will probably be solved in the next few years.