Because current search engine bots cannot read some web site elements, certain structural aspects need to be considered in order to achieve a high ranking in the SERPs, or even to be indexed at all. The following attributes may have a negative impact on a website’s positioning in the search results.
Search engines cannot read images. Keyword-rich text content will not help the rankings if that content is part of an image, such as a GIF or JPEG file. Using images as such does not have a negative influence on search positioning, as long as the file sizes are not too large. Nevertheless, if most of the text content is embedded in images, there is no “spider food” for the search engine bots to index.
Macromedia Flash is an excellent tool for creating attractive web presentations. Some search engines have the ability to index text and follow links in Flash-based elements (.swf), and some Flash sites may even rank for some terms. However, Flash sites present several problems for search engines. Search engines cannot read most Flash elements, so Flash should not be used unless absolutely necessary. Furthermore, web sites that have been created solely with Flash do not use standard URL paths, meaning that a user cannot access a particular page of the site without going through the site's navigation. Instead of typing the URL in the address bar, the user needs to enter the main page of the site and find her way from there. This is a problem not only from the user's standpoint but also from the search engine's point of view. Search engines index web pages by their URL, and because Flash sites do not have a unique URL for each page, most of the content will not be indexed.
Macromedia Flash can be used to make web sites more attractive, and using it for navigation is fine, as long as the main text content is standard HTML and there is an alternative HTML navigation or a sitemap.
Frames are an HTML feature that makes it possible to display more than one page at the same time within the same screen. Frames have traditionally been used to divide the screen into the site’s title, main content and navigation. Using frames, however, causes some problems.
Google mentions these problems in its guidelines:
“Google supports frames to the extent that it can. Frames tend to cause problems with search engines, bookmarks, emailing links and so on, because frames don't fit the conceptual model of the web (every page corresponds to a single URL)” (Google b).
In order to rank in the search engines and provide a pleasant user experience, it is crucial not to use frames.
Dynamic database-driven sites often have long dynamic URLs with several variables such as “http://www.example.com/shoes.php?formid=2&id=807&brand=nike&shoesize=11&ion=women” compared to a static URL like “http://www.example.com/shoes/nike/women/size11/”. Dynamic URLs like these are not recommended because they can cause some problems with indexing. Google’s guidelines address dynamic URLs specifically in their “Reasons your site may not be included” section.
“Your pages are dynamically generated. We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index” (Google b).
Yahoo! acknowledges the problem in a similar fashion in its guidelines.
“Yahoo! does index dynamic pages, but for page discovery, our crawler mostly follows static links. We recommend you avoid using dynamically generated links except in directories that are not intended to be crawled/indexed (e.g., those should have a /robots.txt exclusion).” (Yahoo! b.)
Dynamic websites as such are not a problem for search engines; the problem lies in the form of their URLs. Rewriting the dynamic URLs into static ones therefore overcomes it.
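The rewriting idea can be illustrated with a minimal Python sketch. It assumes a hypothetical path scheme modelled on the static example URL above and maps it back to the query string that the dynamic script expects. The pattern, the function name and the parameter names (brand, category, shoesize) are illustrative assumptions, not taken from any real site. In practice this translation is usually configured in the web server's rewrite rules rather than written by hand.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Hypothetical pattern for static paths such as /shoes/nike/women/size11/
STATIC_PATH = re.compile(
    r"^/shoes/(?P<brand>[a-z0-9-]+)/(?P<category>[a-z0-9-]+)/size(?P<shoesize>\d+)/$"
)


def rewrite_static_path(path: str) -> Optional[str]:
    """Translate a static path into the internal dynamic URL.

    Returns None when the path does not follow the assumed scheme.
    """
    match = STATIC_PATH.match(path)
    if match is None:
        return None
    # Build the query explicitly so the parameter order is predictable.
    params = [
        ("brand", match.group("brand")),
        ("category", match.group("category")),
        ("shoesize", match.group("shoesize")),
    ]
    return "/shoes.php?" + urlencode(params)
```

Visitors and search engine bots only ever see the static path; the dynamic URL stays internal to the server.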
A problem related to dynamic URLs is the use of session IDs. A session ID is a unique number that is assigned to a web site visitor and used to track the visitor's path and the times of entry and exit. As each visitor gets a unique identifier, search engine bots that visit the site get one as well, and therefore each URL appears different on every visit even though the pages remain the same. This could lead to the bot crawling the same pages continuously under ever-changing URLs, which is a problem that search engines try to avoid. As a result, search engine bots may not crawl the pages at all.
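Why session IDs make identical pages look different can be shown with a short Python sketch that removes session parameters from a URL. The parameter names in SESSION_PARAMS and the function name are assumptions chosen for illustration; a real site would use whatever names its own software generates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of query parameters that carry a session ID.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL without any session ID query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Two visits to the same page with different session IDs, for example `?id=807&sid=abc123` and `?id=807&sid=xyz789`, reduce to the same URL once the session parameter is removed. That shared URL is what a search engine would need to see in order to recognize the page as a single document.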
Another problematic element of some websites is the session cookie. Cookies are files that are stored on the user's hard drive and record data about the user. Some websites require the user to have cookies enabled in order to view the pages of the website. This is a problem for search engine bots, which are not configured to accept cookies. As a result, search engine bots will not be able to crawl the web pages. (Thomason 2003.) Websites should not require cookies: not only do they cause problems with search engines, but some Internet users also have cookies disabled in their browser settings. Consequently, those users cannot view web pages that require cookies.
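The recommendation not to require cookies can be sketched in Python as follows. The function render_page, the cookie name visitor_name and the placeholder body text are all hypothetical. The point is only that the page content is returned whether or not a cookie is present, so the cookie adds personalization without gating access.

```python
from typing import Mapping, Tuple


def render_page(cookies: Mapping[str, str]) -> Tuple[int, str]:
    """Return (HTTP status, body) for a page that never requires cookies."""
    body = "Product catalogue"  # placeholder for the real page content
    visitor = cookies.get("visitor_name")  # hypothetical cookie name
    if visitor:
        # The cookie only adds a greeting; it is not a condition for access.
        body = f"Welcome back, {visitor}. " + body
    return 200, body
```

A bot or a user with cookies disabled calls this with an empty mapping and still receives the full content.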
The World Wide Web Consortium (W3C) develops web standards and guidelines for people who work with web technologies. Validation is the process of checking web documents against these standards and guidelines. Following them diminishes the number of potential bugs on web sites. In particular, validating the HTML code and cascading style sheets (CSS) prevents bugs that may cause problems with indexing: errors in HTML or CSS can cause spiders to skip some of the web page content or ignore it completely, thus preventing proper indexing of the web pages.
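As a rough illustration of the kind of error that validation catches, the following Python sketch uses the standard library's html.parser module to report tags that are opened but never closed, or closed out of order. It is not a substitute for a real validator. The class and function names are illustrative, and the check is stricter than the HTML standard, which allows some closing tags (such as that of a paragraph) to be omitted.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Track open tags and record closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def find_tag_errors(html):
    """Return a list of tag-balance problems found in the HTML string."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.open_tags]
```

Calling `find_tag_errors("<p><b>bold</p>")` returns a non-empty list because the bold element is never closed, while well-nested markup returns an empty list.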