Search Engine - ISearchHere!

ISearchHereBOT 1.0 BETA

How to Identify ISearchHereBOT

Presumably, you arrived at this site because you noticed traffic from a User-Agent that identified itself with the string:


Mozilla/5.0 (compatible; ISearchHereBOT 1.0 BETA; +http://www.isearchhere.com/bot.php)
You have come to the right place to find out about the ISearchHereBOT crawler.
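
If you want to pick this crawler out of your access logs, a minimal sketch along the following lines may help. It assumes only the User-Agent string quoted above; the function name and the loose version pattern are illustrative choices, not anything published by ISearchHere.Com. Keep in mind that a User-Agent header can be forged, so this is a filter, not proof of identity.

```python
import re

# Built from the User-Agent string quoted above. The version part is matched
# loosely on the assumption that later releases may change it.
BOT_PATTERN = re.compile(r"ISearchHereBOT\s+([\w.]+(?:\s+BETA)?)", re.IGNORECASE)


def isearchhere_bot_version(user_agent):
    """Return the advertised bot version, or None if the agent is not ISearchHereBOT."""
    match = BOT_PATTERN.search(user_agent)
    return match.group(1) if match else None


ua = "Mozilla/5.0 (compatible; ISearchHereBOT 1.0 BETA; +http://www.isearchhere.com/bot.php)"
print(isearchhere_bot_version(ua))
print(isearchhere_bot_version("Mozilla/5.0 (X11; Linux x86_64)"))
```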




What does ISearchHereBOT do?

Pages crawled by ISearchHereBOT are used in an actual search engine available at http://www.ISearchHere.Com. This site
receives queries from around the world.


How Often ISearchHereBOT Crawls a Site

ISearchHereBOT is currently run sporadically (not continuously) on a large number of machines. Each machine runs about
2-3 fetcher processes, and each fetcher has at most 100-300 connections open at any given time. In a typical situation,
these connections would not all be to the same host.


How you can Change how ISearchHereBOT Crawls your Site

ISearchHereBOT understands robots.txt files (the name has to be robots.txt, not robot.txt). A robots.txt file must be
placed in the root folder of your website for its instructions to be followed. ISearchHere.Com does not look in
subfolders for robots.txt files. A simple robots.txt file that blocks ISearchHere.Com from crawling any folder other
than the cool_stuff folder and its subfolders might look like:


User-agent: ISearchHereBOT
Disallow: /
Allow: /cool_stuff/

ISearchHereBOT also obeys HTML ROBOTS meta tags whose content is drawn from none, noindex, nofollow, noarchive, and
nosnippet. An example HTML page using the noindex and nofollow directives might look as follows:


<!DOCTYPE html >
<html>
<head><title>Meta Robots Example</title>
<meta name="ROBOTS" content="NOINDEX,NOFOLLOW" />
<!-- The members of the content attribute must be comma separated;
whitespace will be ignored -->
</head>
<body>
<p>Stuff robots shouldn't put in their index.
<a href="/somewhere">A link that nofollow will prevent from being followed</a></p>
</body>
</html>
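
To check which directives a page like the one above actually exposes, a rough sketch using Python's standard html.parser module could look like this. The class and function names are made up for illustration, and this is not ISearchHereBOT's own parser; it simply follows the comma-separated, whitespace-ignoring reading described in the example's comment, and it lowercases values as an assumption.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser already lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW" /></head></html>'
print(sorted(robots_directives(page)))
```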


ISearchHereBOT does not use Open Directory or Yahoo! Directory data, so noodp and noydir are implicitly supported.
ISearchHereBOT matches directive names case-insensitively. Within HTML documents, it also honors rel="nofollow" on
anchor tags. For example, the following link would not be followed by ISearchHereBOT:



<a href="somewhere_else" rel="nofollow">This link would not be followed by ISearchHereBOT</a>
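
As an illustration of the rel="nofollow" rule, the sketch below collects only the links a crawler would be free to follow. It is an assumption-laden example built on Python's html.parser, with a hypothetical class name, and it treats rel as a space-separated, case-insensitive token list; ISearchHereBOT's real link extractor may differ.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Collects href values of anchors that do not carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # rel may hold several tokens, e.g. rel="external nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.links.append(href)


def followable_links(html):
    """Return the list of hrefs that are not marked nofollow."""
    parser = FollowableLinks()
    parser.feed(html)
    return parser.links


sample = '<a href="/a">x</a><a href="somewhere_else" rel="nofollow">y</a>'
print(followable_links(sample))
```
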
ISearchHereBOT further understands the Crawl-delay extension to the robots.txt standard and also Sitemap directives. 
For example,

User-agent: ISearchHereBOT
Crawl-Delay: 10 # ISearchHereBOT will wait 10 seconds between requests
Sitemap: http://www.domain.com/domainsitemap.xml # ISearchHereBOT will eventually download this sitemap





ISearchHereBOT only supports uncompressed sitemaps.
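
Before publishing Crawl-delay and Sitemap lines, you may want to sanity-check them. The sketch below uses Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later) on the example above. Note that this is Python's parser, not ISearchHereBOT's, so it only shows how one independent implementation reads the same file.

```python
from urllib.robotparser import RobotFileParser

# The example from above, without the inline comments.
ROBOTS_TXT = """\
User-agent: ISearchHereBOT
Crawl-Delay: 10
Sitemap: http://www.domain.com/domainsitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("ISearchHereBOT")  # seconds between requests
sitemaps = parser.site_maps()                 # list of Sitemap URLs, or None
print(delay, sitemaps)
```
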
For non-HTML pages, you can use an X-Robots-Tag HTTP header to control how ISearchHereBOT indexes the page and follows
its links, and how ISearchHere.Com displays results from the page on the ISearchHere.Com Web site. For example, if your
web server sent the following as part of its HTTP response headers, before the actual page data of, say, a PDF file:

X-Robots-Tag: nosnippet

then, if the PDF appeared as part of search results, there would be no snippet text under the link in the search results.
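
How you send this header depends on your web server. As one hedged illustration, here is a tiny WSGI application in Python that attaches X-Robots-Tag: nosnippet to a PDF response. The application name and the placeholder body are hypothetical; on a real site you would more likely set the header in your server configuration.

```python
def pdf_app(environ, start_response):
    """Hypothetical WSGI app that serves a PDF with an X-Robots-Tag header."""
    body = b"%PDF-1.4\n"  # placeholder bytes, not a real document
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        ("X-Robots-Tag", "nosnippet"),  # ask crawlers not to show a snippet
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Serve locally for a quick manual check, e.g. with: curl -I localhost:8000
    from wsgiref.simple_server import make_server

    make_server("", 8000, pdf_app).serve_forever()
```
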


More Specifics on robots.txt and Meta Tag Handling

When processing a robots.txt file, if Disallow and Allow lines are in conflict, ISearchHereBOT gives preference to the
Allow directive over the Disallow directive, because the default behavior of robots.txt is to allow everything except
what is explicitly disallowed.
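
The Allow-over-Disallow preference can be sketched as follows. This is a simplified illustration that assumes plain prefix matching with no wildcards, and the function name is hypothetical; it is not taken from ISearchHereBOT's source.

```python
def is_allowed(path, allow_paths, disallow_paths):
    """Return True if `path` may be crawled when Allow beats Disallow."""
    # An empty rule value (e.g. "Disallow:") restricts nothing, so skip it.
    allows = [p for p in allow_paths if p]
    disallows = [p for p in disallow_paths if p]
    if any(path.startswith(p) for p in allows):
        return True  # an explicit Allow wins over any conflicting Disallow
    return not any(path.startswith(p) for p in disallows)


# Rules from the cool_stuff example earlier on this page.
allow = ["/cool_stuff/"]
disallow = ["/"]
print(is_allowed("/cool_stuff/page.html", allow, disallow))  # True
print(is_allowed("/private/page.html", allow, disallow))     # False
```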

If a webpage has a noindex meta tag, then it won't show up in search results, provided that ISearchHere.Com has actually 
downloaded the page. If ISearchHere.Com hasn't downloaded the page, or is forbidden from downloading the page by a 
robots.txt file, it is possible for a link to the page to show up in search results. This could happen if another page 
links to the given page, and ISearchHere.Com has extracted this link and its text and used them in search results. One 
can check if a URL has been downloaded by typing a query info:URL into ISearchHere.Com and seeing the results.

When processing a robots.txt file, ISearchHereBOT first looks for ISearchHereBOT User-agent blocks and extracts all of
the Allow and Disallow paths listed in them. On success, these form the paths that ISearchHereBOT will use to restrict
its access to your site. If it cannot find any such block, it searches case-insensitively for User-Agent names, which
may contain the wildcard *, that match ISearchHereBOT's name. For example, *Search*, *Bot*, etc. It then parses all of
these blocks and uses them to restrict its access to your site. In particular, if you have a block "User-Agent: *"
followed by Allow and Disallow rules, and no blocks for ISearchHereBOT, then these paths will be what ISearchHereBOT
uses and honors.

Sitemap directives, as per the Sitemap specification, are not associated with any particular User-Agent, so
ISearchHere.Com processes, to the extent that it does, any such directive it finds.

Prior to March, 2015, ISearchHereBOT did not understand * or $ in Allow and Disallow paths. "*" and "$" are extensions
to the original robots.txt specification that are supported by Google, Yahoo, and Bing. As of March, 2015,
ISearchHereBOT does understand these extensions. So, for example, you can block access to pages on your site containing
a query string by having a Disallow path such as:


Disallow: /*?
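
To see how a path with * or $ behaves, the following sketch translates such a path into a regular expression. It follows the commonly documented meaning of these extensions (* matches any run of characters, a trailing $ anchors the end of the URL path) and uses hypothetical function names; ISearchHereBOT's own matching code may differ in its details.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path using * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "match anything".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def path_matches(pattern, url_path):
    """Return True if the rule path applies to the given URL path."""
    return robots_pattern_to_regex(pattern).match(url_path) is not None


print(path_matches("/*?", "/search.php?q=test"))   # True
print(path_matches("/*?", "/about.html"))          # False
print(path_matches("/*.pdf$", "/docs/a.pdf"))      # True
print(path_matches("/*.pdf$", "/docs/a.pdf?x=1"))  # False
```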

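The two-stage choice of User-agent blocks described above can also be sketched in code. The example below is only an approximation: the function names are hypothetical, the first stage is assumed to compare the name without regard to case, and * is assumed to be the only wildcard in User-agent names.

```python
import re

BOT_NAME = "ISearchHereBOT"


def wildcard_matches(agent_pattern, name=BOT_NAME):
    """Case-insensitively match a User-agent name that may contain *."""
    regex = ".*".join(re.escape(piece) for piece in agent_pattern.split("*"))
    return re.fullmatch(regex, name, re.IGNORECASE) is not None


def applicable_rules(blocks):
    """Pick rules from (user_agent, rules) pairs listed in file order."""
    # Stage 1: blocks naming the bot directly (case handling is an assumption).
    own = [rules for agent, rules in blocks if agent.lower() == BOT_NAME.lower()]
    # Stage 2: only if stage 1 found nothing, fall back to wildcard matches.
    chosen = own or [rules for agent, rules in blocks if wildcard_matches(agent)]
    return [rule for rules in chosen for rule in rules]


blocks = [
    ("*", ["Disallow: /tmp/"]),
    ("*Bot*", ["Disallow: /bots/"]),
    ("Googlebot", ["Disallow: /"]),
]
print(applicable_rules(blocks))
```
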

ISearchHere.Com uses the cURL libraries to download web pages. Prior to December, 2015, ISearchHere.Com used cURL's
automatic following of redirects. This meant that ISearchHere.Com sometimes followed shortened URLs or other redirects
to a page whose robots.txt would have denied it access. Since February, 2015, ISearchHere.Com has not used this feature
of cURL. For a redirect response, it instead extracts a link that has to go through the same queuing and robots.txt
checking as all other links.
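
The idea of queuing a redirect target instead of following it immediately can be sketched as below. This is a Python illustration with made-up names and no network access, not ISearchHere.Com's cURL-based code; it only shows a Location header being resolved and placed on the same queue as any other discovered link.

```python
from collections import deque
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def handle_response(url, status, headers, queue):
    """Queue a redirect target rather than following it right away.

    Returns True if the response was a redirect that got queued.
    """
    location = headers.get("Location")
    if status in REDIRECT_STATUSES and location:
        # urljoin resolves relative Location values against the request URL.
        queue.append(urljoin(url, location))
        return True
    return False


queue = deque()
handle_response("http://short.example/abc", 301,
                {"Location": "http://www.example.com/page"}, queue)
handle_response("http://www.example.com/a/b", 302, {"Location": "/c"}, queue)
print(list(queue))
```

Because the target goes back onto the queue, it is subject to the same robots.txt check as every other URL before it is fetched.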


How Quickly does ISearchHereBOT Change its Behavior

When ISearchHere.Com machines crawl for longer than one day, they cache the robots.txt file. They use the cached
directives for 24 hours before requesting the robots.txt file again. So if you change your robots.txt file, it might
take up to 24 hours before the ISearchHere.Com crawler notices the changes.
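
A 24-hour robots.txt cache of the kind described above might be sketched like this. The class name, the injected fetch function, and the injectable clock are all assumptions made for illustration and testing; the crawler's real cache is not documented here.

```python
import time

CACHE_SECONDS = 24 * 60 * 60  # the 24-hour window described above


class RobotsCache:
    """Keeps each host's robots.txt text for up to 24 hours."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch    # callable: host -> robots.txt text
        self._clock = clock    # injectable so the expiry can be tested
        self._entries = {}     # host -> (fetched_at, text)

    def get(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is None or now - entry[0] >= CACHE_SECONDS:
            # Missing or stale: request the file again and remember when.
            entry = (now, self._fetch(host))
            self._entries[host] = entry
        return entry[1]


fetch_log = []


def fake_fetch(host):
    """Stand-in for a real download; records each request."""
    fetch_log.append(host)
    return "User-agent: *\nDisallow:"


cache = RobotsCache(fake_fetch)
cache.get("www.example.com")
cache.get("www.example.com")  # served from the cache, no second request
print(len(fetch_log))
```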


Adding Your Site to ISearchHere.Com Search Engine

Currently, you may go to the main page at http://www.ISearchHere.Com, select Submit URL, and add your URL. There is a
limit of 30 URLs per day.


Who owns ISearchHere.Com Search?

ISearchHere.Com is owned by Avanced Sales Force, a startup company with the ambition of bringing back the way searching
should be.



Where is ISearchHere.Com Search Located?

ISearchHere.Com is primarily located in Canada. We also have servers in multiple countries, and we are determined to
have a presence in almost every country soon. ISearchHere.Com currently has multiple servers up and running globally,
all of which are 100% owned by ISearchHere.Com.


Copyright

The name ISearchHere.Com is not to be used for any other purpose. We do not allow anyone to copy our logo and/or use
our name without written permission.



How Many URLs Does ISearchHere.Com Currently Have?

As of June 12, 2015, ISearchHere.Com sees about 3.2 billion URLs.


Contact Info

If you have any questions about the ISearchHere.Com crawler, please feel free to contact us (query at ISearchHere.Com).
For advertising and business development, please contact us using the links provided on our main page or below.
If you want ISearchHere.Com to crawl your site, please visit our main page and click Submit URL.