In 2024, ChatGPT, Claude, and Google’s Gemini rolled out search capabilities across their products, allowing LLMs to retrieve live information and use it in responses. Users (and prospective site visitors) can now get satisfying, relevant answers without ever visiting another website. Yet tracking how AI platforms crawl and retrieve your content is as difficult as ever: with no first-party tracking options, measuring your brand’s visibility across AI platforms remains a challenge.
Unlike Googlebot, which has been studied and reverse-engineered for the past two decades, AI crawlers operate under a different set of incentives and with far less transparency about their mechanisms. This is where server log files can act as a source of truth and give insight into otherwise opaque processes.
What are log files and how can they help track AI visibility?
Log files are raw records of every visit to a website. Any time anything accesses a website (whether it’s a human using a web browser, Googlebot, or an AI crawler), a line is written to the log file, which usually includes the following details (a sample entry follows the list):
- The user’s IP address
- The requested URL
- The timestamp (date and time of the visit)
- The user-agent (which identifies the browser or bot)
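For illustration, here is what a single entry might look like in the widely used combined log format (the nginx and Apache defaults are close variants of this). The IP address, path, and timestamp below are invented; the user-agent string follows the format OpenAI publishes for GPTBot.

```
203.0.113.7 - - [12/Mar/2025:14:32:01 +0000] "GET /blog/ai-visibility HTTP/1.1" 200 18523 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot"
```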
For AI visibility tracking, log files are uniquely valuable because AI platforms do not provide first-party reporting or webmaster tools equivalent to Google Search Console. Log files act as a source of truth for understanding how AI systems interact with your website. While they do not show how content is ultimately presented in AI-generated answers, they provide the strongest available signal of when and why AI platforms are engaging with your content.
The three types of AI crawlers and what we can learn from them
There are three main types of AI crawler bots you’ll see in your log files: training bots, search bots, and user bots. Each type accesses your site for a fundamentally different purpose, follows different crawling patterns, and offers different insights. Understanding the differences between these bots, and the questions each can answer, is crucial for informing strategy decisions around crawl management and content optimisation, and can shed light on what LLM users are searching for in relation to your site and the intent behind those searches. A practical starting point is to classify each request by its user-agent, as in the sketch below.
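As a minimal sketch, the Python snippet below sorts a request’s user-agent into one of the three categories. The tokens listed are ones documented by OpenAI, Anthropic, Perplexity, and Common Crawl at the time of writing, but vendors add and rename bots often, so verify the list against current vendor documentation before relying on it.

```python
# User-agent tokens for the three AI bot categories. These are documented by
# OpenAI, Anthropic, Perplexity, and Common Crawl at the time of writing, but
# vendors add and rename bots often -- verify against current documentation.
BOT_CATEGORIES = {
    "training": ("GPTBot", "ClaudeBot", "CCBot"),
    "search": ("OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"),
    "user": ("ChatGPT-User", "Claude-User", "Perplexity-User"),
}

def classify_user_agent(user_agent: str) -> str | None:
    """Return 'training', 'search', or 'user' for a known AI bot, else None."""
    for category, tokens in BOT_CATEGORIES.items():
        if any(token.lower() in user_agent.lower() for token in tokens):
            return category
    return None
```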
AI training bots
AI training bots crawl your site to scrape content that may be used for embedding generation and for building the knowledge base that LLMs draw on when answering questions without web search. While Googlebot takes into account contextual signals like internal linking and quality indicators, preliminary analysis shows that training bots crawl almost any page they can find, and they make up almost 80% of AI bot activity. Training bot hits do not relate directly to user actions, but they do provide information about how your brand is represented across AI platforms. It is also important to note that much of the data collected by these bots may never be used in responses, as content scraped this way passes through a rigorous data-cleaning pipeline. Here are some key insights we can gain from analysing training bot visits (a counting sketch follows the list):
- How frequently AI platforms source your website for training
  - We’ve seen that LLMs can mention and cite brands without the web search feature enabled, which means this information comes from past training.
  - Frequent visits by training bots may mean some information about your brand will be added to LLM knowledge bases.
- What content on your site attracts the most AI interest
  - This can inform content strategy: focus content development on high-performing content clusters.
- Whether crawling is straining server resources or exposing crawl inefficiencies
- How exposed your site is to large-scale data harvesting
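A minimal sketch of this analysis, assuming the combined log format shown earlier: parse each line, keep only hits from known training bot tokens, and count requests per URL. The regex is a simplification and may need adjusting to your server’s log configuration; `access.log` is a placeholder path.

```python
import re
from collections import Counter

# Minimal parser for the combined log format shown earlier. Real access logs
# vary by server configuration, so treat this pattern as a starting point.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<user_agent>[^"]*)"'
)

TRAINING_TOKENS = ("GPTBot", "ClaudeBot", "CCBot")  # verify against vendor docs

def top_training_targets(log_path: str, n: int = 20) -> list[tuple[str, int]]:
    """Count hits per URL from known training bots and return the top n."""
    counts: Counter[str] = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_PATTERN.match(line)
            if not match:
                continue
            if any(token in match["user_agent"] for token in TRAINING_TOKENS):
                counts[match["path"]] += 1
    return counts.most_common(n)

print(top_training_targets("access.log"))  # placeholder path
```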
AI search bots
AI search bots fetch up-to-date information to satisfy live user queries. They are system-initiated, operating autonomously whether or not a user has specifically requested a web search. Search bots are used when the AI system deems it necessary to verify or refresh information, so hits from these bots do not necessarily equate to a human impression of an AI-generated answer. From these bots we can learn (a freshness-checking sketch follows the list):
- How often AI systems pull your content for immediate question answering
- Which URLs are relied upon for factual, informational lookups
  - High retrieval frequency here may signal that certain pages are seen as authoritative sources by AI engines
- How your site is being used to power third-party AI experiences
- Whether content updates are being seen quickly by AI platforms
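To gauge how quickly updates are being picked up, one rough approach is to record the most recent search-bot fetch per URL and compare it against your publication or update dates. The sketch below reuses the `LOG_PATTERN` parser from the training bot example; the search bot tokens are, again, assumptions to verify against vendor documentation.

```python
from datetime import datetime

SEARCH_TOKENS = ("OAI-SearchBot", "Claude-SearchBot", "PerplexityBot")  # verify

def last_search_fetch(log_path: str) -> dict[str, datetime]:
    """Map each URL to the most recent retrieval by a known AI search bot."""
    latest: dict[str, datetime] = {}
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_PATTERN.match(line)  # parser from the earlier sketch
            if not match:
                continue
            if not any(t in match["user_agent"] for t in SEARCH_TOKENS):
                continue
            # Combined-log timestamps look like 12/Mar/2025:14:32:01 +0000
            seen = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
            path = match["path"]
            if path not in latest or seen > latest[path]:
                latest[path] = seen
    return latest
```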
AI user bots
AI user bots reflect traffic initiated by users within AI platforms. Unlike AI search bots, which are system-initiated, these bots signal demand from a human trying to answer a question. They still do not equate to a human visit, nor to an explicit citation. Nevertheless, analysing user bot hits can be especially useful, since they reflect real human demand and can inform strategy around content creation and the prioritisation of AI optimisation across your site. Here are the main insights that analysing user bot hits provides (a timing sketch follows the list):
- What type of content users are actively attempting to access through AI tools
- How user-driven traffic differs from automated system crawling
  - We’ve seen that AI user bot hits decline on weekends and at night, reflecting human schedules
- The role of your website in users’ search journeys
  - Is informational content mainly being crawled, or are your service pages? These data points can show where in the marketing funnel the average user journey sits when your website is being crawled.
- Opportunities for content formats that better serve AI-driven consumption patterns
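A simple way to see these human rhythms in your own data is to bucket user-bot hits by hour of day (the same approach extends to day of week). The sketch below again reuses the `LOG_PATTERN` parser from the earlier example.

```python
from collections import Counter
from datetime import datetime

USER_TOKENS = ("ChatGPT-User", "Claude-User", "Perplexity-User")  # verify

def user_bot_hours(log_path: str) -> Counter[int]:
    """Tally AI user-bot hits by hour of day to surface human usage patterns."""
    by_hour: Counter[int] = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_PATTERN.match(line)  # parser from the earlier sketch
            if not match:
                continue
            if any(t in match["user_agent"] for t in USER_TOKENS):
                hour = datetime.strptime(
                    match["time"], "%d/%b/%Y:%H:%M:%S %z"
                ).hour
                by_hour[hour] += 1
    return by_hour
```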
Log file insights & opportunities
When analysed holistically, AI crawler data in log files can reveal broader patterns and opportunities beyond individual bot types. By combining crawl behaviour, URL patterns, and timing signals, log analysis can help you understand how AI systems interpret your site structure, content hierarchy, and technical setup, and where improvements can be made to increase visibility across AI-driven experiences. Below are some general processes to keep in mind when analysing log files:
Find patterns in crawl distribution
Analysing which types of AI bots are visiting your site, and on which platforms, provides critical insight into how your content is being evaluated and consumed.
- Some bots may focus on specific topics or content formats, revealing which pages of your site are considered most valuable for AI knowledge ingestion.
- Identifying patterns in crawl frequency, timing, and page prioritisation can highlight which areas of your site are most visible to AI systems, and which content might need optimisation to be more relevant or discoverable (see the cross-tab sketch below).
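One way to surface these patterns is to cross-tabulate bot category against top-level site section, reusing the `classify_user_agent` and `LOG_PATTERN` helpers from the earlier sketches. The first path segment is a crude proxy for site section; substitute whatever segmentation matches your URL structure.

```python
from collections import Counter

def crawl_distribution(log_path: str) -> Counter[tuple[str, str]]:
    """Cross-tabulate AI bot category against top-level site section."""
    table: Counter[tuple[str, str]] = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_PATTERN.match(line)  # parser from the earlier sketch
            if not match:
                continue
            category = classify_user_agent(match["user_agent"])  # first sketch
            if category is None:
                continue
            # First path segment as a rough stand-in for site section.
            section = "/" + match["path"].lstrip("/").split("/", 1)[0]
            table[(category, section)] += 1
    return table
```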
Find performance gaps through comparison with organic search performance
Comparing AI crawl data with your existing performance metrics helps uncover gaps where content is being ingested but not effectively driving AI engagement.
- Pages that are heavily crawled but generate limited AI-attributed traffic may require clearer structure, more precise intent alignment, or better internal linking.
- Pages with strong engagement but minimal crawl activity may rely on older signals rather than being actively prioritised by AI, signalling opportunities to refresh or optimise them for AI understanding. A comparison sketch follows.
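A rough sketch of this comparison, assuming a Google Search Console performance export saved as CSV with `Page` (full URL) and `Clicks` columns (column names vary by export method), and crawl counts produced by an aggregation like `top_training_targets` above. The thresholds are arbitrary placeholders to tune for your site’s traffic levels.

```python
import csv

def find_gaps(crawl_counts: dict[str, int], gsc_csv: str) -> None:
    """Flag pages heavily crawled by AI bots but earning few organic clicks.
    Assumes a Search Console export with 'Page' and 'Clicks' columns --
    adjust the column names to match your export."""
    clicks: dict[str, int] = {}
    with open(gsc_csv, encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Reduce the full URL to a path so it matches log entries.
            path = "/" + row["Page"].split("/", 3)[-1]
            clicks[path] = int(row["Clicks"])
    ranked = sorted(crawl_counts.items(), key=lambda kv: kv[1], reverse=True)
    for path, crawls in ranked:
        # Placeholder thresholds: "heavy" crawling vs "limited" clicks.
        if crawls >= 50 and clicks.get(path, 0) < 10:
            print(f"{path}: {crawls} AI crawls, {clicks.get(path, 0)} clicks")
```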
