Ziff Davis's Study Reveals That LLMs Favor High DA Websites

Published by Manomita Mandal · Feb 5


For years, SEOs have relied on Moz's Domain Authority (DA) metric as a benchmark for a website's perceived authority. While Moz has consistently stated that DA is not a Google ranking factor, the metric has remained a key point of discussion in the industry.

New research from Ziff Davis sheds more light on how Domain Authority correlates with LLM content preferences, suggesting that the future might not be so different from the present.


Why did Ziff Davis conduct this study?

Ziff Davis, a major publisher whose brands include PCMag, Mashable, IGN, and Moz, faces the same challenge as other media companies: it suspects that Large Language Models (LLMs) are being trained on its content without licensing agreements, but because training data is rarely disclosed, it's hard to determine which content is being used, let alone favored.

The study set out to address this issue. Researchers analyzed datasets like Common Crawl, C4, OpenWebText, and OpenWebText2 to understand how LLMs are trained, what types of content they prefer, and how these choices influence AI behavior and output.
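
The full methodology is in the report, but the general shape of a domain-level DA analysis is easy to sketch in Python. Everything below is illustrative: the domains and scores are made up, and a real pipeline would query a DA provider such as Moz's Links API rather than a hard-coded table.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stand-in: a real pipeline would query a DA provider
# (e.g. Moz's Links API) instead of this table of made-up scores.
ILLUSTRATIVE_DA = {
    "bignews.example.com": 85,
    "nicheblog.example.net": 32,
    "forum.example.org": 55,
}

def da_tier(score):
    """Bucket a 0-100 DA score into coarse tiers."""
    return "high" if score >= 70 else "medium" if score >= 40 else "low"

def da_distribution(urls):
    """Share of a URL sample's unique domains falling in each DA tier."""
    domains = {urlparse(u).netloc for u in urls}
    tiers = Counter(da_tier(ILLUSTRATIVE_DA.get(d, 0)) for d in domains)
    total = sum(tiers.values())
    return {tier: round(n / total, 2) for tier, n in tiers.items()}

sample = [
    "https://bignews.example.com/story-1",
    "https://bignews.example.com/story-2",
    "https://nicheblog.example.net/post",
    "https://forum.example.org/thread",
]
print(da_distribution(sample))
# -> {'high': 0.33, 'medium': 0.33, 'low': 0.33} (key order may vary)
```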

You can read the full study report here.


Key takeaways from the Ziff Davis LLM Study

If you want to skip the rest of the article, I’ve summarized the key findings below:

  • LLM training pipelines weight heavily curated, high-quality datasets above raw web data
  • Authoritative publishers dominate these curated datasets
  • OpenWebText and OpenWebText2 contain a much higher proportion of high-DA content than uncurated datasets
  • LLM developers prioritize commercial publisher content, reflecting a preference for quality and credibility

Which datasets were analyzed?

The Ziff Davis study examined four datasets that are central to training large language models; a short loading sketch follows the list:

  • Common Crawl: An uncurated repository of web text scraped from the entire internet with minimal quality control.
  • C4: A cleaned version of Common Crawl that focuses on English pages and excludes duplicates and low-quality text. It offers a more refined dataset without strict curation.
  • OpenWebText: A proxy for OpenAI’s WebText, emphasizing high-quality content linked from Reddit with a minimum upvote threshold.
  • OpenWebText2: A follow-up to OpenWebText featuring an expanded and updated dataset while maintaining the same quality-focused approach.
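
For readers who want to poke at these corpora directly, here's the loading sketch promised above. It assumes the Hugging Face `datasets` library and the `allenai/c4` Hub mirror; OpenWebText and OpenWebText2 have community mirrors on the Hub as well, though IDs and loading details change over time.

```python
# A minimal sketch: stream a few C4 records from its Hugging Face Hub
# mirror (pip install datasets). streaming=True iterates over the
# corpus without downloading the full multi-terabyte dump.
from itertools import islice

from datasets import load_dataset

c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

for record in islice(c4, 3):
    # C4 keeps the source URL alongside the cleaned text, which is
    # what makes domain-level analyses like Ziff Davis's possible.
    print(record["url"], record["text"][:60])
```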

It's important to note that these datasets aren't created equal. More curated datasets, like OpenWebText and OpenWebText2, contain a higher proportion of authoritative content, while unfiltered sources like Common Crawl pull from a much wider but lower-quality pool of web pages. These differences in composition shape how LLMs learn and what they generate.
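
To see why curation skews the pool this way, consider the WebText recipe that OpenWebText reproduces: keep only outbound links from Reddit submissions that cleared a karma threshold (OpenAI's original WebText required at least 3 karma). A toy sketch with illustrative URLs and scores:

```python
# WebText-style curation: keep only URLs whose Reddit submissions
# cleared a karma threshold. The (url, karma) pairs are illustrative.
KARMA_THRESHOLD = 3  # the threshold OpenAI described for WebText

submissions = [
    ("https://bignews.example.com/analysis", 127),
    ("https://nicheblog.example.net/post", 1),
    ("https://forum.example.org/thread", 5),
]

curated = sorted({url for url, karma in submissions if karma >= KARMA_THRESHOLD})
print(curated)
# ['https://bignews.example.com/analysis', 'https://forum.example.org/thread']
```

Because users mostly upvote links to outlets they already recognize, even this crude filter tilts the surviving URL pool toward well-known, high-DA publishers.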

How were publishers chosen for the study?

The study used Comscore web traffic data to determine which publishers to analyze. Researchers focused on the top 15 portfolio publishers in the Media category as of August 2020, representing the most widely visited news and media organizations.
