Statistics
Statistics for Digital Humanities / TEI Projects
This page discusses some points related to gathering statistics for Digital Humanities and TEI projects.
Kinds of statistics
There are in general three reasons to produce statistics. It is important to know what you want to do with your statistics before you start, or you may end up being unable to measure what you need.
- to monitor workflow and progress towards set targets. These are most often reported internally and often include documents ingested per unit time, measured OCR error rates, server uptime, server response time, collection size, etc. They are typically tied to a workflow or service contract (real or implied) of some description.
- to find out how users are using a collection. These focus on how users interact with the collection and often include most popular pages, popular search terms, referring websites, etc. They are typically used in aggregate to measure what people are interested in and how easy a site is to use.
- to motivate participants. These focus on how individual items / pages are used, and are reported back to participants, such as contributors to Open Archives, to motivate them.
Google Analytics
A number of TEI projects use Google Analytics as a tool for measuring user statistics. It uses a piece of JavaScript at the bottom of every page to track users as they arrive at the website and continue on their way across the web. It is very good at tracking which search terms are bringing users to a site, and very good at filtering out bots, scripts and malware. It produces nice PDF reports. It only sees web browsers that run JavaScript. Many features assume that the website is selling something in a monetary transaction, and are thus irrelevant to most TEI projects. Sites can grant third parties read access to their stats, enabling sharing and comparison of stats.
https://www.google.com/analytics/
Wikipedia
As an encyclopedia, Wikipedia is almost entirely dependent on primary sources for verification; it thus accumulates links to various primary-source-containing digital humanities projects over time. Counting the number of links from such an actively curated collection to a collection of primary sources can be used as a measure of the utility of the primary sources. A special page is available for this: http://en.wikipedia.org/wiki/Special:LinkSearch Such measures are biased towards English-language resources and topics with good Wikipedia coverage (such as the world wars).
Adding links to your own website to Wikipedia is usually considered bad form. For a detailed discussion of interactions between digital humanities projects and Wikipedia, see http://en.wikipedia.org/wiki/Wikipedia:Advice_for_the_cultural_sector
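The link counting described in this section can also be done programmatically rather than through the Special:LinkSearch page. The following is a minimal sketch, assuming the MediaWiki API's `exturlusage` list module on English Wikipedia; the function names, the `example.org` domain and the User-Agent string are illustrative assumptions, not part of any project's tooling.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia API endpoint; other language editions
# have their own endpoints and will give different counts.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(domain, limit=500, cont=None):
    """Build an API URL listing pages that link to the given domain."""
    params = {
        "action": "query",
        "list": "exturlusage",  # API counterpart of Special:LinkSearch
        "euquery": domain,      # search string without the protocol
        "eulimit": limit,
        "format": "json",
    }
    if cont:
        # Continuation parameters returned by the previous response.
        params.update(cont)
    return API_URL + "?" + urlencode(params)


def count_links(domain, fetch):
    """Count external links to `domain`, following result continuation.

    `fetch` is any callable that takes a URL and returns the decoded JSON
    response as a dict (hypothetical helper, see fetch_json below).
    """
    total = 0
    cont = None
    while True:
        data = fetch(build_query(domain, cont=cont))
        total += len(data.get("query", {}).get("exturlusage", []))
        cont = data.get("continue")
        if not cont:
            return total


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    # Hypothetical User-Agent; replace with one identifying your project.
    req = Request(url, headers={"User-Agent": "tei-stats-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return json.load(resp)
```

Calling `count_links("example.org", fetch_json)` would return the number of matching links found. Passing the fetch function in as an argument keeps the counting logic testable without network access.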
Webalizer
A tool for generating statistics from web server logs, Webalizer is perhaps one of the oldest and most staid ways of measuring statistics. It effectively creates a website out of the statistics, which can be accessed in the same way as any other website. It is not particularly slick and is prone to over-reporting.
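To illustrate what log-based statistics involve, here is a minimal sketch that counts successful page requests from web server log lines. It is not Webalizer's implementation. It assumes the logs are in the Apache Combined Log Format, and the `BOT_HINTS` list is a hypothetical, deliberately crude heuristic. One plausible contributor to over-reporting (an assumption here, not something this page states) is that automated traffic appears in the logs alongside human visits.

```python
import re
from collections import Counter

# Assumed format (Apache Combined Log Format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical substrings used to guess that a user agent is automated.
BOT_HINTS = ("bot", "crawler", "spider")


def page_hits(lines, skip_bots=False):
    """Count requests with status 200 per path, optionally skipping likely bots."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("status") != "200":
            continue  # ignore unparseable lines and non-200 responses
        agent = (m.group("agent") or "").lower()
        if skip_bots and any(hint in agent for hint in BOT_HINTS):
            continue
        hits[m.group("path")] += 1
    return hits
```

Comparing `page_hits(lines)` with `page_hits(lines, skip_bots=True)` on the same log file gives a rough sense of how much of the raw count comes from self-identified crawlers.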