Scholarly Works (4 results)

Article

Data Analysis Activities and Problems for the Computer Science Major in a Post-calculus Introductory Statistics Course

Juana Sanchez

Other Recent Work (2011)

Article

The Millennium goals, National Statistical Offices, the International Statistical Literacy Project and Statistical Literacy in schools

Juana Sanchez

Department of Statistics Papers (2011)

Reaching the United Nations’ Millennium Development Goals will become more feasible if there is growth in the number of literacy programs in National Statistical Literacy Offices and if the communication between the latter and statistics educators increases. There exist examples of National Statistical Offices working together with statistics educators to create resources for teachers. We compare them in this paper. But there are currently only limited ways in which the knowledge of these resources is disseminated, either nationally or internationally. The ISLP has a major role to play in helping achieve wider dissemination of these resources.

Article

Internet Data Analysis for the Undergraduate Statistics Curriculum

Department of Statistics Papers (2011)

Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days.

Article

Bayesian Hierarchical Model of the Browsing Behavior of World Wide Web Users

Department of Statistics Papers (2011)

We consider the case of surfing within a single large Web site, which is important from the point of view of site design, web server proxy efficiency and search engine optimal ranking of pages. The site used as an example to illustrate the method we propose for clustering user sessions is msnbc.com. We use a random sample from publicly available server log data on the Web pages chosen by 989818 users in a twenty-five hour period, where the response measure for each user is an ordered sequence of choices among 17 categories (UCI KDD Archive). A common way to model the browsing behavior of users is to assume that the decision of users is a random walk with a probability distribution of first passage time to a threshold that is a two-parameter inverse Gaussian distribution. Another hypothesis examined in the literature is that users at each page conduct an independent Bernoulli trial to make a stopping decision, which implies a geometric distribution. Mixtures of first-order Markov processes or model-based clustering with and without a Bayesian flavor have offered very useful exploratory data analysis. All these studies have shown evidence that web-surfing behavior may be non-Markov in nature and have illustrated how hard it is to capture dependencies in the data. The performance of the models over a wide range of Web site formats is still inconclusive. This performance has been measured by the ability to predict page hits, by the resulting distribution of page hits, and by the contribution to efficient web caching schemes. Some models have been tested with server log data of AOL or similar sites and others have been tested within a single Web site like msnbc.com. The levels of aggregation of pages and clustering of user behavior have also varied within studies.
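As an illustration of the Bernoulli stopping hypothesis mentioned above, the following minimal sketch simulates sessions in which a user stops after each page with a fixed probability and compares the simulated session lengths with the geometric probability mass function. The stopping probability of 0.3 and the function names are hypothetical choices for illustration; they are not taken from the paper or the msnbc.com data.

```python
import random


def simulate_session_length(stop_prob, rng):
    """Number of pages viewed when the user stops after each page
    with probability stop_prob (the stopping page is counted)."""
    pages = 1
    while rng.random() >= stop_prob:  # continue with probability 1 - stop_prob
        pages += 1
    return pages


def geometric_pmf(k, stop_prob):
    """P(session length = k) for k = 1, 2, ... under independent stopping trials."""
    return (1 - stop_prob) ** (k - 1) * stop_prob


rng = random.Random(0)   # fixed seed so the run is repeatable
stop_prob = 0.3          # hypothetical value, not estimated from any data
lengths = [simulate_session_length(stop_prob, rng) for _ in range(100_000)]

# Compare empirical frequencies with the geometric probabilities.
for k in range(1, 6):
    empirical = sum(1 for n in lengths if n == k) / len(lengths)
    print(k, round(empirical, 3), round(geometric_pmf(k, stop_prob), 3))
```

Under this hypothesis the expected session length is 1 / stop_prob, so the sample mean of `lengths` should be close to that value.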
In this paper, we assume that for the case of browsing within a news portal like msnbc.com, where contents are continually changing, the server-log data is only meaningful when categories are aggregated, as they are for the msnbc.com data set, and the order of the browsing may not be relevant. We use a Bayesian hierarchical model of the page counts per user to obtain posterior distributions of page access frequency that allow us to cluster user sessions in a relatively small number of groups. The model has enough parameters to fit the data well, while using a population distribution that can structure dependence in the parameters. The model can be generalized to different types of Web sites, different levels of aggregation of pages and different clustering schemes.
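The abstract does not spell out the model, so the following is only a simplified stand-in for the idea of obtaining posterior page access frequencies from per-user page counts: a conjugate Dirichlet-multinomial update in which every user shares the same Dirichlet prior, playing the role of the population distribution. The prior values, user names and counts are hypothetical, and in a full hierarchical model the prior parameters would themselves be estimated rather than fixed.

```python
NUM_CATEGORIES = 17  # the msnbc.com data set aggregates pages into 17 categories


def posterior_mean_frequencies(counts, prior_alpha):
    """Posterior mean of category frequencies for one user under a
    Dirichlet(prior_alpha) prior and multinomial page counts."""
    if len(counts) != len(prior_alpha):
        raise ValueError("counts and prior_alpha must have the same length")
    total = sum(counts) + sum(prior_alpha)
    return [(c + a) / total for c, a in zip(counts, prior_alpha)]


# Hypothetical shared prior: uniform pseudo-counts, assumed for illustration.
prior_alpha = [1.0] * NUM_CATEGORIES

# Hypothetical per-user page counts by category (not real msnbc.com data).
user_counts = {
    "user_a": [6, 1] + [0] * 15,   # 7 page views, mostly category 0
    "user_b": [0] * 16 + [2],      # 2 page views, all in category 16
}

for user, counts in user_counts.items():
    freqs = posterior_mean_frequencies(counts, prior_alpha)
    top = max(range(NUM_CATEGORIES), key=lambda j: freqs[j])
    print(user, "most frequent category:", top, "posterior mean:", round(freqs[top], 3))
```

Users with few page views are pulled toward the shared prior, which is one way a population distribution can stabilize per-user estimates before sessions are grouped.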
