Salt Lake City, Utah
June 20, 2004
June 23, 2004
9.368.1 - 9.368.11
Data Warehousing from the Web
Chris Fernandes and Michael Whalen
Department of Computer Science
Union College
Schenectady, NY 12308
Data warehousing is the process of collecting information from various data repositories and combining it into a single structured repository that can be queried for new information such as performance trends, decision models, predictions, and association rules. Internet web sites are data repositories containing useful but unstructured data. In this paper, we describe a data warehouse, developed from the registration web pages at Union College, which allows faculty and students to get on-line access to course enrollment trends, classroom availability, student class schedules, and other pertinent information. The project proved so successful in the kind of information that could be obtained that the administration became concerned about student privacy issues.
Traditional database systems, such as those used by bank tellers, librarians, and airline reservation assistants, are often characterized as online transaction processing (OLTP) systems. They are required to process frequent queries, usually in real time, that request information about the current status of specific objects and events, such as a bank account balance or the availability of a library book. As the status of these entities changes, an OLTP system must update the database to reflect the changes so that the database always represents a snapshot of the current state of the world. An online analytical processing (OLAP) system, on the other hand, is a database that keeps track of historical data and processes more complicated queries involving summaries and trends rather than individual entities. Table 1 summarizes the two systems.
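The difference between the two kinds of query can be illustrated with a small sketch. The example below is not taken from the system described in this paper: the `enrollment` table, its columns, the course codes, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real server. It shows an OLTP-style lookup and update of one entity's current status next to an OLAP-style summary over historical terms.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical enrollment history."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enrollment ("
        "term TEXT, course TEXT, enrolled INTEGER, capacity INTEGER)"
    )
    rows = [
        ("2003-Fall", "CSC-010", 28, 30),
        ("2004-Winter", "CSC-010", 25, 30),
        ("2004-Spring", "CSC-010", 30, 30),
        ("2003-Fall", "CSC-150", 12, 20),
        ("2004-Winter", "CSC-150", 15, 20),
        ("2004-Spring", "CSC-150", 18, 20),
    ]
    conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def seats_left(conn, term, course):
    """OLTP-style query: current status of one specific course offering."""
    row = conn.execute(
        "SELECT capacity - enrolled FROM enrollment "
        "WHERE term = ? AND course = ?",
        (term, course),
    ).fetchone()
    return row[0] if row else None


def register_student(conn, term, course):
    """OLTP-style update: keep the snapshot current; fails if the course is full."""
    cur = conn.execute(
        "UPDATE enrollment SET enrolled = enrolled + 1 "
        "WHERE term = ? AND course = ? AND enrolled < capacity",
        (term, course),
    )
    conn.commit()
    return cur.rowcount == 1


def average_fill_rate(conn):
    """OLAP-style query: a per-course summary across all historical terms."""
    return dict(
        conn.execute(
            "SELECT course, AVG(1.0 * enrolled / capacity) "
            "FROM enrollment GROUP BY course ORDER BY course"
        ).fetchall()
    )
```

The first two functions touch a single row and would run constantly in a transactional system; the last one scans the whole history and returns a trend-like summary, which is the kind of query a warehouse is built to answer.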
A data warehouse is a common OLAP system in use today. Retail stores use data warehouses to keep track of buying trends, which enables them to stock inventory more accurately. The National Basketball Association uses a system called Advanced Scout to record details about games and extract patterns such as the effectiveness of certain players when on the court with another given player [1]. In general, the creation of the data warehouse is a crucial first step in data mining, the process of extracting useful associations to facilitate managerial decision-making.
In recent years, the Web has become a popular source from which to form a data warehouse. It contains a great deal of easily accessible raw data; its main drawback is that the data is unstructured. Creating a warehouse from a subset of the Web would solve that problem and permit analytical queries to be issued against it.
Proceedings of the 2004 American Society for Engineering Education Annual Conference & Exposition. Copyright © 2004, American Society for Engineering Education
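As a rough illustration of turning unstructured web content into structured records, the sketch below pulls the rows of an HTML table into a list of dictionaries using only Python's standard `html.parser` module. It is not the extraction procedure used in this paper: the sample page, its column names, and the helper names `TableRowParser` and `extract_records` are assumptions made for the example.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell (headings, whitespace) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_records(html):
    """Return table rows as dicts keyed by the lower-cased header row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    keys = [h.lower() for h in header]
    # Skip rows whose cell count does not match the header.
    return [dict(zip(keys, row)) for row in body if len(row) == len(keys)]


# Hypothetical page content, invented for this example.
SAMPLE_PAGE = """
<html><body><h1>Course Listing</h1>
<table>
<tr><th>Course</th><th>Room</th><th>Enrolled</th></tr>
<tr><td>CSC-010</td><td>Room 101</td><td>28</td></tr>
<tr><td>CSC-150</td><td>Room 202</td><td>12</td></tr>
</table></body></html>
"""

records = extract_records(SAMPLE_PAGE)
```

Once the page content is in this form, each record can be inserted into a relational table and queried analytically. Real pages are usually less regular than this sample, so a production extractor would need more defensive parsing.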
Whalen, M., & Fernandes, C. (2004, June). Data Warehousing From The Web. Paper presented at 2004 Annual Conference, Salt Lake City, Utah. 10.18260/1-2--13935
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2004 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015