About This Project

 

 

The Dataset

 

The original dataset was released as part of the court proceedings against Enron executives a couple of years ago. It consists of the contents of the mail folders of the top 151 executives, containing about 225,000 messages covering a period from 1997 to 2004. Two different groups, one at USC (called ISI) and one at Berkeley, took the original text dataset released by Carnegie Mellon and converted it to a database format. The main difference between the two seems to be that ISI took steps to eliminate duplicate messages. We don’t know of anything wrong with the ISI duplicate-removal methods, but we don’t vouch for them either. If you want to be sure of having access to all the messages, use the Berkeley database. Both databases are available through our interface. For more on the structure and creation of the original databases, visit these sites:

 

See below for additional resources and background on Enron. 

 

Please remember that per your agreement when you signed up for this site, you assume responsibility for compliance with human subjects regulations at your institution that may apply to use of this data in research. 

 

 

The Interface

 

The databases described above, though freely available, are not readily usable by the average communication researcher. Yet the Enron dataset provides an unprecedented opportunity to study real organizational communication in a significant case. Therefore, the Organizational Communication Division of the International Communication Association (ICA) launched this project to provide a powerful and user-friendly interface to the Enron dataset.

 

The project is led by Steve Corman at Arizona State University, and is supported by Noshir Contractor from the University of Illinois, Michele Jackson from the University of Colorado at Boulder, and Craig Scott from the University of Texas at Austin. Jana Diesner from CMU provided valuable consulting on the project.

 

The database, interface, and forum are hosted by the National Center for Supercomputing Applications at the University of Illinois.  The software application developers are Andy Don and Mike O’Malley from NCSA.

 

 

The ICA Conference Connection

Researchers are encouraged to plan projects for submission to special panels on this data being organized by the Organizational Communication Division for the ICA 2006 conference in Dresden, Germany.  Project plans based on preliminary analysis of the database are due to steve.corman@asu.edu no later than October 15, 2005.  Selected projects must have final results available by January 15, 2006.

 

 

Background Information Links on Enron

 

A good review of the case and the dataset is included in this working paper by Carley & Diesner from CMU: http://www.bmacewen.com/blog/pdf/Enron.Working.Paper.March.2005.pdf

 

Public Citizen “Enron Information Center” http://www.citizen.org/cmep/energy_enviro_nuclear/electricity/Enron/index.cfm

 

Enron Ex Employee Status Report http://www.isi.edu/~adibi/Enron/Enron_Employee_Status.xls

 

Houston Chronicle Article on Lay Indictment http://www.chron.com/cs/CDA/ssistory.mpl/front/2635540

 

CNN Q&A on Enron Bankruptcy http://archives.cnn.com/2002/US/01/12/enron.qanda.focus/

 

Other resources

 

Shetty, J., & Adibi, J. (n.d.). The Enron Dataset Database Schema and Brief Statistical Report. Retrieved November 4, 2004, from http://www.isi.edu/~adibi/Enron/Enron_Dataset_Report.pdf

 

J. Heer paper on visualizing NLP results on Enron database http://jheer.org/enron/v1/

 

References related to the dataset and NLP work on it:

Bekkerman, R. (n.d.). Retrieved November 4, 2004, from http://www.cs.umass.edu/~ronb/

 

Cohen, W.W. (n.d.). CALD, CMU. Retrieved October 5, 2004, from http://www-2.cs.cmu.edu/~enron/

 

Papers from the Workshop on Link Analysis, Counterterrorism and Security, held at Fifth SIAM International Conference on Data Mining (SDM 2005) http://www.cs.queensu.ca/home/skill/proceedings/

 

Corrada-Emmanuel, A. (n.d.). Enron Email Dataset Research. Retrieved October 5, 2004, from http://ciir.cs.umass.edu/~corrada/enron/

See also http://www.cnlp.org/presentations/slides/Corrada_Enron.pdf

  

Klimt, B., & Yang, Y. (2004). Introducing the Enron Corpus. First Conference on Email and Anti-Spam (CEAS), Mountain View, CA. Retrieved October 14, 2004, from http://www.ceas.cc/papers-2004/168.pdf

 

Klimt, B., & Yang, Y. (2004). The Enron Corpus: A New Dataset for Email Classification Research. European Conference on Machine Learning, Pisa, Italy.