Case Study 1: Should a Computer Grade Your Essays?
Would you like your college essays graded by a computer? Well, you just might find that happening in your next course. In April 2013, EdX, a Harvard/MIT joint venture to develop massively open online courses (MOOCs), launched an essay-scoring program. Using artificial intelligence technology, the program scores essays and short answers immediately and returns feedback, allowing students to revise, resubmit, and improve their grade as many times as necessary. The non-profit organization is offering the software free to any institution that wants to use it. From a pedagogical standpoint, if the guidance is sound, immediate feedback and the ability to act on it directly make for an optimal learning environment. But while proponents trumpet automated essay grading’s superiority to having students wait days or weeks for returned papers (which they may or may not have the opportunity to revise), as well as the time-saving benefit for instructors, critics doubt that humans can be replaced.
In 2012, Les Perelman, the former director of writing at MIT, countered a paper touting the proficiency of automated essay scoring (AES) software. Mark Shermis, dean of the University of Akron College of Education, and his co-author, data scientist Ben Hamner, used AES programs from nine companies, including Pearson and McGraw-Hill, to rescore over 16,000 middle and high school essays from six different state standardized tests. Their Hewlett Foundation-sponsored study found that machine scoring closely tracked human grading and, in some cases, produced a more accurate grade. Perelman, however, found that no direct statistical comparison between the human graders and the programs was performed. Shermis concedes that regression analysis was not performed, because the software companies imposed this condition in order to allow him and Hamner to test their products. Unsurprisingly, he in turn accuses Perelman of evaluating their work without performing research of his own.
Perelman has in fact conducted studies on the Electronic Essay Rater (e-Rater) developed by the Educational Testing Service (ETS), the only organization that would allow him access. The e-Rater is based on natural language processing technology and uses syntactic variety, discourse structure (like PEG, Project Essay Grade), and content analysis (like IEA, the Intelligent Essay Assessor). It applies statistical analysis to linguistic features like argument formation and syntactic variety to determine scores, but also gives weight to vocabulary and topical content. In the month granted him, Perelman analyzed the algorithms and toyed with the e-Rater, confirming his prior critiques. The major problem with AES programs (so far) is that they cannot distinguish fact from fiction. For example, in response to an essay prompt about the causes of the steep rise in the cost of higher education, Perelman wrote that the main driver was greedy teaching assistants whose salaries were six times those of college presidents and who enjoyed exorbitant benefits packages including South Seas vacations, private jets, and movie contracts. He supplemented the argument with a line from Allen Ginsberg’s “Howl” and received the top score of 6. The metrics that merited this score included overall length, paragraph length, number of words per sentence, word length, and the use of conjunctive adverbs such as “however” and “moreover.” Since computer programs cannot divine meaning, essay length serves as a proxy for writing fluency, conjunctive adverb use for complex thinking, and big words for vocabulary aptitude.
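To make the critique concrete, the surface metrics listed above can be computed with a few lines of Python. This is only an illustrative sketch, not the e-Rater's actual algorithm: the function name, the adverb list (limited to the two words the case mentions), and the simple sentence-splitting rule are all assumptions.

```python
import re

# Assumed list: the case names only "however" and "moreover" as examples.
CONJUNCTIVE_ADVERBS = {"however", "moreover"}

def surface_features(essay: str) -> dict:
    """Compute length-based proxies of the kind described in the case."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay)
    return {
        "word_count": len(words),
        "paragraph_count": len(paragraphs),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "mean_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "conjunctive_adverbs": sum(
            1 for w in words if w.lower() in CONJUNCTIVE_ADVERBS
        ),
    }

if __name__ == "__main__":
    sample = "Costs rose. However, aid fell.\n\nMoreover, fees grew."
    print(surface_features(sample))
```

Nothing in these features checks whether a sentence is true, which is why a long, fluent, factually absurd essay can still score well on them.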
Program vendors such as Pearson and Vantage Learning defend these parameters, asserting that they are highly correlated with writing ability. Good writers have acquired skills that enable them to write more under time constraints; they use more complex vocabulary, and they understand how to introduce, interrupt, connect, and conclude complex ideas, which are the jobs of conjunctive adverbs. AES programs also recognize sentence fragments and dock students for sentences that begin with “and” or “or.” However, professional writers know how to employ both to great effect. Perelman and a newly formed group of educators, Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment, warn that writing instruction will be dumbed down to meet the limited and rigid metrics machines are capable of measuring.
The productivity gains from using automated essay-grading software will undoubtedly take away some of the jobs of the graders hired by the standardized test companies. Pearson, for example, ostensibly pays its graders between $40 and $60 per hour. In that hour, a grader is expected to score between 20 and 30 essays, which works out to two to three minutes (and dollars) per essay. Clearly graders must use some type of shorthand metrics in order to score this quickly, but at least they can recognize as false the statement that on July 4, 2013, the United States observed its 2,013th birthday, even if it is contained in a well-constructed sentence. While the e-Rater can score 16,000 essays in 20 seconds, it cannot make this distinction. In addition, a human grader would presumably not give a 6 to a 716-word essay containing multiple nonsense sentences while giving only a 5 to a factual, well-reasoned essay 150 words shorter, which is the result Perelman was able to demonstrate with the e-Rater.
ETS, developer of the SAT, GRE, Praxis, and K-12 standardized tests for multiple states, counters that the e-Rater is not replacing human graders in high stakes tests; it is supplementing them. Essays are scored by both human and machine and when the scores do not match, a second human breaks the impasse. Furthermore, they posit that the test prep course Perelman developed to teach students how to beat AES software requires higher-order thinking skills—precisely those the tests seek to measure. Thus, if students can master Perelman’s techniques, they have likely earned their 6. Pearson adds that its Intelligent Essay Assessor is primarily a classroom tool, allowing students to revise their essays multiple times before turning them in to a teacher to be graded. However, for many states looking to introduce writing sections to their battery of K-12 standardized tests, and for those that abandoned the effort due to the cost, eliminating graders altogether will make them affordable. In addition, the stakes are not insubstantial for failure to achieve passing grades on state standardized tests, ranging from retesting, to remedial programs, to summer school, to non-promotion.
A tool that provides immediate guidance is a welcome addition to the instructional toolbox. However, as demands on instructors’ time decrease, will university administrators push staff cutbacks to meet budgetary constraints? Will fewer and fewer instructors be teaching more and more students?
As MOOCs and AES proliferate, the answer is: most likely. EdX is quickly becoming controversial in academic circles. Presently, its course offerings are free and students earn a certificate of completion, but not course credit. To become self-sustaining, however, the non-profit plans to offer its MOOC platform as a “self-service” system, which faculty members can use to develop courses specifically branded for their universities. EdX will then receive the first $50,000 in revenue generated from the course, or $10,000 for a recurring course. Thereafter, revenue will be split 50-50 between the university and EdX. A second revenue-generating model offers universities “production help” with course development, charging them $250,000 for a new course and $50,000 each term the course is offered again. If a course is successful, the university receives 70% of the revenue, as long as EdX has been fully compensated for any self-service courses. However, in order to generate enough revenue to share with its 12 university partners, which now include University of California, Berkeley, Wellesley, Georgetown, and the University of Texas, a licensing model is likely. Tested at no charge at San Jose State University in 2012, an EdX MOOC served as the basis for a blended online engineering course. The enriched curriculum resulted in an increased passing rate, from 60% to 91%. If course licensing becomes the key revenue stream, Anant Agarwal, the electrical engineer who is president of EdX, foresees this happening in closed classrooms with limited enrollment.
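The arithmetic of the “self-service” model can be sketched as follows. This is a hypothetical illustration based only on the figures quoted above; the function name is invented, and it assumes the $10,000 figure simply replaces the $50,000 threshold when a course is offered again.

```python
def self_service_split(revenue: float, recurring: bool = False) -> tuple:
    """Return (edx_share, university_share) under the self-service model.

    Assumption: EdX keeps the first $50,000 of course revenue
    ($10,000 for a recurring course); the remainder is split 50-50.
    """
    threshold = 10_000 if recurring else 50_000
    edx_first = min(revenue, threshold)   # EdX is paid first
    remainder = revenue - edx_first       # whatever is left to share
    return edx_first + remainder * 0.5, remainder * 0.5

if __name__ == "__main__":
    for amount in (30_000, 80_000):
        print(amount, self_service_split(amount))
```

Under these assumptions, a new course that earns $80,000 returns $65,000 to EdX and $15,000 to the university, while a course that earns less than $50,000 returns nothing to the university.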
But some members of the San Jose State faculty are nonetheless alarmed. When a second EdX MOOC, JusticeX, was considered, the Philosophy department sent a sharply worded letter addressed to Harvard course developer Michael Sandel, but actually leveled at university administrators. Asserting that the department did not have an academic problem in need of remediation and was not lacking faculty to teach its equivalent course, it did not shy from attacking the economic motives behind public universities’ embrace of MOOCs. The authors further asserted that MOOCs represented a decline in educational quality and noted the irony involved when a social justice course was the vehicle for perpetrating a social injustice: a long-term effort to “dismantle departments and replace professors.” Sandel’s conciliatory response expressed his desire to share free educational resources, his aversion to undercutting colleagues, and a call for a serious debate both at EdX and in the higher education community.
Other universities are similarly pushing back, against both EdX and other new MOOC ventures such as Coursera and Udacity, founded by Stanford faculty members. MOOCs and AES are inextricably linked. Massive online courses require automated assessment systems. In addition, both Coursera and Udacity have expressed their commitment to using them due to the value of immediate feedback. Amherst College faculty voted against joining the EdX consortium. Duke University faculty members thwarted administration attempts to join nine other universities and educational technology company 2U in a venture to develop a collection of for-credit undergraduate courses.
However, EdX was founded by two of the most prominent universities in the United States, has gathered prestigious partners, and is already shaping educational standards. Stanford, for one, has decided to get on board; it adopted the OpenEdX open-source platform and began offering a summer reading program for freshmen and two public courses in the summer of 2013. Stanford will collaborate with EdX on the future development of OpenEdX and will offer both public and university classes on it.
Therefore, while Professor Perelman jokes that his former computer science major students could develop an Android app capable of spitting out formulaic essays that would get a 6 from e-Rater, cutting humans completely out of the equation, he knows that serious issues are in play. What educational outcomes will result from diminishing human interaction and input? Will AI develop to the point that truth, accuracy, effective organization, persuasiveness, argumentation and supporting evidence can be evaluated? And how many more jobs in education will disappear as a result?
Case Study 1: Should a Computer Grade Your Essays?
1) Identify the kinds of systems described in this case. (1 Mark)
2) What are the benefits of automated essay grading? What are the drawbacks? (1 Mark)
3) What management, organization, and technology factors should be considered when deciding whether to use AES? (1 Mark)
Case Study 2: American Water Keeps Data Flowing
American Water, founded in 1886, is the largest public water utility in the United States. Headquartered in Voorhees, N.J., the company employs more than 7,000 dedicated professionals who provide drinking water, wastewater and other related services to approximately 16 million people in 35 states, as well as Ontario and Manitoba, Canada. Most of American Water’s services support locally managed utility subsidiaries that are regulated by the U.S. state in which each operates as well as the federal government. American Water also owns subsidiaries that manage municipal drinking water and wastewater systems under contract and others that supply businesses and residential communities with water management products and services.
Until recently, American Water’s systems and business processes were highly localized, and many of these processes were manual. Over time, this information environment became increasingly difficult to manage. Many systems were not integrated, so that running any type of report that had to provide information about more than one region was a heavily manual process. Data had to be extracted from the systems supporting each region and then combined manually to create the desired output. When the company was preparing to hold an initial public offering of its stock in 2006, its software systems could not handle the required regulatory controls, so roughly 80 percent of this work had to be performed manually. It was close to a nightmare.
Management wanted to change the company from a decentralized group of independent regional businesses into a more centralized organization with standard company-wide business processes and enterprise-wide reporting. The first step toward achieving this goal was to implement an enterprise resource planning (ERP) system designed to replace disparate systems with a single integrated software platform. The company selected SAP as its ERP system vendor.
An important step of this project was to migrate the data from American Water’s old systems to the new platform. The company’s data resided in many different systems in various formats. Each regional business maintained some of its own data in its own systems, and a portion of these data was redundant and inconsistent. For example, there were duplicate pieces of materials master data because a material might be called one thing in the company’s Missouri operation and another in its New Jersey business. These names had to be standardized so that every business unit used the same name for a piece of data. American Water’s business users had to buy into this new company-wide view of data.
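The naming problem described above can be illustrated with a small sketch. The alias table, material names, and function name below are hypothetical and are not taken from American Water's systems; the point is only that regional names are mapped to one agreed standard, and that names nobody has agreed on are set aside for business users to decide.

```python
# Hypothetical alias table agreed on by business users:
# normalized regional name -> company-wide standard name.
STANDARD_NAMES = {
    "gate valve 6in": "VALVE-GATE-6IN",
    "6 inch gate valve": "VALVE-GATE-6IN",
}

def standardize(records):
    """Group (region, local_name) pairs under standard material names.

    Returns (master, unmapped): master maps each standard name to the
    regions that use it; unmapped lists names needing a business decision.
    """
    master = {}
    unmapped = []
    for region, local_name in records:
        # Normalize case and collapse repeated whitespace before lookup.
        key = " ".join(local_name.lower().split())
        standard = STANDARD_NAMES.get(key)
        if standard is None:
            unmapped.append((region, local_name))
        else:
            master.setdefault(standard, []).append(region)
    return master, unmapped

if __name__ == "__main__":
    rows = [
        ("Missouri", "Gate Valve 6in"),
        ("New Jersey", "6 inch  gate valve"),
        ("Missouri", "hydrant cap"),
    ]
    print(standardize(rows))
```

Routing unknown names to a review list rather than guessing a match reflects the case's point that the business, not the information systems department, decides what each piece of data means.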
Data migration entails much more than just transferring data between old and new systems. Business users need to know that data are not just a responsibility of the information systems department: the business “owns” the data. Business needs determine the rules and standards for managing the data. Therefore, it is up to business users to inventory and review all the pieces of data in their systems to determine precisely which pieces of data from the old system will be used in the new system and which data do not need to be brought over. The data also need to be reviewed to make sure they are accurate and consistent and that redundant data are eliminated.
Most likely some type of data cleansing will be required. For example, American Water had data on more than 70,000 vendors in its vendor master data file. Andrew Clarkson, American Water’s Business Intelligence Lead, asked business users to define an active vendor and to use that definition to identify which data to migrate. He also worked with various functional groups to standardize how to present address data.
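A rule such as “active vendor” only becomes usable for migration once it is written down precisely. The sketch below assumes, purely for illustration, that a vendor is active if it has a recorded transaction within the last two years; the case does not say what definition American Water's business users actually chose, and the field and function names are invented.

```python
from datetime import date, timedelta

def is_active(vendor: dict, today: date, window_days: int = 730) -> bool:
    """Assumed rule: active means a transaction within window_days of today."""
    last = vendor.get("last_transaction")
    return last is not None and (today - last) <= timedelta(days=window_days)

def vendors_to_migrate(vendors, today, window_days=730):
    """Keep only the vendor records that satisfy the agreed definition."""
    return [v for v in vendors if is_active(v, today, window_days)]

if __name__ == "__main__":
    vendor_master = [
        {"id": "V1", "last_transaction": date(2012, 6, 1)},
        {"id": "V2", "last_transaction": date(2009, 1, 1)},
        {"id": "V3", "last_transaction": None},
    ]
    kept = vendors_to_migrate(vendor_master, today=date(2013, 1, 1))
    print([v["id"] for v in kept])
```

Keeping the window as a parameter means that a change in the business definition changes one value rather than the migration logic.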
One of the objectives of American Water’s data management work was to support an enterprise-wide business intelligence program based on a single view of the business. An analytical system and data warehouse would be able to combine data from the SAP ERP system with data from other sources, including new customer information and enterprise asset management systems. That meant that American Water’s business users had to do a lot of thinking about the kinds of reports they wanted. The company had originally planned to have the system provide 200 reports, but later reduced that number by half. Business users were trained to generate these reports and customize them. Most financial users initially tried to create their reports using Microsoft Excel spreadsheet software. Over time, however, they learned to do the same thing using the SAP Business Objects Web Intelligence tools that came with the system. SAP Business Objects Web Intelligence is a set of tools that enables business users to view, sort, and analyze business intelligence data. It includes tools for generating queries, reports, and interactive dashboards.
At present, American Water is focusing on promoting the idea that data must be “clean” to be effective and has poured an incredible amount of effort into its data cleansing work: identifying incomplete, incorrect, inaccurate, and irrelevant pieces of data and then replacing, modifying, or deleting the “dirty” data. According to Clarkson, just as water treatment plants have measurements and meters to check water quality as it is being treated, data management needs to ensure the quality of data at every step to make sure the final product will be genuinely useful for the company.
Case Study 2: American Water Keeps Data Flowing
- How did implementing a data warehouse help American Water move toward a more centralized organization? (1 Mark)
- Give some examples of problems that would have occurred at American Water if its data were not “clean.” (1 Mark)
- How did American Water’s data warehouse improve operations and management decision making? (1 Mark)