Tag Archives: Jeffrey Solka

Mining new depths in scholarship, part 1

I examine the opening chapter by Edward Wegman and Jeffrey Solka in the 2005 Handbook of Statistics: Data Mining and Data Visualization (C Rao, E Wegman and J Solka, editors). Sections 3 (The Computer Science Roots of Data Mining), 5 (Databases), 6.2 (Clustering) and 6.3 (Artificial Neural Networks) appear to be largely derived from unattributed antecedents; these include online tutorials and presentations on data mining, SQL and artificial neural networks, as well as Brian Everitt’s classic Cluster Analysis. All the identified passages, tables and figures were adapted from “copy-paste” material in earlier course lectures by Wegman. The introduction to Chapter 13 (on genetic algorithms) by Yasmin Said also appears to contain lightly edited material from unattributed sources, including an online FAQ on evolutionary computing and a John Holland Scientific American piece. Several errors introduced by editing and rearrangement of the material are identified, demonstrating the authors’ lack of familiarity with these particular subject areas. This extends a pattern of problematic scholarship previously noted in the work of Wegman and Said.



Wegman and Said 2011: Yet More Dubious Scholarship in Full Colour, part 1

Previous posts have examined scholarship issues in the Wegman Report and Wegman et al’s flawed core statistical analysis of the “hockey stick” graph. Now I show that a recent WIREs Computational Statistics overview article on colour theory and design by Edward Wegman and protege Yasmin Said is based mainly on unattributed “flow through” decade-old material from various websites. This material has been augmented by further unattributed figures and text from current online sources, including five Wikipedia articles (see figure above right).

The first anniversary of “hockey stick” co-author Ray Bradley’s complaint against George Mason University statistics professor Edward Wegman has come and gone, but the ensuing proceeding at GMU shows no sign of resolution. Similarly absent is any indication of the release of code and data, promised by Wegman back in 2006, or any explanation for the obvious problems permeating the Wegman Report’s core statistical analysis.

But through it all there has been one obvious question: if the Wegman Report and the follow-up, federally funded Said et al on co-author social networks showed clear evidence of cut-and-paste scholarship, what might a close examination of other recent (or even not so recent) scholarship from the Wegman group reveal? To be sure, there are already hints at the answer in the problems found in PhD dissertations from Said and others at GMU, and in the insertion of a couple of paragraphs from the PhD dissertation of computer scientist David Grossman into Wegman et al’s 1996 technical report.

A recent article by Wegman and Said in WIREs Computational Statistics opens up a whole new avenue of inquiry – and reveals a remarkable pattern of “flow through” cut-and-paste that goes even beyond Said et al 2008. Colour Design and Theory (published online in February) is based largely on a 2002 course lecture by Wegman. However, this is no case of simple recycling of material, for most of the earlier lecture material came from obscure websites on colour theory and was simply copied verbatim without attribution. Now much of it has shown up, virtually unchanged, nine years later. And the old material has been augmented with figures and text from several more decidedly non-scholarly sources, including – wait for it – five different Wikipedia articles.
