Abstract: A significant portion of recorded history is on paper. Digitization turns these paper documents into digital images. Doing so gives us the ability to duplicate and easily share the data, benefits we have come to expect of anything digital. What many have failed to realize, however, is that by keeping these documents as digital images we lose perhaps the most important aspect of digital data: large-scale search. To a computer, images are nothing more than matrices of numbers. While it is second nature for humans to see patterns among these numbers, such as characters and shapes, for a computer this is usually a highly error-prone and computationally costly task. I will talk about our efforts, sponsored by the Division of Applied Research at the National Archives, to provide a form of automated search for such data. I will briefly describe an image-based search technique called Word Spotting, which searches for handwritten text within scanned forms, and how we will use passive crowdsourcing to improve the accuracy of returned results through usage of the system. In addition, I will describe the significant computation required to provide automated search for one such archive, the 1940s Census, containing roughly 3.9 scanned forms.
Pizza will be provided!