Bleed-Through Database
Update — April 2014
The images have been cropped to remove registration artefacts that may affect bleed-through removal.
Full colour images are now also available with the database.
Bleed-Through Database
The images contained here form a database designed as a resource for people working in the field of digital document restoration, and more specifically on the problem of bleed-through degradation. It consists of a set of 25 registered recto-verso grayscale sample image pairs, taken from larger manuscript images, with varying degrees of bleed-through.
The crops were chosen so that each contains a sentence or phrase of text, which makes it possible to test for improved legibility. All images in the database are saved in TIFF format. The verso side of each image pair has been flipped horizontally and registered to the recto side so that bleed-through text on one side is aligned with the corresponding foreground text on the other.
One of the main problems encountered when researching digital document restoration is that no ground truth exists. That is, there is no clean original document available with which to compare restoration results. Ground truth therefore has to be obtained either by creating synthetic degraded document images with known ground truth, or by creating synthetic ground truth images for real degraded images. This database contains the latter: manually created foreground text masks for each image are included here.
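As an illustration of how a foreground mask can serve as ground truth, the following minimal sketch scores a predicted foreground mask against a reference mask using pixel-wise precision, recall and F-measure. This is a generic comparison, not necessarily the evaluation used by the database authors. The function name `foreground_scores` is hypothetical, and the sketch assumes that masks are given as row-major grids in which `True` marks a foreground text pixel; the polarity of the actual mask files should be checked before use.

```python
from typing import Dict, Sequence


def foreground_scores(predicted: Sequence[Sequence[bool]],
                      truth: Sequence[Sequence[bool]]) -> Dict[str, float]:
    """Pixel-wise precision, recall and F-measure of a foreground mask.

    Assumption: True marks a foreground-text pixel in both masks.
    """
    if len(predicted) != len(truth) or any(
            len(p) != len(t) for p, t in zip(predicted, truth)):
        raise ValueError("masks must have identical dimensions")

    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1          # foreground correctly kept
            elif p and not t:
                fp += 1          # background or bleed-through kept as text
            elif t and not p:
                fn += 1          # foreground text lost

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Tiny hand-made example: one hit, one false alarm, one miss.
truth_mask = [[True, True, False],
              [False, False, False]]
predicted_mask = [[True, False, True],
                  [False, False, False]]
scores = foreground_scores(predicted_mask, truth_mask)
```

In practice the two grids would come from thresholding a restored image and from loading the corresponding mask file with an image library of your choice.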
File Naming Convention for the Degraded Images
File names in the database have the format "lib"."MS"."fol".tiff
- lib - the library from which the image originates. The labels are:
  - AC - The Allan and Maria Myers Academic Centre, a joint facility for the students of Newman College and St Mary's College, both residential colleges of the University of Melbourne.
  - FH - The Benjamin Iveagh Library, Farmleigh House, Ireland.
  - NLI - The National Library of Ireland.
  - NUIG - The James Hardiman Library, National University of Ireland, Galway.
  - NUIM - The Russell Library, National University of Ireland, Maynooth.
  - RIA - The Royal Irish Academy Library.
  - TCD - Trinity College Dublin Library.
  - UCD - University College Dublin Library.
- MS - the manuscript number.
- fol - the page number, or folio number followed by "r" or "v" to denote the recto or verso side.
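The naming convention above can be unpacked with a few string operations. The sketch below is one possible reading of the convention, not an official tool: `parse_image_name` and `ImageName` are hypothetical names, the example file names are invented for illustration, and the code assumes that the first dot ends the library label and the last dot before the extension starts the folio field.

```python
from typing import NamedTuple, Optional

# Library labels listed in the naming convention.
LIBRARY_LABELS = {"AC", "FH", "NLI", "NUIG", "NUIM", "RIA", "TCD", "UCD"}


class ImageName(NamedTuple):
    lib: str
    ms: str
    fol: str
    side: Optional[str]  # "r", "v", or None for a plain page number


def parse_image_name(filename: str) -> ImageName:
    """Split a degraded-image file name into lib, MS and fol fields."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext.lower() not in ("tif", "tiff"):
        raise ValueError(f"not a TIFF file name: {filename!r}")

    # Assumption: lib has no dots; MS may, so fol is taken from the right.
    lib, _, rest = stem.partition(".")
    ms, sep, fol = rest.rpartition(".")
    if lib not in LIBRARY_LABELS or not sep or not ms or not fol:
        raise ValueError(f"unexpected file name layout: {filename!r}")

    side = fol[-1] if fol[-1] in ("r", "v") else None
    return ImageName(lib, ms, fol, side)


# Hypothetical file names, for illustration only.
verso_page = parse_image_name("TCD.MS1234.12v.tiff")
plain_page = parse_image_name("NLI.G1.7.tiff")
```

Returning a named tuple keeps the three fields of the convention visible, while the extra `side` field makes it easy to pair a recto with its verso.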
File Naming Convention for the Ground-Truth Masks
As for the degraded images, but with appended "gt".tiff to differentiate.
File Naming Convention for Colour Images
As for the degraded images, but with appended "rgb".tiff to differentiate.
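Because the three kinds of file differ only in the ending of the name, they can be told apart by inspecting the text just before the extension. The following sketch is an assumption-laden illustration: `split_kind` is a hypothetical helper, the example names are invented, and since the text does not say whether a dot separates the folio from "gt" or "rgb", the code accepts both forms.

```python
from typing import Tuple


def split_kind(filename: str) -> Tuple[str, str]:
    """Return (base name, kind) for a database file name.

    kind is "degraded", "ground_truth" or "colour". A folio field ends in
    a digit, "r" or "v", so a stem ending in "gt" or "rgb" is unambiguous.
    """
    stem, dot, _ext = filename.rpartition(".")
    if not dot:
        raise ValueError(f"file name has no extension: {filename!r}")

    for suffix, kind in (("rgb", "colour"), ("gt", "ground_truth")):
        if stem.endswith(suffix):
            # Drop the suffix and an optional separating dot (assumption).
            return stem[:-len(suffix)].rstrip("."), kind
    return stem, "degraded"


# Hypothetical file names, for illustration only.
names = [
    "TCD.MS1234.12v.tiff",
    "TCD.MS1234.12vgt.tiff",
    "TCD.MS1234.12v.rgb.tiff",
]
kinds = [split_kind(name) for name in names]
```

Grouping files by the returned base name then collects each degraded image with its mask and colour version.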
Restoration Results
Degraded Recto | Recto Ground Truth Foreground Mask | Degraded Verso | Verso Ground Truth Foreground Mask |
Accessing the Database
All queries regarding the database should be forwarded to rowleybr_at_tcd_dot_ie.
Download the database (169 MB zip).
We request that you reference the following in all publications which describe the work for which you use this database:
- Irish Script On Screen Project, www.isos.dias.ie
- R. Rowley-Brooke, F. Pitié, and A. Kokaram. A ground truth bleed-through document image database. In P. Zaphiris, G. Buchanan, E. Rasmussen, and F. Loizides, editors, Theory and Practice of Digital Libraries, volume 7489 of Lecture Notes in Computer Science, pages 185-196, Springer, 2012.
References
Method | Reference |
Han | G. A. Hanasusanto, Z. Wu, and M. S. Brown. Ink-Bleed Reduction using Functional Minimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 825-832, 2010. |
Hua | Y. Huang, M. S. Brown, and D. Xu. User Assisted Ink-Bleed Reduction. IEEE Transactions on Image Processing, 19(10): 2646-2658, 2010. |
Mog | R. F. Moghaddam and M. Cheriet. A Variational Approach to Degraded Document Enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8): 1347-1361, 2010. |
Ro1 | R. Rowley-Brooke and A. Kokaram. Bleed-Through Removal in Degraded Documents. In C. Viard-Gaudin and R. Zanibbi, editors, Proceedings of SPIE 8297, Document Recognition and Retrieval XIX, 82970T, 2012. |
Ro2 | R. Rowley-Brooke, F. Pitié, and A. Kokaram. A Non-Parametric Framework for Document Bleed-Through Removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2954-2960, 2013. |