Bleed-Through Database

Update — April 2014

The images have been cropped to remove registration artefacts that may affect bleed-through removal.

Full colour images are now also available with the database.

Bleed-Through Database

The images contained here form a database designed as a resource for people working in the field of digital document restoration, and more specifically on the problem of bleed-through degradation. It consists of 25 registered recto-verso pairs of grayscale sample images, cropped from larger manuscript images and showing varied degrees of bleed-through.

The crops were chosen so that each contains a sentence or phrase of text, which makes it possible to test for improved legibility. All images in the database are saved in TIFF format. The verso side of each image pair has been flipped horizontally and registered to the recto side, so that bleed-through text on one side is aligned with the corresponding foreground text on the other.
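The horizontal flip mentioned above can be illustrated with a minimal sketch. It assumes an image held as a list of pixel rows; the function name `flip_horizontal` and the toy data are hypothetical, and the registration step that follows the flip is not shown. The images in the database have already been flipped and registered, so this is only needed when working from unprocessed verso scans.

```python
def flip_horizontal(image):
    """Mirror a row-major image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


# Toy 2x4 verso scan: 1 marks ink, 0 marks background.
# Seen from the back of the page, the ink sits at the right edge.
verso_scan = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

# After mirroring, the ink sits at the left edge, which is where it
# would appear as bleed-through when looking at the recto side.
verso_flipped = flip_horizontal(verso_scan)
```

Flipping alone only brings the two sides into the same orientation; a separate registration step is still needed to align them pixel for pixel.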

One of the main problems encountered when researching digital document restoration is that no ground truth exists. That is, there is no clean original document available with which to compare restoration results. Ground truth therefore has to be obtained either by creating synthetic degraded document images with known ground truth, or by creating synthetic ground truth images for real degraded images. This database contains the latter: a manually created foreground text mask is included for each image.
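As an illustration of how a foreground mask might be used, the sketch below compares a restoration result with its ground-truth mask. It assumes the result has already been binarised into a 0/1 foreground map of the same size as the mask. The function name `foreground_scores` is hypothetical, and precision, recall and F-measure are chosen here for illustration only; they are not necessarily the measures used by the database authors.

```python
def foreground_scores(predicted, ground_truth):
    """Return (precision, recall, f_measure) for foreground pixels.

    Both arguments are lists of rows holding 0 (background) or 1 (foreground).
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("images must have the same number of rows")

    tp = fp = fn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        if len(pred_row) != len(gt_row):
            raise ValueError("rows must have the same length")
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1      # foreground correctly kept
            elif p:
                fp += 1      # bleed-through or noise kept as foreground
            elif g:
                fn += 1      # true foreground text that was erased

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy example: one spurious foreground pixel and one missed pixel.
mask = [[1, 1, 0, 0],
        [0, 1, 0, 0]]
result = [[1, 1, 1, 0],
          [0, 0, 0, 0]]
precision, recall, f_measure = foreground_scores(result, mask)
```

A low precision suggests that bleed-through is still being labelled as foreground, while a low recall suggests that genuine foreground text has been removed along with it.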

File Naming Convention for the Degraded Images

File names in the database have the format "lib"."MS"."fol".tiff, where:

  • lib - the library from which the image originates. The labels are
    1. AC - The Allan and Maria Myers Academic Centre, a joint facility for the students of Newman College and St Mary's College, both residential colleges of the University of Melbourne.
    2. FH - The Benjamin Iveagh Library, Farmleigh House, Ireland.
    3. NLI - The National Library of Ireland.
    4. NUIG - The James Hardiman Library, National University of Ireland, Galway.
    5. NUIM - The Russell Library, National University of Ireland, Maynooth.
    6. RIA - The Royal Irish Academy Library.
    7. TCD - Trinity College Dublin Library.
    8. UCD - University College Dublin Library.
  • MS - the manuscript number.
  • fol - the page number, or folio number followed by "r" or "v" to denote the recto or verso side.
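The naming convention above can be parsed with a short sketch like the following. It assumes that the three fields are joined by literal dots and contain no dots themselves, so that a degraded image might be called `AC.MSBMMM.90r.tiff`; that example name is inferred from the pair labels on this page and has not been checked against the archive. The names `ImageName` and `parse_image_name` are hypothetical, and ground-truth or colour files are not handled.

```python
from typing import NamedTuple, Optional

# Library labels listed in the naming convention.
LIBRARIES = {"AC", "FH", "NLI", "NUIG", "NUIM", "RIA", "TCD", "UCD"}


class ImageName(NamedTuple):
    lib: str             # library label, e.g. "AC"
    ms: str              # manuscript number, e.g. "MSBMMM"
    fol: str             # page or folio field, e.g. "90r" or "147"
    side: Optional[str]  # "r" or "v" for folios, None for page numbers


def parse_image_name(filename: str) -> ImageName:
    """Split an assumed "lib.MS.fol.tiff" file name into its fields."""
    stem, dot, extension = filename.rpartition(".")
    if not dot or extension.lower() not in ("tif", "tiff"):
        raise ValueError(f"not a TIFF file name: {filename!r}")

    parts = stem.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected lib.MS.fol before the extension: {filename!r}")

    lib, ms, fol = parts
    if lib not in LIBRARIES:
        raise ValueError(f"unknown library label: {lib!r}")

    # Treat a trailing "r"/"v" as a side marker only when it follows digits,
    # so that a roman-numeral page such as "xxv" is not read as a verso.
    side = fol[-1] if fol[-1] in ("r", "v") and fol[:-1].isdigit() else None
    return ImageName(lib, ms, fol, side)


example = parse_image_name("AC.MSBMMM.90r.tiff")
```

Checking the library label against the known list catches misnamed files early; the list would need extending if the database gains further sources.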

File Naming Convention for the Ground-Truth Masks

As for the degraded images, but with "gt" appended to differentiate them, so that the names end in "gt".tiff.

File Naming Convention for Colour Images

As for the degraded images, but with "rgb" appended to differentiate them, so that the names end in "rgb".tiff.

Restoration Results

Restoration results from five methods (Han, Hua, Mog, Ro1 and Ro2; see References) are provided for each of the 25 recto/verso image pairs listed below, alongside the degraded recto and verso images and their ground-truth foreground masks.

  • AC.MSBMMM.90r/v
  • FH.IP.7r/v
  • NLI.MSG18.147/8
  • NLI.MSG18.361/2
  • NLI.MSG311.265/6
  • NUIG.LSdh18.153/4
  • NUIG.LSdh18.155/6
  • NUIM.MSM86.13/4
  • NUIM.MSR68.80/1
  • RIA.MSCiii3.301r/v
  • RIA.MSCiii3.420r/v
  • TCD.1436.81/2
  • TCD.MS1333.9/10
  • TCD.MS1343.59/60
  • TCD.MS1435.147/8
  • UCD.AddIM14.386/7
  • UCD.AddIM14.726/7
  • UCD.MSA13.xxv/i
  • UCD.MSA15.5/6
  • UCD.MSA15.37/8
  • UCD.MSA20.127r/v
  • UCD.MSA29.12r/v
  • UCD.MSA29.119r/v
  • UCD.MSA29.121r/v
  • UCD.MSA33.87/8

Accessing the Database

All queries regarding the database should be forwarded to rowleybr_at_tcd_dot_ie.

The database is available for download as a 169 MB zip file.

We request that you cite the following in any publication describing work that uses this database:

  1. Irish Script On Screen Project, www.isos.dias.ie
  2. R. Rowley-Brooke, F. Pitié, and A. Kokaram. A ground truth bleed-through document image database. In P. Zaphiris, G. Buchanan, E. Rasmussen, and F. Loizides, editors, Theory and Practice of Digital Libraries, volume 7489 of Lecture Notes in Computer Science, pages 185-196, Springer, 2012.

References

The method labels used in the restoration results refer to the following publications:

  • Han - G. A. Hanasusanto, Z. Wu, and M. S. Brown. Ink-Bleed Reduction using Functional Minimization. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 825-832, 2010.
  • Hua - Y. Huang, M. S. Brown, and D. Xu. User Assisted Ink-Bleed Reduction. IEEE Transactions on Image Processing, 19(10): 2646-2658, 2010.
  • Mog - R. F. Moghaddam and M. Cheriet. A Variational Approach to Degraded Document Enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8): 1347-1361, 2010.
  • Ro1 - R. Rowley-Brooke and A. Kokaram. Bleed-Through Removal in Degraded Documents. In C. Viard-Gaudin and R. Zanibbi, editors, Proceedings of SPIE 8297, Document Recognition and Retrieval XIX, 82970T, 2012.
  • Ro2 - R. Rowley-Brooke, F. Pitié, and A. Kokaram. A Non-Parametric Framework for Document Bleed-Through Removal. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 2954-2960, 2013.