Case Study

General Population and Housing Census of Iran 2006

 

Introduction

The General Population and Housing Census is one of the biggest data capture projects in the country and is run once every ten years. Information such as gender, literacy status, economic activity, marital status, and housing characteristics is collected through this process. The accuracy of the collected information and of the data input system has a direct effect on the country's high-level planning.

Problem

Since there are almost 18,000,000 families in Iran, collecting data about them is one of the biggest challenges facing SCI (Statistical Center of Iran). After extensive research on data entry systems during 2004 and 2005, SCI chose ICR technology and selected HODA iReadDoc for the project.

HODA Solution for General Population and Housing Census

SCI asked HODA to use its HODA iReadDoc product to extract data from handwritten census forms, so HODA began designing the required forms, which contain all the information necessary for enumeration. After several meetings and negotiations between SCI and HODA experts, the forms were designed and printed in large quantities. The forms are described below:

Form description | Dimensions (cm) | Fields
Form2: Short-form questionnaire, completed for all households in the country; contains general family and building information. | 42 × 29.7 (double-sided) | 316
Form3: Long-form questionnaire, completed for some percentage of the households; includes general and detailed family and building information. | 65 × 29.5 (double-sided) | 556
Form4: General information about organizational households. | 65 × 29.7 (double-sided) | 482
Form5: Information about rural areas. | 34 × 24.5 (single-sided) | 70
Form504-2: Box summary information (households in city). | A4 (single-sided) | 85
Form504-4: Box summary information (households in countryside). | A4 (single-sided) | 86
Form504-6: Box summary information (households in mapped rural area). | A4 (single-sided) | 85
Form504-8: Box summary information (nonresident households). | A4 (single-sided) | 72

 

Different ICR forms designed for Iran Census 2006

The following is a summary of the total number of fields of each type across the forms described above:

 


Char/numeric fields | Check box fields | KFI fields | Total fields
1401 | 101 | 250 | 1752

Different kinds of fields in ICR forms of Census 2006
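
As a quick consistency check, the per-form field counts and the per-type field counts can be summed independently; both should give the same total. The sketch below uses only the figures listed above; the dictionary names are illustrative and not part of any HODA tool.

```python
# Field counts per form, taken from the form description table.
fields_per_form = {
    "Form2": 316,
    "Form3": 556,
    "Form4": 482,
    "Form5": 70,
    "Form504-2": 85,
    "Form504-4": 86,
    "Form504-6": 85,
    "Form504-8": 72,
}

# Field counts per type, taken from the field-type summary.
fields_per_type = {
    "char_numeric": 1401,
    "check_box": 101,
    "kfi": 250,
}

total_by_form = sum(fields_per_form.values())
total_by_type = sum(fields_per_type.values())

# Both sums should agree with the stated total of 1752 fields.
print(total_by_form, total_by_type)
```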

In each province, a central site was equipped for ICR processing. After the forms had been filled in with household information all over the country, they were collected and shipped to the central site of the province for processing.

 

HODA iReadDoc functionality in Census 2006

The data capture process started simultaneously in 30 provinces of the country. On average, each province took 39.56 days to complete it. In total, 414 people took part in the process, working in 2 shifts.
During the data entry phase, more than 36 million forms (75 million forms in A4 scale) were scanned and processed by the HODA iReadDoc software.

The performance of HODA iReadDoc in Census 2006 is presented below. Note that the rejected characters sent to the verification step are those with a confidence less than or equal to 960 (character confidence is a numeric value between 0 and 1000).


 

 | Total recognized characters | Rejected characters sent to verifiers | Modified characters
Count | 3,290,338,495 | 138,238,937 | 16,306,955
Percentage (%) |  | 4.2 | 0.49

Comparison of total recognized / rejected / modified characters
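
The rejection rule described above can be illustrated with a minimal sketch. It assumes only what the text states: each recognized character carries a confidence between 0 and 1000, and characters with a confidence less than or equal to 960 are sent to a verifier. The function and variable names are hypothetical and do not reflect the actual HODA iReadDoc API.

```python
from typing import List, Tuple

# From the text: characters with confidence <= 960 go to verification.
REJECT_THRESHOLD = 960

def needs_verification(confidence: int) -> bool:
    """Return True if a character should be sent to a human verifier."""
    if not 0 <= confidence <= 1000:
        raise ValueError("confidence must be between 0 and 1000")
    return confidence <= REJECT_THRESHOLD

def split_by_confidence(
    chars: List[Tuple[str, int]],
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Split (character, confidence) pairs into accepted and rejected lists."""
    accepted, rejected = [], []
    for ch, conf in chars:
        if needs_verification(conf):
            rejected.append((ch, conf))
        else:
            accepted.append((ch, conf))
    return accepted, rejected

# Made-up sample data, for illustration only.
sample = [("3", 998), ("7", 960), ("1", 961), ("5", 412)]
accepted, rejected = split_by_confidence(sample)
print(accepted)  # [('3', 998), ('1', 961)]
print(rejected)  # [('7', 960), ('5', 412)]
```

Note that the boundary value 960 itself is rejected, since the rule is "less than or equal to".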

As the figures show, the overall accuracy of the system can be calculated as follows, taking the percentage of characters modified by verifiers as the error rate:


100 - 0.49 = 99.51%
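
The same arithmetic can be reproduced from the raw counts reported above. This is a sketch, assuming that accuracy is defined as 100 minus the percentage of recognized characters that verifiers modified; the variable names are illustrative. Computed without rounding, the percentages differ slightly from the two-decimal figures quoted in the text.

```python
# Raw counts reported for Census 2006.
total_recognized = 3_290_338_495
rejected_to_verifiers = 138_238_937
modified_by_verifiers = 16_306_955

# Share of characters sent to verifiers (confidence <= 960).
rejected_pct = 100 * rejected_to_verifiers / total_recognized

# Share of characters that verifiers actually changed.
modified_pct = 100 * modified_by_verifiers / total_recognized

# Assumed definition: accuracy = 100 - percentage of modified characters.
accuracy_pct = 100 - modified_pct

print(f"rejected: {rejected_pct:.4f}%")
print(f"modified: {modified_pct:.4f}%")
print(f"accuracy: {accuracy_pct:.4f}%")
```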


 
It is worth noting that no check box was sent to the verifiers, thanks to the high recognition rate HODA iReadDoc achieved on those fields.