Heading One – click here to add title

January 21, 2018 | Author: Anonymous | Category: Math, Statistics And Probability, Statistics

Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)
Laura O’Sullivan
Statistics New Zealand
laura.o’[email protected]

IAOS Vietnam October 2014

Outline
The Integrated Data Infrastructure (IDI)
Terminology
IDI linking
• Near-exact and non-exact
• Selecting cut-offs
• Quality
• Clerical review
Linking at Statistics New Zealand and at the Australian Bureau of Statistics

Integrated Data Infrastructure (IDI)
[Figure: diagram of person-centred data and the data sources in the IDI]
• Student loans & allowances
• Migration & movements
• Education
• Benefits
• Business data
• Tax
• Justice
• Health & safety
• Families & households

Terminology
• Data integration (aka record linkage)
• Deterministic linking
• Probabilistic linking (Fellegi-Sunter theory)
• Weights: indicate how likely it is that two records are from the same person
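As a rough illustration of how probabilistic linking weights are built under Fellegi-Sunter theory, the sketch below sums per-field agreement and disagreement weights for one record pair. The m and u probabilities, the field names, and the example records are made-up assumptions for illustration, not values used in the IDI.

```python
import math

# Hypothetical m and u probabilities per field (assumptions, not IDI values):
# m = P(field agrees | records are the same person)
# u = P(field agrees | records are different people)
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.98, 0.001),
    "sex": (0.99, 0.5),
}


def field_weight(field, agrees):
    """Log2 likelihood ratio contributed by one field comparison."""
    m, u = M_U[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def pair_weight(rec_a, rec_b):
    """Total weight for a record pair: the sum of the field weights."""
    return sum(field_weight(f, rec_a[f] == rec_b[f]) for f in M_U)


# Illustrative records only.
a = {"first_name": "mary", "last_name": "brown",
     "date_of_birth": "1984-11-04", "sex": 2}
b = {"first_name": "mary", "last_name": "hughes",
     "date_of_birth": "1984-11-04", "sex": 2}

full = pair_weight(a, a)     # every field agrees
partial = pair_weight(a, b)  # last name disagrees
```

Strictly, each weight is a log-likelihood ratio rather than a probability: agreement on a field adds a positive amount, disagreement adds a negative amount, and a higher total means the pair is more likely to be the same person.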

Cut-offs
[Figure: Distribution of the weights. Number of record pairs (vertical axis, 40 to 1,240) plotted over a horizontal axis running from -95 to 50, with regions labelled Non-links and Links. Source: Statistics New Zealand]
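A minimal sketch of how a cut-off turns weights into link decisions. It assumes pairs with a weight at or above the cut-off are treated as links and the rest as non-links; the weights and the cut-off value are invented for illustration.

```python
def classify_pairs(weights, cut_off):
    """Split pair weights into links (>= cut_off) and non-links (< cut_off)."""
    links = [w for w in weights if w >= cut_off]
    non_links = [w for w in weights if w < cut_off]
    return links, non_links


# Hypothetical pair weights and cut-off (not taken from the IDI).
example_weights = [-82.5, -40.0, -3.2, 4.1, 18.7, 36.0]
links, non_links = classify_pairs(example_weights, cut_off=0.0)
```

Raising the cut-off generally trades fewer false links for more missed matches, which is why the choice of cut-off matters for quality.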

Quality
            True matches       Non matches
Linked      True positives     False positives
Unlinked    False negatives    True negatives
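The four cells of this table can be turned into summary rates. The sketch below assumes the false positive rate is measured as the share of links that are not true matches, a common convention in record linkage that the slides do not spell out; the counts are hypothetical.

```python
def linkage_quality(tp, fp, fn):
    """Summary rates from true positives, false positives and false negatives."""
    linked = tp + fp
    true_matches = tp + fn
    return {
        # Share of links that are not true matches (assumed definition).
        "false_positive_rate": fp / linked if linked else 0.0,
        # Share of links that are true matches.
        "precision": tp / linked if linked else 0.0,
        # Share of true matches that were linked.
        "recall": tp / true_matches if true_matches else 0.0,
        # Share of true matches that were missed.
        "false_negative_rate": fn / true_matches if true_matches else 0.0,
    }


# Hypothetical counts for illustration only.
rates = linkage_quality(tp=950, fp=50, fn=100)
```

True negatives are left out because none of these rates depend on them, and in practice the number of non-matching pairs that stay unlinked is usually very large.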

Near-exact and non-exact
First name and Last name agreement
Data   Insert    Delete   Replace   Double    Single    Swap     Append   Truncate
A      Robert    Robert   Robert    Robert    Robbert   Robert   Kat      Katie
B      Robiert   Robrt    Rovert    Roobert   Robert    Robret   Katie    Kat

Date of birth agreement
Data   Replace      Swap         Transpose
A      04/08/1982   02/08/1982   02/08/1982
B      04/02/1982   20/08/1982   08/02/1982
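The name categories above can be read as simple string comparisons. The sketch below labels how one name differs from another using those category names; the function names, the order in which the checks are applied, and the "other" fallback are assumptions for illustration, not the rules used in the IDI.

```python
def _extra_char_kind(shorter, longer):
    """If `longer` is `shorter` plus one character, report whether that
    character repeats a neighbour ("double") or not ("insert"); else None."""
    if len(longer) != len(shorter) + 1:
        return None
    for i in range(len(longer)):
        if longer[:i] + longer[i + 1:] == shorter:
            repeats_left = i > 0 and longer[i] == longer[i - 1]
            repeats_right = i + 1 < len(longer) and longer[i] == longer[i + 1]
            return "double" if (repeats_left or repeats_right) else "insert"
    return None


def name_difference(a, b):
    """Label how name b differs from name a."""
    if a == b:
        return "exact"
    added = _extra_char_kind(a, b)
    if added:
        return added
    removed = _extra_char_kind(b, a)
    if removed:
        return "single" if removed == "double" else "delete"
    if len(a) == len(b):
        diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diffs) == 1:
            return "replace"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]]):
            return "swap"
    if b.startswith(a):
        return "append"
    if a.startswith(b):
        return "truncate"
    return "other"


# The pairs from the table above, as (A, B).
pairs = [("Robert", "Robiert"), ("Robert", "Robrt"), ("Robert", "Rovert"),
         ("Robert", "Roobert"), ("Robbert", "Robert"), ("Robert", "Robret"),
         ("Kat", "Katie"), ("Katie", "Kat")]
labels = [name_difference(a, b) for a, b in pairs]
```

Because the one-character checks run first, a single character added at the end of a name is labelled as an insert rather than an append in this sketch. The date of birth categories could be handled in the same spirit by comparing the day, month, and year parts separately.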

Selecting the cut-off
[Figure: Graph of near-exact and non-exact links. Frequency of links (vertical axis, 0 to 300,000) plotted over a horizontal axis running from 0 to 40, with separate series for near-exact and non-exact links. Source: Statistics New Zealand]

Quality in the IDI
False positive rates
• Sample from non-exact links
• Assume near-exact links are true matches
• Use proportional sampling
Non-exact rates
• Monitoring
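One way to read the false positive bullets as a calculation: clerically review a sample of non-exact links, allocate that sample across groups of links in proportion to group size, treat every near-exact link as a true match, and scale the results up. The grouping by weight band, the counts, and the review outcomes below are all invented; the slides do not say how links are grouped for sampling.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate a sample across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * n / total)
            for name, n in stratum_sizes.items()}


def estimated_false_positive_rate(near_exact_links, stratum_sizes,
                                  reviewed, false_found):
    """Estimate the false positive rate across all links.

    Near-exact links are assumed to be true matches, so every estimated
    false positive comes from the reviewed non-exact strata.
    """
    est_false = sum(stratum_sizes[s] * false_found[s] / reviewed[s]
                    for s in stratum_sizes)
    total_links = near_exact_links + sum(stratum_sizes.values())
    return est_false / total_links


# Hypothetical non-exact links grouped by weight band (assumed stratification).
non_exact = {"low_weight": 2000, "mid_weight": 3000, "high_weight": 5000}
sample = proportional_allocation(non_exact, sample_size=500)

# Hypothetical clerical review results: false links found in each sample.
false_found = {"low_weight": 20, "mid_weight": 15, "high_weight": 5}
rate = estimated_false_positive_rate(90000, non_exact, sample, false_found)
```

With these made-up numbers the estimate is 800 false links out of 100,000 links in total. Rounding in the allocation step means the stratum samples may not always add up exactly to the requested sample size.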

Clerical review
A link with two first names matching and different last name
Dataset   First names   Last names   Date of birth   Sex
A         Mary Louise   Brown        04/11/1984      2
B         Mary Lou      Hughes       04/11/1984      2

A link with unique identifiers and missing name information in one dataset
Dataset   Identifier   First names   Last names   Date of birth   Sex
A         12345        Owen          Keyes        06/01/1951      1
B         12345        -             -            06/01/1951      1

A link with missing name information and without unique identifiers
Dataset   First names     Last names   Date of birth   Sex
A         Holly Jessica   Gordon       01/05/1940      2
B         Holly                        01/05/1940      2

Statistics New Zealand and the Australian Bureau of Statistics
Statistics New Zealand
• Census to the Post-enumeration survey (PES)
• Linking the longitudinal census
Australian Bureau of Statistics
• Linking projects using name and address
• Census data enhancement project

Thank you for listening
Questions
laura.o’[email protected]
