Algorithms For Dummies

Dealing with data duplication



Duplicated data occurs for a number of reasons. Some of them are obvious. A user could enter the same data more than once. Distractions cause people to lose their place in a list, or sometimes two users enter the same record. Some of the sources are less obvious. Combining two or more datasets could create multiple records when the data appears in more than one location. You could also create data duplications when using various data-shaping techniques to create new data from existing data sources. Fortunately, packages such as Pandas let you remove duplicate data, as shown in the following example. (You can find this code in the A4D; 06; Remediation.ipynb file on the Dummies site as part of the downloadable code; see the Introduction for details.)

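import pandas as pd

# Rows 0, 4, and 6 are identical (all zeros), so the DataFrame contains duplicates.
df = pd.DataFrame({'A': [0,0,0,0,0,1,0],
                   'B': [0,2,3,5,0,2,0],
                   'C': [0,3,4,1,0,2,0]})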
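The excerpt ends before showing how the duplicates are actually removed. The following is a minimal sketch of one way to do it, assuming the standard pandas methods `DataFrame.duplicated()` and `DataFrame.drop_duplicates()`. The variable names `dup_mask` and `deduped` are illustrative and do not come from the original text.

```python
import pandas as pd

# Same sample data as above: rows 0, 4, and 6 are identical.
df = pd.DataFrame({'A': [0, 0, 0, 0, 0, 1, 0],
                   'B': [0, 2, 3, 5, 0, 2, 0],
                   'C': [0, 3, 4, 1, 0, 2, 0]})

# duplicated() marks each row that repeats an earlier row.
# By default the first occurrence is not marked, so only rows 4 and 6 are True.
dup_mask = df.duplicated()
print(dup_mask)

# drop_duplicates() returns a new DataFrame that keeps the first
# occurrence of each distinct row and preserves the original index labels.
deduped = df.drop_duplicates()
print(deduped)
```

Because `drop_duplicates()` returns a new object by default, `df` itself is left unchanged. Calling `reset_index(drop=True)` on the result would renumber the remaining rows if a contiguous index is needed.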
