Visual Basic Algorithms
Pure VB: Soundex Overview
     
Posted:   Saturday January 19, 2002
Updated:   Monday December 26, 2011
     
Applies to:   VB3, VB4-16, VB4-32, VB5, VB6
Developed with:   VB6, Windows XP
OS restrictions:   None
Author:   VBnet - Randy Birch, assorted web links (cited)
     


 
     
 Prerequisites
None.

Reference: Brad & Kathy Genealogy (http://www.bradandkathy.com/cgi-bin/yasc.cgi)

First applied to the US Census in 1880, a Soundex-coded surname is an alphanumeric index based on the way a surname sounds rather than the way it is spelled. The intent was to help researchers find a surname quickly even though it may have been recorded under different spellings.  Generating a Soundex index for a name follows these basic rules:

  • Every Soundex code consists of a letter and three digits, such as B-536 (the code for names like 'Bender').

  • The letter is always the first letter of the surname, whether it is a vowel or a consonant.  Except for the surname's first letter, all remaining vowels (A, E, I, O, and U) as well as the consonants W, Y, and H are disregarded in forming the Soundex code.

  • The next three consonants of the surname are assigned values from the lookup table below (and note the exceptions).
  • Any remaining consonants in the name are ignored (in other words, a maximum of three are used).  If fewer than three consonants follow the initial letter, zeroes complete the three-digit code. A name with only vowels after the first letter (such as 'Lee') yields no code numbers and is thus represented as L-000. A name with only one additional consonant, such as COOK, would be C-200. Although most surnames can be coded using the guide in Table 1, there are always exceptions and special considerations, all of which are outlined below.

Surnames that sound the same but are spelled differently (like SMITH and SMYTH) will have the same code. And because vowels are ignored in all but the first character, even SMYTHE will have that same code. The same goes for BIRCH / BURCH.  But the surnames BIRTCH and BURTCH, which include a T within the first three consonants, generate a different index.

Conversely, names that sound alike but begin with a different letter, such as COOK / KOCH or FAUST / PHAUST, produce a different Soundex code for each spelling. A comprehensive search therefore requires searching every Soundex code that results from varying the surname's first letter.

 
  Name   Soundex
  BENDER   B536
  BIRCH   B620
  BURCH   B620
  BIRTCH   B632
  BURTCH   B632
  COOK   C200
  KOCH   K200
  FAUST   F230
  PHAUST   P230
  SMITH   S530
  SMYTH   S530
  SMYTHE   S530
 

TABLE 1: SOUNDEX CODING GUIDE
After retaining the first letter of the surname and disregarding any following letters if they are A, E, I, O, U, W, Y, or H:

       
  The number   Represents the letters
  1   B, P, F, V
  2   C, S, K, G, J, Q, X, Z
  3   D, T
  4   L
  5   M, N
  6   R
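
Table 1 translates naturally into a lookup structure. The following is a minimal sketch in Python, used here purely for illustration (it is not the VB code from this site). The names CODE_GROUPS, LETTER_TO_DIGIT and digit_for are hypothetical and chosen only for this example.

```python
# Table 1 as a lookup: each code digit maps to the letters it represents.
CODE_GROUPS = {
    "1": "BPFV",
    "2": "CSKGJQXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Invert the table so that each letter maps directly to its digit.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in CODE_GROUPS.items()
    for letter in letters
}


def digit_for(letter):
    """Return the Table 1 digit for a letter, or None if the letter is disregarded."""
    return LETTER_TO_DIGIT.get(letter.upper())


print(digit_for("n"))        # 5
print(digit_for("a"))        # None (vowels, W, Y and H carry no code)
print(len(LETTER_TO_DIGIT))  # 18 coded letters; the other 8 are disregarded
```

Inverting the table once keeps each per-letter lookup to a single dictionary access, and a missing entry doubles as the "disregard this letter" signal.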

Prefixes
If the surname has a prefix, such as D', De, dela, Di, du, Le, van, or Von, code it both with and without the prefix because it might be listed under either code. The surname vanDevanter, for example, could be V-531 or D-153. Mc and Mac are not considered to be prefixes and should be coded like other surnames.
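
The "code it both ways" advice can be sketched as follows. This is an illustrative Python sketch, not the VB implementation from this site; the function names soundex and codes_for_prefixed_surname are hypothetical, and the caller is assumed to have already decided which part of the name is the prefix, since that is a judgment the rules leave to the researcher.

```python
# Letter-to-digit lookup built from the coding guide in Table 1.
LETTER_TO_DIGIT = {
    letter: str(digit)
    for digit, group in enumerate(["BPFV", "CSKGJQXZ", "DT", "L", "MN", "R"], start=1)
    for letter in group
}


def soundex(name):
    """Code one name, following the rules described in this article."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    digits = []
    # The first letter is not coded, but its code still suppresses a matching neighbour.
    previous = LETTER_TO_DIGIT.get(letters[0])
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = LETTER_TO_DIGIT.get(ch)
        if digit is not None and digit != previous:
            digits.append(digit)
        previous = digit  # a vowel or Y resets this to None and so acts as a separator
    return letters[0] + "-" + "".join(digits)[:3].ljust(3, "0")


def codes_for_prefixed_surname(prefix, remainder):
    """Return (code with the prefix, code without the prefix)."""
    return soundex(prefix + remainder), soundex(remainder)


print(codes_for_prefixed_surname("van", "Devanter"))  # ('V-531', 'D-153')
print(soundex("McDonnell"))                            # M-235 (Mc is not a prefix)
```

A search would then look under both returned codes.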

Double Letters
If the surname has any double letters, they should be treated as one letter. Thus, in the surname Lloyd, the second l should be crossed out. In the surname Gutierrez, the second r should be disregarded.

Side-by-Side Letters
A surname may have different side-by-side letters that receive the same number on the Soundex coding guide. For example, the c, k, and s in Jackson all receive a number 2 code. These letters with the same code should be treated as only one letter. In the name Jackson, the k and s should be disregarded. This rule also applies to the first letter of a surname, even though it is not coded. For example, Pf in Pfister would receive a number 1 code for both the P and f. Thus in this name the letter f should be crossed out, and the code is P-236.
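
The double-letter and side-by-side rules both come down to one comparison: a letter counts only if its code differs from the code of the letter immediately before it. The sketch below, in Python for illustration only, returns the letters that survive this step. The name surviving_letters is hypothetical, and the sketch deliberately handles only directly adjacent letters, not the H and W rule described further down.

```python
# Letter-to-digit lookup built from the coding guide in Table 1.
LETTER_TO_DIGIT = {
    letter: str(digit)
    for digit, group in enumerate(["BPFV", "CSKGJQXZ", "DT", "L", "MN", "R"], start=1)
    for letter in group
}


def surviving_letters(surname):
    """Return the letters after the first that still count once
    double letters and side-by-side letters have been merged."""
    letters = [ch for ch in surname.upper() if ch.isalpha()]
    if not letters:
        return ""
    kept = []
    # The first letter is never coded, but its code takes part in the comparison.
    previous = LETTER_TO_DIGIT.get(letters[0])
    for ch in letters[1:]:
        digit = LETTER_TO_DIGIT.get(ch)
        if digit is not None and digit != previous:
            kept.append(ch)
        previous = digit
    return "".join(kept)


print(surviving_letters("Jackson"))    # CN  (k and s merge into the c)
print(surviving_letters("Pfister"))    # STR (f merges into the initial P)
print(surviving_letters("Lloyd"))      # D   (second l is dropped)
print(surviving_letters("Gutierrez"))  # TRZ (second r is dropped)
```

Because a doubled letter always shares its own code, the double-letter rule needs no separate handling; it is a special case of the side-by-side rule.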

American Indian and Asian Names
A phonetically spelled American Indian or Asian name was sometimes coded as if it were one continuous name. If a distinguishable surname was given, the name may have been coded in the normal manner. For example, Dances with Wolves might have been coded as Dances (D-522) or as Wolves (W-412).  The name Shinka-Wa-Sa may have been coded as Shinka (S-520) or Sa (S-000).

If Soundex cards do not yield expected results, researchers should consider other surname spellings or variations on coding names.

Female Religious Figures
Nuns or other female religious figures with names such as Sister Veronica may have been members of households or heads of households or institutions where a child or children age 10 or under resided. Because many of these religious figures do not use a surname, the Soundexes for the post-1880 censuses frequently use the code S-236, for Sister, whether or not a surname exists. So far as can be determined, though, the Soundex for the 1880 census does not use the code S-236 for this purpose.

Because of the limitations of the 1880 Soundex, the number of cards mentioning a nun or comparable person is likely to be very small. If this person was the head of a household or institution with children, indexers may have coded the head's surname. If no surname existed, the indexers may have used the Not Reported (NR) surname option discussed later. In either case, if the household or institution headed by a female religious figure included a child under 10, the researcher also can code the child's surname and seek an Individual Card. No Individual Card, though, applies to a nun or any other person 10 years or older.

Single-Term Names
In 1880 many individuals, especially in Alaska or areas with many Native Americans, may have used only a single-term name such as Loksi or Hiawatha. Perhaps not until the 1900s did their descendants use a surname. Some researchers, therefore, may need to code a single-term name as though it were a surname. If this rule applies to the head of a family and other family members have different names, Individual Cards will also pertain to those members age 10 or younger.

H and W Rule
The letters H and W do not act as separators between letters having the same code value. As a result, such letters are treated as adjacent and are condensed into a single code. For example, the letter sequence "CHS" would be coded as 2, whereas without this rule, it would be coded as 22. Note that this rule has often been omitted in descriptions of Soundex.
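
The effect of this rule is easiest to see by coding the same letter sequence with and without it. The following Python sketch is for illustration only; code_sequence and its hw_rule switch are hypothetical names, and the function codes a run of letters rather than a complete surname.

```python
# Letter-to-digit lookup built from the coding guide in Table 1.
LETTER_TO_DIGIT = {
    letter: str(digit)
    for digit, group in enumerate(["BPFV", "CSKGJQXZ", "DT", "L", "MN", "R"], start=1)
    for letter in group
}


def code_sequence(letters, hw_rule=True):
    """Code a run of letters into digits, merging neighbours that share a code."""
    digits = []
    previous = None
    for ch in letters.upper():
        if hw_rule and ch in "HW":
            # Skipped without touching `previous`, so the letters on
            # either side of the H or W are still treated as adjacent.
            continue
        digit = LETTER_TO_DIGIT.get(ch)
        if digit is not None and digit != previous:
            digits.append(digit)
        previous = digit  # an uncoded letter resets this and acts as a separator
    return "".join(digits)


print(code_sequence("CHS"))                 # 2
print(code_sequence("CHS", hw_rule=False))  # 22
print(code_sequence("CAS"))                 # 22 (a vowel still separates)
```

The only difference between the two modes is whether H and W reset the remembered code, which is why an implementation that omits the rule differs by so little code yet yields different indexes for some names.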


TABLE 2: EXAMPLES OF SOUNDEX CODING
After retaining the first letter of the surname and disregarding the next letters if they are A, E, I, O, U, W, Y, or H, then:
           
  Name   Coded   Soundex Code
  Allricht   l, r, c   A-462
  Eberhard   b, r, r   E-166
  Engebrethson   n, g, b   E-521
  Heimbach   m, b, c   H-512
  Hanselmann   n, s, l   H-524
  Henzelmann   n, z, l   H-524
  Hildebrand   l, d, b   H-431
  Kavanagh   v, n, g   K-152
  Lind, Van   n, d   L-530
  Lukaschowsky   k, s, s   L-222
  McDonnell   c, d, n   M-235
  McGee   c   M-200
  O'Brien   b, r, n   O-165
  Opnian   p, n, n   O-155
  Oppenheimer   p, n, m   O-155
  Riedemanas   d, m, n   R-355
  Zita   t   Z-300
  Zitzmeinn   t, z, m   Z-325
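
The rows of Table 2 can be reproduced by recording which letters contribute a digit while the code is built. The sketch below is in Python for illustration only and is not the VB implementation from this site. The name soundex_with_letters is hypothetical, and the caller is assumed to pass the surname alone (for example "Lind" rather than "Lind, Van").

```python
# Letter-to-digit lookup built from the coding guide in Table 1.
LETTER_TO_DIGIT = {
    letter: str(digit)
    for digit, group in enumerate(["BPFV", "CSKGJQXZ", "DT", "L", "MN", "R"], start=1)
    for letter in group
}


def soundex_with_letters(surname):
    """Return (coded letters, Soundex code) for one surname."""
    letters = [ch for ch in surname.upper() if ch.isalpha()]  # drops the ' in O'Brien
    if not letters:
        return [], ""
    coded = []  # (letter, digit) pairs, in the order they are found
    previous = LETTER_TO_DIGIT.get(letters[0])
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W rule: not a separator
        digit = LETTER_TO_DIGIT.get(ch)
        if digit is not None and digit != previous:
            coded.append((ch.lower(), digit))
        previous = digit
    coded = coded[:3]  # at most three consonants are used
    code = letters[0] + "-" + "".join(d for _, d in coded).ljust(3, "0")
    return [letter for letter, _ in coded], code


for name in ("Allricht", "Eberhard", "Lukaschowsky", "McGee", "O'Brien"):
    print(name, *soundex_with_letters(name))
# Allricht ['l', 'r', 'c'] A-462
# Eberhard ['b', 'r', 'r'] E-166
# Lukaschowsky ['k', 's', 's'] L-222
# McGee ['c'] M-200
# O'Brien ['b', 'r', 'n'] O-165
```

Returning the contributing letters alongside the code makes it easy to check a result by hand against the table.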

 
 


 
 
 
 

Copyright ©1996-2011 VBnet and Randy Birch. All Rights Reserved.