# English confusion sets, mostly homophones
# Original source: After the Deadline's data/rules/homophonedb.txt
# URL: http://openatd.wordpress.com/ or https://openatd.svn.wordpress.org/atd-server/data/rules/homophonedb.txt
# License: GNU GENERAL PUBLIC LICENSE Version 2
#
# Line format:
# <word1>|<description1>; <word2>|<description2>; <factor>   # optional comment
#   <word1> and <word2> are words that can easily be confused
#   <description> will be used in the error message to explain the word (optional)
#   <factor> is the factor of how much more the other word must be more
#            probable so the text is considered potentially incorrect.
#            Use a higher value for better precision but lower recall.
#   Precision (p) and recall (r) values in the comments come from ConfusionRuleEvaluator
#     and are based on Wikipedia, Tatoeba, and Enron data, usually with 1000 random 
#     sentences (if that many sentence are available for that pair). The number after recall
#     is the number of sentences used for evaluation.
# Order is relevant for ambiguous cases like 'know' ('no' or 'now') where the match
# is used whose pair comes first in this file.

accept|to agree, to endure; except|with the exception of; 100000                  # p=0.997, r=0.792, 1900, 3grams, 2015-08-12
advice|noun; advise|verb; 100000                                                  # p=0.998, r=0.792, 1647, 3grams, 2015-08-12
affect|experience of emotion; effect|outcome, result; 10000000                    # p=0.982, r=0.595, 1805, 3grams, 2015-08-12
allowed|permitted; aloud|using the voice; 10000                                   # p=0.999, r=0.895, 1056, 3grams, 2015-08-12
allusion|passing reference; illusion|an erroneous mental representation; 1000     # p=0.994, r=0.796, 211, 3grams, 2015-08-12
and; end; 1000000000000                                                           # p=0.998, r=0.530, 1925, 3grams, 2015-08-12
ate|past of 'to eat'; eight|the number '8'; 100                                   # p=0.996, r=0.938, 1658, 3grams, 2015-08-12
Australia|the continent; Austria|country in Europe; 10000                         # p=0.995, r=0.279, 1977, 3grams, 2015-08-12
bean; been; 1000                                                                  # p=0.995, r=0.923, 1146, 3grams, 2015-08-12
be; bee|insects; 1000000                                                          # p=0.998, r=0.835, 1162, 3grams, 2015-08-12
been; bin|container; 100000                                                       # p=0.998, r=0.916, 1452, 3grams, 2015-08-12
begin|start; being|living thing; 100000000                                        # p=1.000, r=0.558, 1932, 3grams, 2015-08-12
be; we; 10000000                                                                  # p=1.000, r=0.869, 1889, 3grams, 2015-08-12
bellow|very loud utterance; below|beneath, at a lower place; 1000                 # p=0.998, r=0.920, 932, 3grams, 2015-08-14
bit; but; 10000000                                                                # p=0.996, r=0.799, 1950, 3grams, 2015-08-12
breathe|verb; breath|noun; 100000                                                 # p=1.000, r=0.768, 453, 3grams, 2015-08-12
but; butt; 100000                                                                 # p=0.993, r=0.836, 1145, 3grams, 2015-08-12
buy|purchase; bye|goodbye; 1000                                                   # p=1.000, r=0.832, 1041, 3grams, 2015-08-12
desert|arid land; dessert|last course of a meal; 1000000                          # p=0.996, r=0.491, 1036, 3grams, 2015-08-12
complain|verb'; complaint|noun; 100                                               # p=1.000, r=0.930, 473, 3grams, 2015-08-29
complains|3rd person of 'complain'; complaints|plural of noun 'complaint'; 1000   # p=1.000, r=0.859, 368, 3grams, 2015-08-29
do; to; 10000000000                                                               # p=0.998, r=0.561, 1977, 3grams, 2015-08-13
except|with the exception of; expect|regard something as likely; 10000000         # p=0.998, r=0.667, 1909, 3grams, 2015-08-12
expert; export; 10000                                                             # p=0.999, r=0.726, 1249, 3grams, 2015-08-29
extend|verb; extent|noun; 100                                                     # p=0.998, r=0.975, 1780, 3grams, 2015-08-12
fee; feel; 10000                                                                  # p=0.998, r=0.817, 1725, 3grams, 2015-08-13
fill; full; 1000000                                                               # p=0.997, r=0.770, 1798, 3grams, 2015-08-12
fir; for; 10000000                                                                # p=0.999, r=0.868, 998, 3grams, 2015-09-08
fir; fur; 100                                                                     # p=1.000, r=0.804, 398, 3grams, 2015-09-08
first; fist; 1000000000                                                           # p=0.994, r=0.800, 1076, 3grams, 2015-08-12
for|a preposition as in 'I have something for you'; four|the number '4'; 100000000# p=0.994, r=0.507, 1890, 3grams, 2015-08-12
had|past of 'have'; hat|head covering; 100000000                                  # p=0.998, r=0.647, 1757, 3grams, 2015-08-12
half|noun/adjective; halve|verb; 1000000                                          # p=0.997, r=0.961, 991, 3grams, 2015-08-12
he; the; 10000000                                                                 # p=1.000, r=0.703, 2000, 3grams, 2015-08-24
hear; heart; 1000                                                                 # p=0.996, r=0.954, 1971, 3grams, 2015-08-12
in; inn|a hostel; 1000000                                                         # p=0.999, r=0.829, 1145, 3grams, 2015-08-12
kind; king; 1000000                                                               # p=0.997, r=0.533, 1963, 3grams, 2015-08-12
known|past participle of 'know'; know|to be aware of; 1000                        # p=0.999, r=0.965, 1946, 3grams, 2015-08-12
know|to be aware of; no|negation; 100000                                          # p=0.999, r=0.887, 1903, 3grams, 2015-08-12
know|to be aware of; now|in this moment; 1000                                     # p=0.999, r=0.877, 1918, 3grams, 2015-08-12
lies; lines; 1000000                                                              # p=0.999, r=0.556, 1953, 3grams, 2015-08-12
loose|not tight; lose|opposite of win; 100000                                     # p=0.996, r=0.796, 1421, 3grams, 2015-08-12
mice|plural of 'mouse'; nice|pleasant; 1000                                       # p=0.997, r=0.862, 1215, 3grams, 2015-08-12
minuet|piece of music and dance; minute|60 seconds; 1000000000                    # p=0.995, r=0.568, 995, 3grams, 2015-08-12
minuets|pieces of music and dance; minutes|60 seconds; 1000000000                 # p=0.993, r=0.881, 1007, 3grams, 2015-08-12
near|close to; neat|clean or organized; 100000                                    # p=1.000, r=0.784, 1074, 3grams, 2015-08-12
new; news; 1000000                                                                # p=0.997, r=0.542, 1814, 3grams, 2015-08-12
nit; not; 100000000                                                               # p=0.998, r=0.945, 983, 3grams, 2015-08-12
not; note; 100000000                                                              # p=0.997, r=0.525, 1865, 3grams, 2015-08-12
now|at the present moment; no|negation; 1000000                                   # p=0.996, r=0.559, 1919, 3grams, 2015-08-12
our; out; 10000                                                                   # p=0.996, r=0.896, 1901, 3grams, 2015-08-12
passed|paste tense of 'to pass'; past|past times; 10000                           # p=0.999, r=0.858, 1938, 3grams, 2015-08-12
peace|absence of war; piece|a portion of a natural object; 1000000                # p=0.998, r=0.634, 1966, 3grams, 2015-08-12
peak|extremum, summit, top; peek|a secret look; 100000                            # p=0.999, r=0.797, 999, 3grams, 2015-08-12
pray|say a prayer; prey|animal hunted for food; 10000                             # p=0.994, r=0.794, 664, 3grams, 2015-08-12
proof|often noun, also 'proofread'; prove|verb: show, testify, demonstrate; 100000# p=0.997, r=0.850, 1583, 3grams, 2015-08-12
quiet|free of noise; quite|completely, to a degree; 100000                        # p=0.999, r=0.808, 1601, 3grams, 2015-08-12
quite|completely, to a degree; quit|stop, give up; 100000                         # p=0.998, r=0.837, 1519, 3grams, 2015-08-12
right|opposite of 'left'; rite|established ceremony or practice; 100000000        # p=0.995, r=0.698, 1175, 3grams, 2015-08-12
same|identical; sane|mentally healthy; 10000                                      # p=1.000, r=0.962, 1028, 3grams, 2015-08-12
sate; state; 100000000000                                                         # p=0.991, r=0.683, 1022, 3grams, 2015-08-24
scene|place where some action occurs; seen|past participle of 'see'; 1000         # p=1.000, r=0.920, 1979, 3grams, 2015-08-12
sea|ocean; see|to look at; 10000                                                  # p=0.996, r=0.869, 1947, 3grams, 2015-08-12
sire|noun: title, verb: make by reproduction; sure|certain; 100000                # p=0.997, r=0.936, 987, 3grams, 2015-08-12
than; that; 100000000                                                             # p=0.999, r=0.496, 1902, 3grams, 2015-08-12
than|used in comparisons; then|next in order; 100000                              # p=0.999, r=0.789, 1949, 3grams, 2015-08-12
their|as in 'It’s not their fault.'; there|as in 'Is there an answer?'; 1000      # p=0.994, r=0.966, 1939, 3grams, 2015-08-12
the; to; 10000000000                                                              # p=0.998, r=0.477, 2000, 3grams, 2015-09-19
thing|noun; think|usually a verb; 1000                                            # p=0.997, r=0.953, 1930, 3grams, 2015-08-12
things|plural noun; thinks|verb; 1000000                                          # p=0.997, r=0.813, 1725, 3grams, 2015-09-24
threw|past of throw; through|preposition, e.g. 'go through the door'; 100000000   # p=0.996, r=0.613, 1375, 3grams, 2015-08-12
vary|change, alter; very|to a great extent; 100000                                # p=0.998, r=0.873, 1972, 3grams, 2015-08-12
waist|body part between the ribs and hips; waste|worthless material; 1000         # p=0.996, r=0.860, 903, 3grams, 2015-08-12
was; way; 10000000                                                                # p=0.999, r=0.823, 1974, 3grams, 2015-08-18
wax; way; 10000000                                                                # p=0.995, r=0.812, 1138, 3grams, 2015-08-12
weather|atmospheric conditions; whether|as in 'whether or not'; 100000            # p=0.992, r=0.802, 1894, 3grams, 2015-08-12
we; wee|a short time; 1000                                                        # p=0.999, r=0.971, 987, 3grams, 2015-08-12
which; witch|female sorcerer or magician; 10000000                                # p=0.996, r=0.839, 1175, 3grams, 2015-08-12
while; whole; 100000                                                              # p=1.000, r=0.827, 1981, 3grams, 2015-08-25
widow; window; 100000                                                             # p=1.000, r=0.564, 1295, 3grams, 2015-08-24
with; wit|humor; 1000000                                                          # p=0.996, r=0.859, 1073, 3grams, 2015-08-12
your|possessive pronoun; you|personal pronoun; 10000                              # p=0.993, r=0.861, 1857, 3grams, 2015-08-12
