1. Style-Driven Index

Whenever possible, build your index from an InDesign character style. That is the optimal solution in every case. If you can apply a dedicated style to every indexable expression, you keep full control and the problem of removing undesired homonyms no longer arises: tell IndexMatic to focus on the target style and run your query (or query list). End of story!

Best way to discriminate homonyms in IndexMatic: target a character style.

Note. — Use the greedy regex /.+/ to index all styled matches in one step, or /\w+/ to extract individual words within the target style.
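The difference between the two patterns in the note can be illustrated outside IndexMatic. The following is a minimal Python sketch, not IndexMatic code: `styled_runs` is a hypothetical list standing for the text runs that carry the target style, and Python's `re` dialect is only assumed to behave like IndexMatic's for these two simple patterns.

```python
import re

# Hypothetical sample: each string stands for one text run
# that carries the target character style.
styled_runs = ["Katharine Hepburn", "Bette Davis"]

def whole_runs(runs):
    """Greedy .+ takes each styled run as a single match."""
    return [m.group(0) for run in runs for m in re.finditer(r".+", run)]

def single_words(runs):
    r"""\w+ splits each styled run into its individual words."""
    return [m.group(0) for run in runs for m in re.finditer(r"\w+", run)]

print(whole_runs(styled_runs))
print(single_words(styled_runs))
```

With `.+` each styled expression becomes one index key; with `\w+` every word inside the style becomes a key of its own.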

2. Manual Cleaning

Most InDesign documents are created without anticipating the index. By then it is too late to mark up the data, and existing styles won't help. The first question is: how many ambiguous instances of the “Washington” kind (person ≠ place) does your document contain?

Specialized query lists and/or the “Hits” report help identify the number of conflicting terms.

A small number of parasitic cases can be dusted off manually. Query lists combined with the Hits report are IndexMatic's best tools for handling those problematic terms. If the stats reveal that your document has few occurrences of homonymic forms, you may be fine simply adjusting the final index by hand.
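To get a feel for how many conflicting occurrences are at stake, one can tally the word that precedes each hit. The sketch below is a rough Python illustration working on exported plain text; it is not IndexMatic's Hits report, and the function name `count_contexts` and the sample sentence are made up for the example.

```python
import re
from collections import Counter

def count_contexts(text, key):
    """Tally the word that immediately precedes each occurrence of key."""
    pattern = re.compile(r"(\w+\.?)\s+" + re.escape(key) + r"\b")
    return Counter(m.group(1) for m in pattern.finditer(text))

# Hypothetical sample text mixing the person and the place.
sample = ("George Washington crossed the river. "
          "She moved to Washington in 1950. "
          "Later, George Washington retired.")

stats = count_contexts(sample, "Washington")
for word, n in stats.most_common():
    print(f"{word} Washington: {n}")
```

If the tally shows only a handful of unwanted contexts, manual cleaning of the final index is probably enough.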

3. Refined Query List

Manual treatment quickly becomes exhausting if you have to regenerate the index again and again from a changing document in which a significant number of homonyms come into play. One option is then to “refine” your query list. Suppose your goal is to extract proper names like

Hepburn => $0, Katharine
Davis => $0, Bette
Hepburn => $0, Audrey
Bergman => $0, Ingrid
Garbo => $0, Greta
Monroe => $0, Marilyn
Taylor => $0, Elizabeth
. . .
 

(More on “last name first name” queries here.)

Your queries are almost all fine but the keys Bergman and Taylor have ambiguous matches in some chapters that refer to “Ingmar Bergman” (≠ Ingrid) and “Taylor Swift” (≠ Elizabeth).

The idea is to specifically sharpen the queries that grab too many matches. By examining the document you observe that the genuine occurrences of “Bergman” and “Taylor” are always accompanied by the expected first name. Then the two associated queries could be rewritten as follows:

. . .
/Ingrid Bergman/s => Bergman, Ingrid
/(E\.|Liz|Elizabeth) Taylor/s => Taylor, Elizabeth
. . .
 

This solves the problem. Note that the second query recognizes “Liz Taylor”, “Elizabeth Taylor”, and even the form “E. Taylor”. (Unfortunately we cannot apply the initial-letter trick to “Ingmar” vs. “Ingrid” Bergman, since both first names start with the same letter.)
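The effect of the two refined queries can be checked on sample text before running them on the real document. Here is a minimal Python sketch under stated assumptions: `RULES`, `index_entries`, and the `pages` dictionary are hypothetical names, Python's `re` stands in for IndexMatic's regex engine, and the `/s` flag of the original queries is not modeled.

```python
import re

# Each rule pairs a refined pattern with the index entry it should produce.
RULES = [
    (re.compile(r"Ingrid Bergman"), "Bergman, Ingrid"),
    (re.compile(r"(?:E\.|Liz|Elizabeth) Taylor"), "Taylor, Elizabeth"),
]

def index_entries(pages):
    """Map each index entry to the sorted list of pages where it is found.

    pages: dict of page number -> page text (hypothetical input format).
    """
    index = {}
    for page, text in pages.items():
        for pattern, entry in RULES:
            if pattern.search(text):
                index.setdefault(entry, set()).add(page)
    return {entry: sorted(nums) for entry, nums in index.items()}

pages = {
    1: "Ingrid Bergman starred in the film.",
    2: "Ingmar Bergman directed it.",
    3: "Liz Taylor never met Taylor Swift.",
    4: "A letter signed E. Taylor was found.",
}
print(index_entries(pages))
```

Pages 2 and 3 show the point: “Ingmar Bergman” and “Taylor Swift” no longer feed the Ingrid Bergman and Elizabeth Taylor entries.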

Another advantage of fine-tuned queries is that they let you treat and maintain conflicting names as distinct index entries:

 
/(\m\w+) Taylor/s => Taylor ($1)
 

will typically generate entries like Taylor (Elizabeth), Taylor (Christine), Taylor (Don).
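The capture-group idea behind this query can be sketched in Python as well. This is an approximation only: the source does not define `\m`, so the sketch assumes it is meant to match an uppercase initial and substitutes `[A-Z]`; `taylor_entries` and the sample sentence are hypothetical.

```python
import re

# Assumption: [A-Z] stands in for IndexMatic's \m (taken here to mean
# an uppercase letter); the captured first name plays the role of $1.
TAYLOR = re.compile(r"\b([A-Z]\w+) Taylor\b")

def taylor_entries(text):
    """Build one distinct 'Taylor (First)' entry per captured first name."""
    return sorted({f"Taylor ({m.group(1)})" for m in TAYLOR.finditer(text)})

sample = ("Elizabeth Taylor met Don Taylor, while Christine Taylor "
          "watched Elizabeth Taylor.")
print(taylor_entries(sample))
```

Repeated mentions of the same person collapse into a single entry, while each different first name yields its own.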

4. Fake Indexing Style Group

If none of the above strategies fits your needs, a semi-automatic solution is still possible. The idea is to create a fake style group for indexing purposes, then to manually exclude the foreign matches from that group. This method is therefore recommended when extra matches that result from homonymy must be ignored.

Here are the main preparatory steps in InDesign:

(A) Create a character style NoneIndexable based on [None] with no additional attributes (so it has no visible effect at all).

(B) Create a character style group INDEXABLE and put NoneIndexable into it.

(C) If your document already has character styles (and it surely has!), move into the INDEXABLE group every style applied to any text that may contain indexable expressions.

(D) From the Find/Change dialog, set Find Format to Character Style: [None] and Change Format to Character Style: NoneIndexable (INDEXABLE) and click Change All. Thus, even non-styled parts of the document now belong to INDEXABLE.

Making all indexable text available in a dedicated group.

(E) Identify the foreign homonyms throughout the document, e.g. “Washington” (as a person), “Ingmar Bergman”, etc. Select each match and apply the [None] style instead of NoneIndexable (assuming no special formatting is involved here). If some style in INDEXABLE is applied to the undesired expression, duplicate the style, move the copy outside of the group, and apply that copy to the expression.

At the end of the process, every foreign match carries a character style outside of the INDEXABLE group. This might be a bit tedious, but it is still easier than refining queries, assuming the document contains a reasonable number of homonyms.

Finally, run IndexMatic and use your original query (or query list), making sure you also select the prepared group, denoted [INDEXABLE]*, in the Style panel:

Select the INDEXABLE group in IndexMatic's Style panel.
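The filtering logic of this method can be modeled in a few lines. The Python sketch below is a conceptual model only, not InDesign scripting and not IndexMatic's implementation: `Run`, `INDEXABLE`, `hits`, and the style name "Emphasis" are hypothetical, and a document is reduced to a list of styled text runs.

```python
import re
from dataclasses import dataclass

@dataclass
class Run:
    text: str
    style: str  # character style name; "[None]" when the run was excluded

# Hypothetical content of the INDEXABLE group after steps (A) to (D).
INDEXABLE = {"NoneIndexable", "Emphasis"}

def hits(runs, pattern, group=INDEXABLE):
    """Return matches found only in runs whose style belongs to the group."""
    rx = re.compile(pattern)
    return [m.group(0)
            for run in runs if run.style in group
            for m in rx.finditer(run.text)]

runs = [
    Run("He moved to Washington.", "NoneIndexable"),
    Run("George Washington", "[None]"),   # foreign homonym excluded in step (E)
    Run("Washington", "Emphasis"),
]
print(hits(runs, r"Washington"))
```

The original query stays unchanged; only the style of the excluded run keeps the person out of the results.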


In more specific cases, filtering by paragraph style, layer, or page range may help too. Each project has its own constraints, settings, and formatting; IndexMatic can often facilitate the process, even if it cannot replace human indexing in all respects.