Introducing IndexMatic²
February 26, 2011 | IndexMatic²
IndexMatic² for InDesign is an indexing tool for those of you who make books or long documents in InDesign. This new version is derived from two scripts I worked on several years ago (IndexBrutal and IndexMatic 1), but the code has been entirely redesigned to offer new features and higher performance…
Beta test closed [2011-07-01] — The beta test period of IndexMatic² is over and the content of this page is partially obsolete. For now please refer to the product main page.
Disclaimer: This post gives only a simplified presentation of IndexMatic 2. I will cover the advanced features and more technical aspects once the final release is ready.
The most exciting thing about IndexMatic² is that it lets you produce an index by combining criteria related to the target document(s) (page range, sections, layers, styles) with a full regular expression search engine. The script thus offers the features of both IndexMatic 1 and IndexBrutal, and much more, in a single tool.
Note: The Context panel is locked in the BETA version, so you cannot index footnotes separately, and all anchored/embedded content, including tables, is ignored.
Search Modes (Overview)
Once you have demarcated the target text area (Scope, Context, Styles), IndexMatic² offers three ways to extract keywords for indexing.
1) In the Search Mode panel, select “Automatic” to build a comprehensive list of words directly from the target document(s). You just have to specify a minimum and a maximum “length” (number of characters). The “Automatic” mode is the easiest to use, but it tends to generate a large corpus. You can limit the proliferation of irrelevant words by increasing the “Page Rank” value in the Default Options panel.
2) Select “Query List” to edit your own queries. Basically, a query is a simple keyword attached to an indexing term. By default, any requested expression is attached to itself. IndexMatic² provides global options (Case sensitive, Whole Word, Generic Space) that let you consolidate variants of the same expression. In addition, you can base any query on a regular expression by enclosing the expression between slashes. For example, the query:
/a\w+/
matches any word that starts with the letter a (including the capital letter A in a case-insensitive context). You can use any valid JavaScript regular expression. When a query is based on a regular expression, IndexMatic² regards each resulting form as an indexing term. However, you can use a special operator (=>) to link all forms to a single term. For example, the query:
/tests?/=>test
links the regexp /tests?/ to the term test. Note that any query can use the => operator, even a simple query such as: dog=>animal.
Note: There are some subtle differences between JS regular expressions and GREP. For performance reasons, IndexMatic² does not invoke the InDesign GREP layer; the whole process is based on JavaScript RegExp queries.
3) Select “Single Query” to send a unique query rather than a query list. This option is useful to report any occurrence of a single expression, or to perform quick tests.
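To make the query syntax described above more concrete, here is a minimal sketch in plain JavaScript. It is not IndexMatic²'s actual code: the helper names parseQuery and indexText are hypothetical, trailing flags after the closing slash are not handled, and matching is assumed to be case-insensitive (as if the “Case sensitive” option were off).

```javascript
// Hypothetical helper: parse one query line into { pattern, term }.
// `term` is null when each matched form should be indexed as itself.
function parseQuery(line) {
  // Split on the first "=>" only; surrounding spaces are tolerated.
  var arrow = line.indexOf("=>");
  var left = (arrow < 0 ? line : line.slice(0, arrow)).replace(/^\s+|\s+$/g, "");
  var term = arrow < 0 ? "" : line.slice(arrow + 2).replace(/^\s+|\s+$/g, "");

  // A query enclosed between slashes is treated as a regular expression;
  // anything else is a literal keyword, so its special characters are escaped.
  var m = /^\/(.+)\/$/.exec(left);
  var source = m ? m[1] : left.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

  // Assumption: case-insensitive search ("gi").
  // A plain keyword without "=>" is attached to itself.
  return { pattern: new RegExp(source, "gi"), term: term || (m ? null : left) };
}

// Hypothetical helper: count the hits per indexing term for a list of queries.
function indexText(text, queryLines) {
  // No prototype, so document words never collide with inherited property names.
  var hits = Object.create(null);
  for (var i = 0; i < queryLines.length; i++) {
    var q = parseQuery(queryLines[i]);
    var m;
    while ((m = q.pattern.exec(text)) !== null) {
      // Guard against empty matches, which would loop forever.
      if (m[0] === "") { q.pattern.lastIndex++; continue; }
      var key = q.term === null ? m[0] : q.term;
      hits[key] = (hits[key] || 0) + 1;
    }
  }
  return hits;
}

var sample = indexText("A test, two tests and a Dog.",
                       ["/tests?/=>test", "dog=>animal", "two"]);
// sample.test is 2, sample.animal is 1, sample.two is 1
```

In this sketch a regex query without => reports each matched form as its own term, which mirrors the behaviour described for regular-expression queries.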
Output (Overview)
The Output and Page Report panels offer a number of features that control the final index. The whole purpose of IndexMatic² is to build a relevant set of terms, and to report each term with the corresponding page numbers and/or page ranges. The script does not alter the target document(s) or book in any way, and the BETA version only outputs the index as a “Text File”.
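As an illustration of what reporting “page numbers and/or page ranges” can involve, here is a small JavaScript sketch that collapses a list of page numbers into ranges. The function name formatPages and the output format (hyphen for ranges, comma separator) are assumptions made for the example, not the script's actual settings.

```javascript
// Hypothetical helper: turn [12, 3, 4, 5, 9, 12, 13] into "3-5, 9, 12-13".
function formatPages(pages) {
  // Sort numerically and drop duplicate page numbers.
  var sorted = pages.slice()
    .sort(function (a, b) { return a - b; })
    .filter(function (p, i, arr) { return i === 0 || p !== arr[i - 1]; });

  var parts = [];
  for (var i = 0; i < sorted.length; i++) {
    var start = sorted[i];
    // Extend the run while the next page is consecutive.
    while (i + 1 < sorted.length && sorted[i + 1] === sorted[i] + 1) i++;
    var end = sorted[i];
    parts.push(start === end ? String(start) : start + "-" + end);
  }
  return parts.join(", ");
}

var line = "test: " + formatPages([12, 3, 4, 5, 9, 12, 13]);
// line is "test: 3-5, 9, 12-13"
```

Here two consecutive pages already form a range; a real index may prefer to list them separately, which would only change the condition used when pushing each part.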
Try it and tell me!
• See also:
— Test of IndexMatic² in French (Urbanbike).
— The Beta is mentioned in InDesignSecrets: “This Week in InDesign Articles, Number 53”.
• Special thanks to: Laurent Tournier, Herbert M. Tucker, and Jean-Christophe Courte.
Comments
Excellent, Marc. I don't have a long document at hand, but I'll take a look as soon as one comes in!
Great work, Marc! I can't wait to have the time to try this! ;-)
A few days ago I finished the index of my book on Illustrator CS5 (almost 550 pages). I'm going to redo it with this script to test and compare.
As usual, our friend Marc offers us a magnificent script. As you would expect, I particularly appreciate being able to use regular expressions. It is a script you need to take the time to test in order to master all its subtleties, which Marc will explain to us with all his talent. Thank you.
Damn, I've got accounting to do… Dilemma: IndexMatic 2 or accounting…??! You're a real pain, Marc, with your killer scripts…!
Hi, When I try and run your script in CS3 I get the following error, "Invalid object for this request".
IndexBrutal opens OK but no luck with IndexMatic 2 which I just downloaded.
Thanks,
Herb
Thanks Herb, I was not very confident about CS3 compatibility. I will investigate this issue.
@+
Marc
I didn't leave a message, but I got the same warning yesterday, in French, on CS4 Mac, with the Beta2 version + the special one. I closed and reopened ID, and everything went back to normal.
Argh! So there is a bug hiding somewhere... I will focus my investigation on CS3 first, but if you happen to reproduce the error consistently, I would of course be very interested in knowing more about the document used.
@+
Marc
To reassure IndexMatic2 users: my problems come from corrupted IDD files associated with a Book. Leftover idlk files.
This is great. Just what I need to create an index using a character style. I need it to work inside tables, as all my product numbers are inside tables. Any idea when the beta (or final product) that will allow this will be available?
Hi Peter,
I hope IndexMatic PRO will be available by late April. A few checks remain.
@+
Marc
hi thanks so much this is almost a lifesaver. One question, how do I link sub words to a parent word in the query list so that they remain connected in the final index and not moved to alphabetical order?
For example:
main word:biome
sub words:deserts
forests
grasslands
savanna
semi-deserts
tundra
@scone8
[I'm sorry, your comment was wrongly rejected as spam. My blog engine sometimes acts like a dragon!]
> how do I link sub words to a parent word in the query
> list so that they remain connected in the final index
Good point! Indeed, you can only 'send' sub words to the main word, using something like:
/deserts|forests|grasslands|savanna|tundra/ => biome
but IndexMatic does not actually manage subtopics in the way you want it to. I am taking this feature request into serious consideration.
The track I'm exploring is to add a specific operator that would allow the user to specify topic hierarchy. What do you think of:
/deserts|forests|grasslands/ => biome > $0
?
@+
Marc
Marc,
I'm trying to run your script using a list. Some of the items in the list begin with a numeral, some of them have apostrophes. The script is ignoring those entries. I have 'Allow Hyphens' Digits & Apostrophes all checked, but it's just not working. Any clues on how I can index all the entries?
Great script, by the way. Incredible timesaver.
Hi Sherm,
Thanks for your feedback.
> Some of the items in the list begin with a numeral,
> some of them have apostrophes. The script is
> ignoring those entries.
You mean the script ignores *only* those queries? Could you show me an example?
If you try a single query such as 123abc, what happens?
About apostrophes, note that if the queries use single straight quotes (U+0027) and the document uses RIGHT SINGLE QUOTATION MARK (U+2019), they won't match. Therefore, assuming you want to find any apostrophe form without distinction, you have to use a regex such as ['’].
E.g.:
/don['’]t/ => don’t
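The following JavaScript snippet is only a sketch of the mismatch described above, run against a made-up sample string. The curly apostrophe is written as \u2019 so that the two characters cannot be confused in the source.

```javascript
// Sample text containing both apostrophe forms (U+0027 and U+2019).
var text = "I don't know; they don\u2019t either.";

// A literal query typed with a straight quote misses the curly form.
var straightOnly = text.match(/don't/g);

// The character class ['’] accepts either apostrophe.
var both = text.match(/don['\u2019]t/g);

// straightOnly.length is 1, both.length is 2
```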
Hope I'm on the right track…
@+
Marc
Any word on PRO? Need table searching badly :)
Hi Kevin,
Still fixing bugs… And by the way, you've pinpointed the deepest issue: table searching. The search routine works fine but indexing the right page is a real headache. Indeed, the InDesign scripting DOM offers no access to the page that actually contains a table cell. So IndexMatic can only retrieve the page that owns the table insertion point, which produces wrong results when such a table spans over multiple pages.
Not sure I could find a workaround…
@+
Marc
Found a bug today.
Indesign returned the following error: reflect is read only
Hi Craig,
Thanks for your message. Other beta-testers reported the same issue and I already fixed that bug in my WIP version.
Note. — The bug occurs when the indexed document contains the word ‘reflect’ (!)
@+
Marc
Hi and thanks, amazing! But I have a question: I need find two words like one (name of city Nova Bystrice - found like Nova and Bystrice) - any idea how to connect together? Thanks again.
Hi Michal,
> I need find two words like one (name of city
> Nova Bystrice - found like Nova and Bystrice)
In your Query list you can connect multiple queries to the same final term:
/Nova/I => Nova Bystrice
/Bystrice/I => Nova Bystrice
(Note: The 'I' flag means that the query is locally case-sensitive, regardless of the Default Options.)
Better, you can wrap it all in a single regex:
/Nova|Bystrice/I => Nova Bystrice
Even better, with diacritics:
/(Nov(a|á))|(Byst(r|ř)ice)/I => Nová Bystřice
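For readers who want to see how that alternation behaves, here is a plain JavaScript sketch on a made-up sentence. The IndexMatic 'I' flag is not part of JavaScript, so the patterns below are simply left case-sensitive. The second pattern, which matches the two words only when they appear together, is an alternative offered as an assumption and does not come from the reply above. Accented letters are written as \u escapes (á = \u00E1, ř = \u0159).

```javascript
var text = "Nov\u00E1 Byst\u0159ice lies north of Nova Ves.";

// The alternation reports each word on its own, wherever it occurs.
var either = text.match(/(Nov(a|\u00E1))|(Byst(r|\u0159)ice)/g);

// Assumed alternative: require both words, separated by whitespace.
var phrase = text.match(/Nov[a\u00E1]\s+Byst[r\u0159]ice/g);

// either.length is 3 (including the "Nova" of "Nova Ves"), phrase.length is 1
```

With the alternation, an unrelated "Nova" elsewhere in the document is also sent to the term Nová Bystřice, which may or may not be what you want.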
@+
Marc
This is great, but unfortunately, it does not work with Serbian cyrillic alphabet...
Hi tadir,
Yes, I'm really sorry about that. IndexMatic2 only supports Latin-based alphabets so far. I plan to fix this limitation in the future.
@+
Marc
The beta version had a "longest styled string" option. How do I do that in this new version?
Hi CJ,
> The beta version had a "longest styled string" option […]
Are you sure? I don't remember that. What is true is that the previous versions of the script managed break characters and new lines in a different way. Now IndexMatic does not regard breaks as if they could belong to the same entity. There are pros and cons to this approach, and this is a complicated topic, but after consulting some beta-testers I had to make a choice. Please tell me more about the issue you are encountering.
> How do I do that in this new version?
Well, the generic way to capture the longest text ‘entity’ in a specific style is to use the query:
/.+/W
(but that does not cancel the restriction mentioned above.)
Feel free to supplement your comment. There are certainly still areas for improvement in IndexMatic.
@+
Marc
Hi Marc,
I'm using the Pro version in CS5 and my query from a simple (I think) word list continues to return a "syntax error" from within InDesign. I have "dumbed down" my word list to include only single, whole words but the error persists. I'm totally new to scripting and using your scripts. I'd be happy to provide any additional information that you need. Thanks in advance.
Hi Joel,
Thanks for your report. I will contact you by email in order to study this bug.
@+,
Marc
Any chance I could get the beta version back? I have purchased the Pro, but I desperately need the other one. The easiest way for me to describe the whole longest styled string thing is for you to go to http://indesignsecrets.com/building... and see the screen shot. Under the "Search For" drop down, there was a "longest styled string" and I used it for indexing directories. Instead of indexing "clinic" and "name" it would index "Clinic Name" and it was fabulous. And I loved it. And my computer crashed, and I lost it and bought the new one. And it's sort of a design/indexing emergency.
Nevermind. I found the archived file. THANK YOU!!
Hi CJ,
That's OK, but as replied above (#26) you can retrieve the “longest styled string” with IndexMatic 2. Simply use the following query:
/.+/W
Regards,
Marc
==================================
[EN] THIS COMMENT THREAD IS CLOSED.
FOR FURTHER DISCUSSION ABOUT INDEXMATIC 2,
PLEASE POST YOUR MESSAGE IN THE
“FREQUENTLY ASKED QUESTION” PAGE.
==================================