How to Automatically Insert Hyperlinks in Documents for Publication to Intranets
Article contributed by John McGhie
This article explains how to automatically insert hyperlinks in documents
for publication to Intranets, using Word 2000 and above.
There are two methods you can use, depending on the nature of your document:
- The AutoText process.
- The Concordance process.
Use the process that suits your source material.
- Neither method suits documents that will contain fewer than a few hundred hyperlinks: the overhead of setting them up is excessive.
- The Concordance method requires that you know how to edit macros. The editing is very simple: read these instructions to decide whether you can do it.
- The Concordance method is better suited to reference manuals, programming manuals and the like, which contain sections or appendices of fields, parameters or commands that are frequently referred to throughout the text.
- The Concordance method suits material where most hyperlinks occur more than 20 times.
- The Concordance method is better suited to situations where the hyperlink destinations will vary from issue to issue, since you can readily change the hyperlink path for all destinations in a single place.
- Corporate documentation, manuals, and lengthy reports are better suited to the AutoText method.
- The AutoText method is the only one that will work if the document has already been divided into web pages. You can use it by employing Word to edit the HTML files directly.
Preparation
Read all the instructions before you begin: everything depends on everything
else! Both methods work better if the source is a single file.
1. You need to know the destination URLs and hyperlink path for the eventual website. I strongly suggest that you name each HTML page after the heading that begins it: Word will do this for you automatically.
2. You need to know the directory structure of the destination web. This article is built around a real-life example: a book named Open Interface Specification is published to a sub-web named OI_Spec/Issue_9, which contains two sub-directories, Field_Pages and Text.
OI_Spec
    Issue_9
        Field_Pages
        Text
I haven’t shown the web root. The manual divides into three pools of pages. The main page and all the various TOC files for the navigation structure are in the sub-web root. The narrative pages are in the Text subdirectory, and the field pages are in the Field_Pages folder.
In Field_Pages there is one page for each field. Each of the Text pages contains a hyperlink to one or more of the fields. In total the manual contains around 4,500 hyperlinks and about 800 text pages.
So we can deduce that the hyperlink path for each of the field pages is http://Documentation/OI_Spec/Issue_9/Field_Pages, and for the text pages it’s http://Documentation/OI_Spec/Issue_9/Text.
3. You must know exactly what each page name is, which means you should have already published the manual to the website.
If you want to use the Concordance method, use an automated method of publication so that the page names are machine-generated, because you will need to re-publish the entire document when the hyperlinks are in place. If the page names change, you get 4,500 broken links {grin}.
Publish now and come back here when you have finished.
4. It’s important to fix your fields so that their values do not change during the write to HTML, in particular the values of any numbering in use. The easiest way to do that is to save the document as a web page at this point, then use the Export to Compact HTML function.
5. In most cases you would now save the document in its final resting place and set the Hyperlink Base property in File>Properties>Summary. This causes any hyperlinks Word writes after this to be inserted as relative paths, which makes moving and maintaining the document easier afterwards.
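Steps 2 and 5 come down to URL arithmetic: an absolute page URL is the web root, sub-web, folder and page name joined together, and a relative link is resolved against a base. A minimal Python sketch, assuming the http://Documentation web root from the example and a Hyperlink Base that points at the Text folder; page_url and resolve are hypothetical names, and the code only illustrates the arithmetic, not how Word stores the property:

```python
from urllib.parse import quote, urljoin

# Assumed values from the Open Interface Specification example; use your own.
WEB_ROOT = "http://Documentation"
SUB_WEB = "OI_Spec/Issue_9"


def page_url(folder: str, page_name: str) -> str:
    """Absolute URL of a page published to SUB_WEB/folder."""
    # quote() percent-encodes the space in names such as "100 enterOrder"
    # and leaves "/" alone.
    return f"{WEB_ROOT}/{SUB_WEB}/{folder}/{quote(page_name)}.htm"


def resolve(relative_link: str, hyperlink_base: str) -> str:
    """Resolve a relative hyperlink against a base URL using RFC 3986 rules."""
    # urljoin treats a base without a trailing "/" as a file name and
    # drops its last segment before resolving.
    return urljoin(hyperlink_base, relative_link)


text_base = f"{WEB_ROOT}/{SUB_WEB}/Text/"
print(page_url("Field_Pages", "userID"))
print(resolve("../Field_Pages/userID.htm", text_base))
```

Both calls print the same address, which is the point of a relative link: it stays valid as long as the Text and Field_Pages folders keep their positions relative to each other.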
AutoText Method
The AutoText method is the simpler of the two to understand, which makes it the method of choice for shortish documents.
You scan through the document (by eye) looking for things you want to
hyperlink. When you find one:
1. Insert a hyperlink in the ordinary way. Use Ctrl + K and either browse to the page or select it from the list. It’s a very good idea to select only the name of the object: instead of “See manual insertion later in this document” you would highlight only “manual insertion”.
2. Select the entire hyperlink you just inserted.
3. Press Alt + F3 to open the Add AutoText box.
4. Leave the name exactly as it appears in the text and click OK.
Now you continue scanning through the document:
- Each time you come to an instance of a hyperlink item you have already inserted, press F3 to insert the hyperlink again.
- Each time you come to an item you have not inserted before, perform steps 1 to 4 to add it to your AutoText collection.
- If you’re not sure, press F3 anyway. If you have not already added this hyperlink, nothing will happen.
This method relies on the fact that in Word 97 and above, Word stores
AutoText entries with complete formatting and all properties. When you hit
F3, Word replaces the document text with the text of the hyperlink, complete
with the hyperlink field properties.
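The AutoText store behaves like a lookup table keyed by entry name, with the whole hyperlink as the value. A toy Python model under that assumption (not Word’s implementation; Hyperlink, add_autotext, press_f3 and the address are hypothetical) shows why pressing F3 on a name you have not stored is harmless:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Hyperlink:
    text: str     # visible text, also used as the AutoText entry name
    address: str  # hyperlink destination


# Toy stand-in for the AutoText entries stored in your template.
autotext: Dict[str, Hyperlink] = {}


def add_autotext(link: Hyperlink) -> None:
    """Alt+F3: store the complete hyperlink, address included, under its text."""
    autotext[link.text] = link


def press_f3(name: str) -> Optional[Hyperlink]:
    """F3: return the stored hyperlink for name, or None if no entry exists."""
    return autotext.get(name)


add_autotext(Hyperlink("manual insertion", "Text/manual insertion.htm"))
print(press_f3("manual insertion"))
print(press_f3("not yet added"))
```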
Concordance Method
This is a complex method suitable for applying a very large number of
hyperlinks to a large document.
For this method to work properly, the terms to be hyperlinked must be either
unique within the document or very rare. The method does not produce a
useful result if you include terms that occur throughout the document but
which should not be hyperlinked.
How it Works
This method uses a concordance file to add an index tag to every instance of
a term you want to hyperlink.
It then uses a macro to retrieve the text from each index tag, highlight the text before it, add a hyperlink whose address is that text, then delete the index tag.
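The first half of the process, tagging every instance of a term, can be modelled on plain text. This Python sketch is a toy stand-in for Word’s AutoMark, not a description of its internals: it assumes whole-word matches, ignores case (as the article says the Index generator does), and writes each tag as { XE "entry" } immediately after the term. tag_terms is a hypothetical name.

```python
import re
from typing import Dict


def tag_terms(text: str, concordance: Dict[str, str]) -> str:
    """Append an XE-style tag after every whole-word match of a concordance term.

    Matching ignores case, and all terms are matched in a single pass so
    that text inside a newly written tag is never rescanned.
    """
    if not concordance:
        return text
    lookup = {term.lower(): entry for term, entry in concordance.items()}
    # Try longer terms first so a multi-word term beats a shorter one it contains.
    terms = sorted(concordance, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return pattern.sub(
        lambda m: f'{m.group(0)}{{ XE "{lookup[m.group(0).lower()]}" }}', text
    )


concordance = {
    "userID": "../Field_Pages/userID.htm",
    "100 enterOrder": "../Text/100 enterOrder.htm",
}
print(tag_terms("Send userID with every 100 enterOrder.", concordance))
```

The single pass is a deliberate choice: a term-by-term loop would rescan tags it had just written, and userID would match again inside ../Field_Pages/userID.htm.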
For the experts, it may be worth noting that:
- You cannot automatically tag a Word document with any sort of field tag other than an XE tag.
- An XE field is one of the Cold field types: in VBA terms it does not have a result. This means two things:
  - You cannot directly access the result of the field; you have to grab the text from it.
  - You cannot change the field type after it’s inserted; you have to laboriously store the field text, then write a new field of type HYPERLINK with the text as the URL.
Preparation
To use the concordance method, the document must be a single file, and the
destination file names must be known in advance.
1. Use the Master Document method to publish the document to a website. Ensure that you name the pages after the strings you expect to find in the text. If you use the master document method, Word will do this automatically.
2. Install the macros below in your Normal template.
3. Construct a concordance file of the terms to be tagged and the hyperlink destinations.
Macros
Install the following macros in your Normal template.
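Sub MakeConcordance()
    ' Path prefix for this folder's pages: edit it for each folder you process.
    Const hBase As String = "../Text/"
    Const htm As String = ".htm"
    Dim aCell As Cell
    Dim aString As String
    ' Work down the second column of the first table in the active document.
    For Each aCell In ActiveDocument.Tables(1).Columns(2).Cells
        ' Drop the two-character end-of-cell marker, trim spaces,
        ' then wrap the page name in the path and the .htm extension.
        aString = hBase & Trim(Left$(aCell.Range.Text, Len(aCell.Range.Text) - 2)) & htm
        aCell.Range.Text = aString
    Next aCell
End Sub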
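Sub MakeHyperlinks()
    Dim afield As Field
    Dim url As String
    Dim isHyper As Integer
    For Each afield In ActiveDocument.Fields
        If afield.Type = wdFieldIndexEntry Then
            isHyper = 0
            ' Put the insertion point at the start of the XE field.
            afield.Select
            Selection.Collapse
            ' The field code reads  XE "../path/page.htm" : strip the five
            ' characters before the path and the two after it.
            url = Right$(afield.Code, Len(afield.Code) - 5)
            url = Left$(url, Len(url) - 2)
            ' The first four characters of the path decide how many words
            ' the linked term contains: 1 for Field_Pages, 2 for Text.
            If Left$(url, 4) = "../F" Then
                isHyper = 1
            End If
            If Left$(url, 4) = "../T" Then
                isHyper = 2
            End If
            ' Legitimate index entries (isHyper = 0) are left alone.
            If isHyper <> 0 Then
                ' Step back three characters, then back isHyper words,
                ' so the selection covers the term in front of the tag.
                Selection.MoveStart unit:=wdCharacter, Count:=-3
                Selection.MoveStart unit:=wdWord, Count:=-isHyper
                ' Remove the XE tag, then turn the selected term into a hyperlink.
                afield.Delete
                ActiveDocument.Hyperlinks.Add Anchor:=Selection.Range, _
                    Address:=url
            End If
        End If
    Next afield
End Sub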
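The string handling in MakeHyperlinks can be checked outside Word. A Python sketch that mirrors the Right$/Left$ arithmetic and the isHyper tests, assuming the field code text is exactly ' XE "path" ' (a leading space, XE, a space, the quoted entry and a trailing space), which is what the offsets of 5 and 2 imply; url_from_field_code and words_to_go_back are hypothetical names:

```python
def url_from_field_code(code: str) -> str:
    """Mirror the macro's string arithmetic on the text of an XE field code."""
    url = code[5:]   # Right$(code, Len(code) - 5): drop the leading ' XE "'
    return url[:-2]  # Left$(url, Len(url) - 2): drop the trailing '" '


def words_to_go_back(url: str) -> int:
    """Mirror the isHyper tests: the first four characters select the rule."""
    rules = {"../F": 1, "../T": 2}  # Field_Pages: one-word names, Text: two-word names
    return rules.get(url[:4], 0)    # 0 means a legitimate index entry: leave it alone


code = ' XE "../Text/101 amendOrder.htm" '
url = url_from_field_code(code)
print(url, words_to_go_back(url))
```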
Concordance File
A concordance file is simply a document that contains nothing other than a
two-column table.
In the first column, you place each term to be searched for. You must have
only one entry per cell.
In the same row in the second column, you place the text of the index entry
you want created.
The macros above are designed to operate with specific character strings in
the concordance file. Here is a section of the concordance file they are
designed to work with:
transactionOriginID | ../Field_Pages/transactionOriginID.htm
TSNumber            | ../Field_Pages/TSNumber.htm
undisclosedQuantity | ../Field_Pages/undisclosedQuantity.htm
userID              | ../Field_Pages/userID.htm
userName            | ../Field_Pages/userName.htm
userType            | ../Field_Pages/userType.htm
yield               | ../Field_Pages/yield.htm
100 enterOrder      | ../Text/100 enterOrder.htm
101 amendOrder      | ../Text/101 amendOrder.htm
102 tickOrder       | ../Text/102 tickOrder.htm
103 cancelOrder     | ../Text/103 cancelOrder.htm
104 enterTrade      | ../Text/104 enterTrade.htm
105 cancelTrade     | ../Text/105 cancelTrade.htm
106 setLiability    | ../Text/106 setLiability.htm
In the left column are the strings that appear in the text of the manual. Note that there is a potential problem with the word “yield”: it is a common enough English word to cause false matches. In the manual I use this technique for, I have established that no problem exists.
Notice that I called these entries strings. The Index generator, which does the first half of the work, performs a character-for-character match on these strings. Each character must be exactly correct, but the case of the characters does not matter.
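Before tagging, it can help to see how often each term would match, so that an ordinary word such as yield stands out. A minimal Python sketch to run against the document saved as plain text, assuming whole-word, case-insensitive matching (case-insensitivity is from the article; whole-word matching is an assumption); count_matches is a hypothetical helper, not part of Word:

```python
import re
from collections import Counter
from typing import Iterable


def count_matches(text: str, terms: Iterable[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each term in text."""
    counts: Counter = Counter()
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


sample = "Yield is quoted as the yield to maturity. Yields vary. Set userID first."
print(count_matches(sample, ["yield", "userID"]))
```

A term whose count is far higher than the number of places you expect a link is a candidate for removal from the concordance.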
In the right-hand column is the tag we want to insert each time an entry in the left column is found. Note that we are going to repurpose an index entry as a hyperlink, so you need to be aware that the strings in the right-hand column are not going to end up as index entries, which is why they do not follow the format of an index entry.
The entry is in two parts: the path and the page name. For example, in ../Field_Pages/userType.htm:
- ../Field_Pages/ is the path
- userType.htm is the page name.
The entire string forms the relative path from the document to the destination page in the Field_Pages folder.
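A relative path is worked out from the folder that holds the page containing the link, which is why a link from a Text page to a field page starts with ../. Python’s posixpath.relpath shows the calculation; the page paths are taken from the example, and relative_link is a hypothetical name:

```python
import posixpath


def relative_link(from_page: str, to_page: str) -> str:
    """Relative link from one page to another; both paths start at the sub-web root."""
    # Links resolve from the folder that contains the linking page.
    return posixpath.relpath(to_page, start=posixpath.dirname(from_page))


print(relative_link("Text/100 enterOrder.htm", "Field_Pages/userType.htm"))
print(relative_link("Text/100 enterOrder.htm", "Text/101 amendOrder.htm"))
```

For a page in the same folder, relpath returns just the file name. The ../Text/ form used in the concordance is equivalent for links between Text pages, and it keeps every tag starting with two dots.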
You can have as many paths as you wish, provided you add a section in the macro for each one. In this case there are two: Field_Pages and Text. The macro makes its selection based on the first four characters in the string: either ../F or ../T in this case. In my case, the book also contains legitimate index tags. Since this macro deletes the tags it processes, it is important that these tags are never the same as a legitimate index tag. I retained the two dots of the relative path because no legitimate index tags begin with two dots!
Go ahead and build your concordance file now. An easy way to obtain the page names is to take a directory listing of the folder on the web server where they are stored. Open a command window, then:
1. Drill down to the folder that stores your field pages or your text pages.
2. Run the command: Dir /n *.htm > list.txt
This leaves a text file called list.txt in the directory, containing all the page names. The /n parameter places all the file names on the extreme right, where they are easy to get at.
3. Open the list.txt file in Word.
4. Hold down the Alt key and drag diagonally down to select a vertical block of text up to, but not including, the file names.
5. Press Delete to get rid of the material you do not want. Also delete the header and summary lines that Dir writes above and below the listing, and use Find and Replace to remove the .htm extension from each name: the MakeConcordance macro adds it back in the second column.
6. Save the file as a Word document.
7. Select Table>Convert>Text to Table to convert the list to a single-column table.
8. Copy the resulting column and paste it beside the existing one to produce two columns.
9. Edit the Const hBase As String = "../Text/" line of the MakeConcordance macro to specify the string you want to use for your path. Ensure the fourth character is unique among your chosen paths.
10. Run the MakeConcordance macro. It runs through the table and adds the path to the front of the file names in the second column.
11. To process additional subdirectories of your website, repeat steps 1 through 10 for each directory.
12. When you have finished, paste the separate concordance tables into a single table and save the whole thing where you can find it again.
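The steps above turn a folder listing into concordance rows. The same transformation as a Python sketch, assuming the listing has already been reduced to bare file names ending in .htm; concordance_rows and check_prefixes are hypothetical helpers, the second enforcing the step 9 rule that each path must be distinguishable by its first four characters:

```python
from typing import List, Tuple


def concordance_rows(file_names: List[str], h_base: str, htm: str = ".htm") -> List[Tuple[str, str]]:
    """Build (search term, index entry) rows from page file names."""
    rows = []
    for name in file_names:
        name = name.strip()
        # Column 1: the page name without its extension, as it appears in the text.
        term = name[: -len(htm)] if name.lower().endswith(htm.lower()) else name
        # Column 2: what MakeConcordance writes, hBase & name & htm.
        rows.append((term, h_base + term + htm))
    return rows


def check_prefixes(h_bases: List[str]) -> None:
    """Raise ValueError if two paths share their first four characters."""
    prefixes = [base[:4] for base in h_bases]
    if len(set(prefixes)) != len(prefixes):
        raise ValueError(f"path prefixes are not unique: {prefixes}")


check_prefixes(["../Field_Pages/", "../Text/"])
for row in concordance_rows(["userID.htm", "userName.htm"], "../Field_Pages/"):
    print(row)
```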
The reason I told you to name your pages after their names in the text of
the manual was so that you would not have to edit the first column of the
table. If you could not do that, you must now go through and place the
actual string you expect to find in the text in the left column against each
entry.
Tagging the Document
Make a copy of your original document. You will need a few practice runs to get this right.
Follow the instructions in Word Help to tag the document automatically with index tags. See the two Help topics “Create a concordance file” and “Automatically mark index entries by using a concordance file”.
Now eyeball the document to see what happened. You may need to click the
Show/Hide toolbar button to reveal the Index tags. Do a quick scan to
ensure that you got the correct tags on the correct entries, and that not
too many undeserving pieces of text were awarded tags.
Refine your concordance file and repeat this process until you are satisfied
that the correct material is being tagged.
Don’t expect perfection: you will get a few misses and a few spurious hits. Live with it. You can take out the extras later: they’re easier to see when they become hyperlinks.
Create the Hyperlinks
Now edit the MakeHyperlinks macro to work with your chosen paths.
The construct
If Left$(url, 4) = "../F" Then
    isHyper = 1
End If
does two things: it selects an action based on the fourth character of your path name, and it sets the isHyper variable to the number of words to go back.
Before the macro deletes the index tag, it extends the selection backwards to select the term in the text, then creates the hyperlink. In my case, the page names in the Field_Pages folder are all single words, so it needs to go back only one word, while all the names in the Text folder are two words, so isHyper becomes 2.
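MakeHyperlinks finds the term by stepping back isHyper words from the tag. The same idea on plain text, as a rough Python sketch: it splits on whitespace, whereas Word’s wdWord unit has its own rules for spaces and punctuation, so treat it as an approximation. split_term is a hypothetical name.

```python
import re
from typing import Optional, Tuple


def split_term(before_tag: str, n_words: int) -> Optional[Tuple[str, str]]:
    """Split the text in front of a tag into (rest, term), term being its last n_words words.

    Returns None when n_words is 0 (not one of our tags) or the text does
    not end in n_words whitespace-separated words.
    """
    if n_words < 1:
        return None
    # n_words runs of non-space characters, separated by whitespace, at the very end.
    match = re.search(r"(\S+(?:\s+\S+){%d})$" % (n_words - 1), before_tag)
    if match is None:
        return None
    return before_tag[: match.start()], match.group(1)


print(split_term("Send a 100 enterOrder", 2))
print(split_term("Set the userID", 1))
print(split_term("Set the userID", 2))
```

The third call shows what happens when isHyper is too large: an extra word is swept into the link, which is exactly what the trial runs are meant to catch.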
Run the macro on your tagged copy of the document and have a look to see
what gets highlighted.
Adjust accordingly and have another try.
Republish the Document
Now re-publish your document to your website, using exactly the same method
as you used last time so that the page names are not changed.
Voila! Thousands of hyperlinks, automatically inserted, to cross-reference
everything in your web with everything else.
Future Issues
You must now decide whether to make the tagged document your official document.
- If you make it your official document, with the hyperlinks saved in the official source, you will never have to perform this procedure again.
- However, if the website moves, or as text is added, changed or deleted, you will face quite a maintenance effort to keep all the hyperlinks up to date.
I prefer to hang onto the concordance file and simply re-run this process
for each new issue. It gives you greater flexibility and lower
maintenance.
Stripping the HTML
Our Intraweb is also accessible by people working from home on a spluttering
dial-up line, so we like to get the pages lean, mean and hungry.
Having published the manual, we save it as a Cascading Style Sheet, then use
FrontPage to attach the CSS to every page in the sub web.
We then use HTML Filter 2 from the Microsoft web site to strip the style sheet Word stores in each web page and a lot of the XML formatting that we do not need. This reduces the size of each page by about 70 per cent.
Here is the batch script we use:
F:
cd "F:\OI_spec\Issue_91"
FOR /R %%i IN (*.xml) DO del "%%i"
FOR /R %%i IN (*.mso) DO del "%%i"
FOR /R %%i IN (*.htm) DO filter -abflmstv "%%i"
The subweb is, of course, in a folder I map to the F drive.
I am afraid I have not yet figured out how to do this in Word 2002. HTML
Filter 2 is not available for Word 2002: you have to use VBA to manipulate
the output filter, and I don’t think the same abilities are available.
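The two deletion loops in the batch file can also be sketched in Python, for example on a machine where cmd.exe is not available. This does not run HTML Filter 2: it only returns the .htm pages that the third FOR /R line would hand to filter. clean_subweb is a hypothetical name.

```python
from pathlib import Path
from typing import List


def clean_subweb(root: Path) -> List[Path]:
    """Delete *.xml and *.mso files below root; return the .htm pages still to filter.

    Mirrors the two FOR /R ... del lines of the batch file. pathlib matches
    patterns case-sensitively on POSIX systems, unlike cmd.exe.
    """
    for pattern in ("*.xml", "*.mso"):
        # list() so that nothing is deleted while the tree is still being walked.
        for path in list(root.rglob(pattern)):
            if path.is_file():
                path.unlink()
    return sorted(root.rglob("*.htm"))
```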