Technote 10005: Introduction to cleaning HTML code
Using the HyperText Studio's clean feature, you can create your own cleans
to remove or modify sets of tags and attributes in a document. You can also
modify the default clean sets. This technote takes you through the process
of creating a custom clean and introduces many of the concepts used in HTML
cleaning.
Note: This technote assumes some knowledge of
Opening the Associated File
- Download the file used in this tutorial by clicking
here. Unzip the file and extract faq.aspx. This file is
part of the HyperText Studio tutorial and was originally produced in Microsoft
Word but has already been cleaned by the HyperText Studio's Microsoft Word
- Select File | Open and open faq.aspx.
- Go to Source view.
Creating a Custom Clean Set
A clean set contains the rules used to clean a document.
- Select Tools | Reformat Code.
- Select Clean Code.
- Click Browse. The currently saved list of settings displays.
- Click New.
- Type myclean as the name of the Clean Set.
The Clean Set dialog box opens.
- Leave the dialog box open.
Creating a New Match
A clean rule is made up of a match, which finds a tag to work with, and an
action, which does something with the tag that has been matched.
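The match and action pairing can be pictured as a small data structure. The sketch below is only an illustration in Python, not how the HyperText Studio stores its rules: the names Match, CleanRule and ActionType are hypothetical, and the two action types are the ones used later in this technote.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(Enum):
    # Hypothetical names for the two action types used in this technote
    REMOVE_ELEMENT = "Remove Element"
    REMOVE_ATTRIBUTE = "Remove Attribute"


@dataclass(frozen=True)
class Match:
    tag: str                         # element name to look for, e.g. "span"
    attribute: Optional[str] = None  # attribute that must be present, if any
    value: Optional[str] = None      # value the attribute must have, if any

    def matches(self, tag: str, attrs: dict) -> bool:
        """Return True when an element with this tag and attributes is matched."""
        if tag.lower() != self.tag:
            return False
        if self.attribute is None:
            return True
        if self.attribute not in attrs:
            return False
        return self.value is None or attrs[self.attribute] == self.value


@dataclass(frozen=True)
class CleanRule:
    match: Match        # finds a tag to work with
    action: ActionType  # what to do with the matched tag


rule = CleanRule(Match("span", "class", "class5"), ActionType.REMOVE_ELEMENT)
print(rule.match.matches("span", {"class": "class5"}))  # True
print(rule.match.matches("span", {"class": "class8"}))  # False
```

Keeping the match separate from the action means the same kind of match can be paired with different actions, which is what the steps below do for span and p tags.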
In this case, some of the style sheet classes that were created while cleaning
the Microsoft Word 2000 code are redundant, so you are going to remove them.
You will create four matches, with their corresponding actions, to remove all
span tags that have the class attribute set to "class5", "class6",
or "class7", and to remove the "MsoNormal" class from p tags.
- Click New Match. The Clean Rule Match dialog box opens.
- Fill in the boxes so that the match finds span tags whose class attribute
is set to class5.
- Click OK. Leave the Clean Settings dialog box open.
Creating a New Action
Now that a match has been created, you need to configure it to do
something with the matched tag. This is called an action.
- Click New Action.
- Select Remove Element from the Action Type drop-down list.
- Click OK to close the Clean Rule Action dialog box. Leave
the Clean Settings dialog box open.
Adding New Matches and Actions
- Repeat the steps in Creating a New Match and Creating a New
Action to create rules removing span tags that have the class attribute
set to class6 and class7.
- To remove the MsoNormal class from the p tag, create a match following the
same steps, using p instead of span. When creating the action, select
Remove Attribute.
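- Your dialog should now list four rules: three that remove span tags with
the class class5, class6, or class7, and one that removes the MsoNormal class
from the p tag.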
- Leave the dialog box open.
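To make the effect of the four rules concrete, here is a rough Python sketch that applies equivalent logic to an HTML fragment using the standard library's html.parser module. It is not the HyperText Studio's implementation. In particular, it assumes that Remove Element drops the span tags but keeps the text inside them, which this technote does not state. The class MyClean, the function clean_html, and the sample fragment are made up for illustration, and the sketch handles only tags, text, and entity references (self-closing tags, comments, and doctypes are not preserved faithfully).

```python
from html import escape
from html.parser import HTMLParser

# Classes named in this technote; spans carrying them are treated as redundant.
REDUNDANT_SPAN_CLASSES = {"class5", "class6", "class7"}


class MyClean(HTMLParser):
    """Hypothetical stand-in for the four myclean rules."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.span_dropped = []  # one flag per open span: True if its tags are dropped

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            drop = dict(attrs).get("class") in REDUNDANT_SPAN_CLASSES
            self.span_dropped.append(drop)
            if drop:
                return  # assumed meaning of Remove Element: drop the tag, keep its content
        if tag == "p" and dict(attrs).get("class") == "MsoNormal":
            # Remove Attribute: keep the p tag but strip its class attribute
            attrs = [(k, v) for k, v in attrs if k != "class"]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"' for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        # Skip the closing tag of a span whose opening tag was dropped
        if tag == "span" and self.span_dropped and self.span_dropped.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def clean_html(html: str) -> str:
    parser = MyClean()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ('<p class="MsoNormal"><span class="class5">What is a clean?</span> '
          '<span class="class8">Keep me</span></p>')
print(clean_html(sample))
# <p>What is a clean? <span class="class8">Keep me</span></p>
```

The span with class8 is left alone because no rule matches it, which mirrors how a clean set only touches the tags its matches find.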
Setting Clean Options
You can modify the clean options so that when the references to the classes
are removed from the HTML, the corresponding CSS styles are removed as well.
- Select the Options tab.
- If necessary, select Remove Unused Selectors.
- Click OK to close the Clean Settings dialog box.
- Click OK to close the Clean Settings list.
- Select myclean from the drop-down list, leaving Reformat Code
- Click OK to execute myclean.
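Your code should now look a lot cleaner: the redundant span tags and the
MsoNormal class references are gone, along with any CSS styles they left
unused.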
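The idea behind the Remove Unused Selectors option can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the product's algorithm: the function names are hypothetical, only class selectors are considered, a rule is dropped if any class in its selector is unused, and nested blocks such as @media are not handled.

```python
import re


def used_classes(html: str) -> set:
    """Collect every class name referenced by a class="..." attribute."""
    classes = set()
    for value in re.findall(r'class\s*=\s*"([^"]*)"', html):
        classes.update(value.split())
    return classes


def remove_unused_class_rules(css: str, html: str) -> str:
    """Keep only the CSS rules whose class selectors still appear in the HTML."""
    used = used_classes(html)
    kept = []
    # Each flat rule has the shape "selector { declarations }"
    for selector, body in re.findall(r'([^{}]+)\{([^{}]*)\}', css):
        names = re.findall(r'\.([A-Za-z_][\w-]*)', selector)
        if names and not all(name in used for name in names):
            continue  # the rule refers to a class the document no longer uses
        kept.append(f"{selector.strip()} {{ {body.strip()} }}")
    return "\n".join(kept)


css = ".class5 { color: red; }\n.class8 { color: blue; }\np { margin: 0; }"
html = '<p>What is a clean? <span class="class8">Keep me</span></p>'
print(remove_unused_class_rules(css, html))
# .class8 { color: blue; }
# p { margin: 0; }
```

Rules without a class selector, such as the p rule above, are always kept in this sketch because nothing here can show that they are unused.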