Have you ever created a web page with a lot of detailed information and wanted an easy way for your readers to parse and filter page entries based on user-specified keywords? This article explains how to use an HTML form and JavaServer Pages (JSP) technology to do exactly that. And even if you have never wanted to do this, you might find the example demonstration and code walkthrough useful because they cover how to retrieve request values and compare them to values in lines read from a static file.
The HTML form is placed on the page you want to search. It provides a list of selectable keywords — words you know are on the page, that do not match your topic headings, and that you believe your users might want to search on. It also provides an input field for your users to type in a keyword in case they cannot find what they want to search on in the list.
When the user presses the Return key or clicks the Go button, a JSP is called that parses an HTML page one line at a time, looks for the user-specified keywords, and returns a page that lists all lines on the HTML page that contain the keywords. The returned results are organized under the keyword(s) where there is a match.
The figure below shows the HTML form on the left, and the HTML code to create the form on the right. The form is live. Go ahead and select or enter keywords and click the Go button.
When you click the Go button, a JSP page is called that parses a copy of a page and returns a list of articles where any part of the entry contains the specified keyword(s). For example, if you remember there was an article by somebody named Steve that you really liked, type in "Steve" and click the Go button to see a list of Steve's articles.
Note: This might take a few seconds to complete because the search page, described below, has a lot of HTML code for the banner, footer, and left navigation that gets read.
HTML to Make the Form
<form action="findwords.jsp" method="get">
Select keywords from the list below:
<input type="hidden" name="col" value="searchreports">
<select size="4" name="qp" multiple>
<option value="Jakarta">Jakarta
<option value="CachedRowSet">CachedRowSet
<option value="properties">Properties
<option value="Apache">Apache
</select>
and/or enter a search phrase:
The code for the JSP page requires the search page to use wrapping paragraphs. For example, a page entry should look like the following. Note there are no returns within an entry to make the HTML more readable to someone viewing the source file:
<P>
<a href="/pathname/">Maintaining State for HTML Form Buttons</a> by Matthias Laux <br>Here's the scoop on using JavaServer Pages custom tags to maintain button state in your HTML forms. <i>(October 2002)</i>
</p>
In contrast, the example below uses a return after "Laux" and before "Here's" to make the source file easier to read:
<P>
<a href="/pathname/">Maintaining State for HTML Form Buttons</a> by Matthias Laux
<br>Here's the scoop on using JavaServer Pages custom tags to maintain button state in your HTML forms. <i>(October 2002)</i>
</p>
If you leave out formatting returns as in the first example above, the JSP code reads the entire line, parses it for keywords, and if there is a match, returns the entire line on the results page. In the second example, the JSP code reads the line up to Laux, parses it, and if there is a match, returns the line up to Laux on the results page. The code then reads the next line starting with "Here's."
In short, the entries are not properly returned because they are broken up. In the example above, if the search term is "Maintaining" only the title-link and author are returned without the blurb, and if the search term is "maintain" only the blurb is returned without the title-link and author.
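The effect of a formatting return can be reproduced outside the JSP with a minimal sketch. The class and method names below are illustrative and are not part of findwords.jsp; the matching is a plain case-sensitive substring test on each line.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not taken from findwords.jsp.
final class LineMatchDemo {
    // Returns every line of the page text that contains the keyword (case-sensitive).
    static List<String> matchingLines(String page, String keyword) {
        List<String> hits = new ArrayList<>();
        for (String line : page.split("\n")) {
            if (line.contains(keyword)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // An entry broken by a formatting return after "Laux".
        String broken = "<a href=\"/pathname/\">Maintaining State for HTML Form Buttons</a> by Matthias Laux\n"
                + "<br>Here's the scoop on using JavaServer Pages custom tags to maintain button state.";
        // Each search finds only the half of the entry that holds the keyword.
        System.out.println(matchingLines(broken, "Maintaining"));
        System.out.println(matchingLines(broken, "maintain"));
    }
}
```

Because each line is tested on its own, a keyword can only ever return the physical line it sits on, which is why the entries need to stay on one line.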
This section walks through each part of findwords.jsp, which is the JSP code called when the user clicks the Go button.
Note: The source code is in a file with a "txt" extension so you can view it as text. A "jsp" extension tells the web server to compile the code into a servlet and execute it.
A JSP looks like an HTML page with servlet code segments embedded between JSP tags. There are a number of different kinds of JSP tags, and this code walkthrough touches on a few of them.
JSP directives are enclosed by the <%@ and %> directive tags, and are instructions processed by the JSP engine when the JSP page is translated to a servlet. The page directives in this example tell the JSP engine the scripting language is Java (language="java"), and to include the indicated Java packages.
<%@ page language="java" import="java.util.*, java.io.*" %>
JSP declarations are enclosed by the <%! and %> declaration tags, and let you set up variables for later use in the program. You can also declare variables at the time you use them. Just remember that a variable declared in a declaration tag is in scope for the entire JSP page regardless of where the declaration appears, because it becomes a field of the generated servlet and is shared by every request to the page. The declarations in this example declare a random access file and a string variable for reading the search page.
<%! RandomAccessFile in = null; %> <%! String s = null; %>
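A JSP declaration generally becomes a field of the servlet class that the JSP engine generates, while scriptlet code ends up inside the method that runs for each request. The following is a rough plain-Java stand-in for that arrangement, not actual generated code; the class and method names are made up for illustration.

```java
// Rough stand-in for the servlet a JSP engine generates; names are illustrative only.
final class GeneratedPageSketch {
    // Comparable to <%! String s = null; %>: one field shared by every request.
    String s = null;

    // Comparable to scriptlet code: this body runs once per request.
    String service(String firstLine) {
        String local = firstLine; // a scriptlet variable, private to this call
        String previous = s;      // whatever an earlier request left in the field
        s = firstLine;
        return previous + " -> " + local;
    }

    public static void main(String[] args) {
        GeneratedPageSketch page = new GeneratedPageSketch();
        System.out.println(page.service("request one"));
        // The second call still sees the value stored by the first.
        System.out.println(page.service("request two"));
    }
}
```

The shared field is why values declared this way can carry over between requests, which matters if two users search at the same time.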
The next lines of code initialize variables and open a connection to searchpage.html, which is the page to be searched. The initializations get the length of the file to be searched, get the first line of that file, and set some variables to zero or null.
The val variable is a flag that indicates whether or not a match is the first in its category, and therefore necessitates a new list of matches, or an addition to an existing list.
The results variable is a flag that indicates whether or not any matches were found. If no matches are found, a message is returned to the user on the results page.
<%
  // Change directory paths to your application
  File inputFile = new File("/pathname/searchpage.html");
  in = new RandomAccessFile(inputFile, "r");
  // Get the length of the file
  long length = in.length();
  // Read a line from the file
  s = in.readLine();
  // Flag to tell if match is first in category
  int val = 0;
  // Flag to tell if results are found or not
  int results = 0;
  String[] qpValues = null;
  String qtValue = null;
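The same open, length, and first-line steps can be tried outside a JSP container. This is a minimal sketch assuming a hypothetical helper class; it closes the file with try-with-resources, which the article's code does not do.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper showing the RandomAccessFile calls used by the initialization code.
final class OpenSearchPageDemo {
    // Returns the first line of the file, or null when the file is empty.
    static String firstLine(File inputFile) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(inputFile, "r")) {
            long length = in.length(); // size of the file in bytes
            if (length == 0) {
                return null;
            }
            return in.readLine();      // reads up to, but not including, the line terminator
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the page to search as the first argument.
        System.out.println(firstLine(new File(args[0])));
    }
}
```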
The following code prints the results page heading, and checks for option values passed from the selectable list (getParameterValues("qp")) and a value from the input field (getParameter("qt")) on the form. All values found are retrieved and used to build the bulleted index that appears at the top of the results page.
  out.println("<h4>Keyword Search Results</h4>");
  // Start bullet list
  out.println("<ul>");
  // Check for option values
  if(request.getParameterValues("qp") != null) {
    qpValues = request.getParameterValues("qp");
    // Retrieve option values
    for(int j = 0; j < qpValues.length; j++) {
      // Make bullet list for each option value
      out.println("<li><a href=#" + qpValues[j] + ">" + qpValues[j] + "</a>");
    }
  }
  // Check for input value
  if(request.getParameter("qt") != null) {
    qtValue=request.getParameter("qt");
    if(qtValue.length() > 0) {
      out.println("<li><a href=#" + qtValue + ">" + qtValue + "</a>");
    }
  }
  // End bullet list
  out.println("</ul>");
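To see the index-building logic without a servlet container, the request lookups can be replaced by plain arguments. In this sketch the class and method names are assumptions, and the two parameters stand in for the results of getParameterValues("qp") and getParameter("qt").

```java
// Hypothetical stand-alone version of the bulleted index; not the article's code.
final class KeywordIndexDemo {
    // qpValues plays the role of getParameterValues("qp"), qtValue of getParameter("qt").
    static String buildIndex(String[] qpValues, String qtValue) {
        StringBuilder html = new StringBuilder("<ul>\n");
        if (qpValues != null) {
            for (String value : qpValues) {
                html.append(item(value));
            }
        }
        if (qtValue != null && qtValue.length() > 0) {
            html.append(item(qtValue));
        }
        return html.append("</ul>").toString();
    }

    // One bullet that links to the anchor named after the keyword.
    private static String item(String keyword) {
        return "<li><a href=\"#" + keyword + "\">" + keyword + "</a></li>\n";
    }

    public static void main(String[] args) {
        System.out.println(buildIndex(new String[] {"Jakarta", "Apache"}, "Steve"));
    }
}
```

Like the article's code, the sketch writes the keyword into the page as typed; it only changes where the values come from.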
Any option values retrieved are matched to characters in the lines read from the file with their original capitalization as typed on the form, and also converted to all lowercase. This is to catch all possibilities in the entry. For example, "Properties" is checked against each line in searchpage.html as initial cap "P" and as all lowercase, "properties" to account for it appearing at the beginning of a sentence and within a sentence. Values from the input field are checked exactly as typed, as all lowercase, as all uppercase, and as initial caps.
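The capitalization variants described above can be collected in one place. This is a sketch with assumed names (KeywordVariants, variants, matches); it follows the same four forms checked for the input field and uses Locale.ROOT so the case conversion does not depend on the server's locale.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper that gathers the capitalization forms checked for a keyword.
final class KeywordVariants {
    // As typed, all lowercase, all uppercase, and first letter capitalized (duplicates dropped).
    static Set<String> variants(String keyword) {
        Set<String> forms = new LinkedHashSet<>();
        if (keyword == null || keyword.isEmpty()) {
            return forms;
        }
        forms.add(keyword);
        forms.add(keyword.toLowerCase(Locale.ROOT));
        forms.add(keyword.toUpperCase(Locale.ROOT));
        forms.add(keyword.substring(0, 1).toUpperCase(Locale.ROOT) + keyword.substring(1));
        return forms;
    }

    // True when the line contains any of the forms.
    static boolean matches(String line, String keyword) {
        for (String form : variants(keyword)) {
            if (line.contains(form)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(variants("steve"));
        System.out.println(matches("An article by Steve", "steve"));
    }
}
```

Checking a fixed set of forms still misses mixed capitalization such as "STeve"; comparing lowercase copies of both the line and the keyword would be one way to cover every case.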
Leading and trailing spaces are trimmed from each line and each line is checked to see if it begins with "<A HREF." The searchpage.html file is formatted with all lines starting flush left, but the trimming accounts for any typing mistakes where a line might have spaces or tabs in front of it by mistake. All lines of interest in the file begin with <A HREF, and of course this would have to be changed to work on another file where, for example, the lines of interest begin with <li or something else.
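One way to make the prefix easy to change is to pass it in as a parameter. The helper below is an assumption for illustration, not the article's code. It also compares the prefix without regard to case, which differs from the article's case-sensitive startsWith("<A HREF") check; the sample entries shown earlier use lowercase <a href, so a case-insensitive test accepts both spellings.

```java
// Hypothetical helper: decides whether a line is one of the "lines of interest".
final class EntryLineCheck {
    // Trims the line, then compares its start with the prefix, ignoring case.
    static boolean isEntryLine(String line, String prefix) {
        if (line == null) {
            return false;
        }
        String trimmed = line.trim();
        // regionMatches returns false when the trimmed line is shorter than the prefix.
        return trimmed.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(isEntryLine("   <a href=\"/pathname/\">Title</a>", "<A HREF"));
        System.out.println(isEntryLine("<li>Another kind of entry", "<li"));
    }
}
```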
The main body of the code compares the option values to lines read from searchpage.html looking for matches. When matches are found, the line is returned in the results page as a bullet item under its correct category.
  if(request.getParameterValues("qp") != null) {
    qpValues = request.getParameterValues("qp");
    // Iterate through option values
    for (int i = 0; i < qpValues.length; i++) {
      // Check validity of line read from file
      while(s!=null) {
        if(s != null && s.length() > 0 ) {
          if(s.trim().startsWith("<A HREF") && qpValues[i].length() > 0) {
            // Convert to lowercase
            String lower = qpValues[i].toLowerCase();
            // Look for match
            if(s.indexOf(qpValues[i]) > 0 || s.indexOf(lower) > 0) {
              results=1;
              // Start new list of matches under topic
              if(val == 0) {
                out.println("<a name=" + qpValues[i] + "></a>");
                out.println("<h4>");
                out.println(qpValues[i]);
                out.println("</h4>");
                out.println("<ul>");
                val=1;
              }
              // Add match to existing list
              out.println(s);
              out.println("<p>");
            }
          }
        }
        // Read another line
        s = in.readLine();
      }
      val=0;
      out.println("</ul>");
If no matches are found, a message to that effect is returned on the results page. The file is reset to the beginning and the first line read to prepare to look for matches against another option value.
      if(results==0) {
        out.println("<a name=" + qpValues[i] + "></a>");
        out.println("<h4>");
        out.println("No results found for " + qpValues[i]);
        out.println("</h4>");
      } else {
        results=0;
      }
      in.seek(0);
      s = in.readLine();
    }
  }
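The rewind step is what lets one open file serve several keywords. The sketch below isolates that pattern with an assumed helper that counts matching lines per keyword instead of printing them; the file contents and names are illustrative.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing seek(0) between passes over the same file.
final class RewindPerKeywordDemo {
    // Counts, for each keyword, the lines that contain it.
    static Map<String, Integer> countMatches(File page, String[] keywords) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (RandomAccessFile in = new RandomAccessFile(page, "r")) {
            for (String keyword : keywords) {
                in.seek(0); // back to the beginning before each pass
                int count = 0;
                String s;
                while ((s = in.readLine()) != null) {
                    if (s.contains(keyword)) {
                        count++;
                    }
                }
                counts.put(keyword, count);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // First argument: the page to search; remaining arguments: keywords.
        String[] keywords = java.util.Arrays.copyOfRange(args, 1, args.length);
        System.out.println(countMatches(new File(args[0]), keywords));
    }
}
```

Without the seek(0) call, every keyword after the first would start at end of file and find nothing.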
The next part of the code compares the input value to lines read from searchpage.html looking for matches. When matches are found, the line is returned in the results page as a bullet item under its correct category.
  if(request.getParameter("qt") != null) {
    // Get input value
    qtValue = request.getParameter("qt");
    val=0;
    // Start at beginning of file and read a line
    in.seek(0);
    s = in.readLine();
    // Check validity of line read from file
    while(s!=null) {
      if(s != null && s.length() > 0 ) {
        if(s.trim().startsWith("<A HREF") && qtValue.length() > 0) {
          // Create uppercase, lowercase, & init. caps
          String uppercase = qtValue.toUpperCase();
          String lowercase = qtValue.toLowerCase();
          String firstletter = qtValue.substring(0,1);
          String lastletters = qtValue.substring(1);
          String upfirst = firstletter.toUpperCase();
          String initcap = upfirst.concat(lastletters);
          if(s.indexOf(qtValue) > 0 || s.indexOf(lowercase) > 0 ||
             s.indexOf(initcap) > 0 || s.indexOf(uppercase) > 0) {
            results=1;
            // Start new list of matches under topic
            if(val == 0) {
              out.println("<a name=" + qtValue + "></a>");
              out.println("<h4>");
              out.println(qtValue);
              out.println("</h4>");
              out.println("<ul>");
              val=1;
            }
            // Add match to existing list
            out.println(s);
            out.println("<p>");
          }
        }
      }
      // Read another line
      s = in.readLine();
    }
    out.println("</ul>");
If no matches are found, a message to that effect is returned on the results page.
    if(results==0 && qtValue.length() > 0) {
      out.println("<a name=" + qtValue + "></a>");
      out.println("<h4>");
      out.println("No results found for " + qtValue);
      out.println("</h4>");
      out.println("<p>");
    } else {
      results=0;
    }
  }
In the event the user fails to select from the list or enter a keyword by typing into the input field, a message is returned on the results page notifying the user of that case.
  // qtValue is still null when the request carries no "qt" parameter
  if(request.getParameter("qp") == null &&
     (qtValue == null || qtValue.length() == 0)) {
    out.println("<h4>");
    out.println("No Keywords were selected or entered.");
    out.println("</h4>");
    out.println("<p>");
  }
%>
You could easily modify this program to capture the keywords the end user either selects or types into the input field. The reason for capturing the keywords is to see which keywords are most often selected or entered by the user. If you see a lot of entries for a particular keyword it could tell you something about the interests of your users or indicate that a keyword that is frequently input should probably be added as an option value to the selectable list.
To capture the keywords, you would modify the code by opening an output stream on a file with read-write permissions, and writing to that file. The code that gets the option and input values goes to the end of the file, writes the value, and adds a new line (\n) character so each value is on a separate line for readability.
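Note: The values from the selection box are written to keywords.txt with writeUTF, which stores each string in a machine-independent modified UTF-8 format preceded by a two-byte length. The value from the input field is written with writeChars, which stores each character as two bytes. As a result, keywords.txt is not a plain text file and may show extra bytes when opened in a text editor.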
findwords.jsp shows the full source code with this functionality added.
<%! RandomAccessFile outkw = null; %>

  File outputFile = new File("/pathname/keywords.txt");
  outkw = new RandomAccessFile(outputFile, "rw");
  . . .
  if(request.getParameterValues("qp") != null) {
    qpValues = request.getParameterValues("qp");
    for(int j = 0; j < qpValues.length; j++) {
      out.println("<li><a href=#" + qpValues[j] + ">" + qpValues[j] + "</a>");
      // Go to the end of the file and write the option value
      outkw.seek(outputFile.length());
      outkw.writeUTF(qpValues[j]);
      outkw.writeByte('\n');
    }
  }
  if(request.getParameter("qt") != null) {
    qtValue=request.getParameter("qt");
    if(qtValue.length() > 0) {
      out.println("<li><a href=#" + qtValue + ">" + qtValue + "</a>");
      // Go to the end of the file and write the input value
      outkw.seek(outputFile.length());
      outkw.writeChars(qtValue);
      outkw.writeByte('\n');
    }
  }
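If the goal is a log that reads cleanly in a text editor, one alternative is to append each keyword as UTF-8 bytes followed by a newline. This swaps in a different technique from the writeUTF and writeChars calls above (writeUTF adds a two-byte length prefix, and writeChars writes two bytes per character). The class and method names are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical alternative: append keywords as plain UTF-8 text, one per line.
final class KeywordLogDemo {
    static void append(File log, String keyword) throws IOException {
        try (RandomAccessFile outkw = new RandomAccessFile(log, "rw")) {
            outkw.seek(outkw.length()); // go to the end of the file
            outkw.write((keyword + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // First argument: the log file; second argument: the keyword to record.
        append(new File(args[0]), args[1]);
    }
}
```

Opening and closing the file on every call keeps the sketch short; a busy page would more likely keep one writer open and synchronize access to it.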
JSP technology makes it easy to write a simple search engine to parse an HTML page for keyword matches. This simple program is specific to a page with a certain formatting, but can easily be adapted to work on pages with different formats.
A good exercise would be to add the code to read the lines regardless of whether formatting returns are placed in the entries to make them easier to read.
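One possible starting point for that exercise is to join the lines of each entry before searching. The sketch below assumes, as in the sample entries shown earlier, that every entry sits between a <P> line and a </p> line; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rebuilds one-line entries from lines that contain formatting returns.
final class EntryJoiner {
    // Assumes each entry is bracketed by <P> and </p> on lines of their own.
    static List<String> joinEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null; // null while outside an entry
        for (String raw : lines) {
            String line = raw.trim();
            if (line.equalsIgnoreCase("<p>")) {
                current = new StringBuilder();
            } else if (line.equalsIgnoreCase("</p>")) {
                if (current != null && current.length() > 0) {
                    entries.add(current.toString());
                }
                current = null;
            } else if (current != null && !line.isEmpty()) {
                if (current.length() > 0) {
                    current.append(' '); // replace the formatting return with a space
                }
                current.append(line);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("<P>");
        lines.add("<a href=\"/pathname/\">Maintaining State for HTML Form Buttons</a> by Matthias Laux");
        lines.add("<br>Here's the scoop on using JavaServer Pages custom tags.");
        lines.add("</p>");
        System.out.println(joinEntries(lines));
    }
}
```

The keyword search would then run over the joined entries rather than the raw lines, so a match anywhere in an entry returns the whole entry.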
Adapt the program so it reads from a URL connection. You would use the URL class and can see an example in the Reading Directly from a URL chapter of The Java Tutorial.
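A minimal sketch of the reading side of that exercise might look like the following. The class and method names are assumptions, and the page is assumed to be UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads every line of the page behind a URL.
final class UrlLineReader {
    static List<String> readLines(URL page) throws IOException {
        List<String> lines = new ArrayList<>();
        // openStream() opens a connection to the URL and returns its input stream.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(page.openStream(), StandardCharsets.UTF_8))) {
            String s;
            while ((s = in.readLine()) != null) {
                lines.add(s);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A file: URL works as well as an http: one.
        URL page = new java.io.File(args[0]).toURI().toURL();
        System.out.println(readLines(page).size() + " lines read");
    }
}
```

A URL stream cannot be rewound with seek(0), so the sketch keeps the lines in a list that can be searched once per keyword.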
© 1994-2005 Sun Microsystems, Inc.