A number of organizations now have a large existing web site that is composed
of over one thousand web pages and many more images, text files, PDF files,
Word Files, audio files and video files. These web sites are often built by
hand using a text editor or an HTML editor like Macromedia Dreamweaver, Microsoft
FrontPage, Netscape Composer, etc. These sites started small and grew overtime.
Many IT departments are overwhelmed by the demands placed on the webmaster(s)
and need content management tools to let users do the publishing and approval
themselves. Oracle Portal (previously know as WebDB) is one of the content management
tools that are available on the market.
The challenge is: How do we migrate our whole web site into Portal? We do not want to keep managing the old content the old way and the new content with Portal. We want to manage everything with Portal. That is a very fair and common requirement.
This article is built in 2 parts. In Part 1 we walk you step by step through
the process of migrating an existing web site into Portal. In Part 2 we go into
depth into the technical secrets to get this done very efficiently. We show
how we developed a utility that uses the Portal API to automate the creation
of folders and editable HTML items in Portal, amongst other things.
Part 1: Running the Migration
The Process
Figure 1 provides an overview of the migration process:
Figure 1: Migration Process
Step 1: Grouping all the files to be published into ZIP archive(s)
All the files that constitute the legacy web site must be grouped in one or several ZIP file archives. These files to be zipped may be HTML, pictures, audio, video, Word documents or any other type of documentation. Do not forget to include the folder information (complete path) in the ZIP file in order to preserve the same directory structure in Portal as in the original web site and to ensure that all hyperlinks within the documents will still work.
Step 2: Bulk load the ZIP file in Portal
The resulting ZIP file(s) must then be uploaded into Portal 3.0. This is standard
Portal functionality: it requires logging in Portal using the URL: http://<host_name:port>/pls/portal30
and creating a ZIP item within a folder in a content area using the item wizard
illustrated in Figure 2. The folder where you upload the ZIP file is the staging
area represented in Figure 1. You should create a special folder for that purpose
within the Portal content area that will contain the old web site items compressed
in the ZIP file. You can name that folder 'Staging' or 'Zip'.
Figure 2: Creating a ZIP file in Portal
In order to run the item wizard you need to be in Edit mode in a folder, click on the link 'Add Item' and choose item type 'Zip File'.
Step 3: Unzip the archive in Portal
Once the Zip file(s) are in Portal it takes only one click of the mouse to unzip them. The unZip option can be seen in Figure 3 on the left hand side of the yellow 'New' icon. Portal 3.0. will create a Portal folder for each folder in the ZIP file so that the original file hierarchy is preserved and the hyperlinks continue to work in Portal.
Even if the file to unZip is of considerable size, this action does have to impact the machine's response time as Portal runs this as a background task, that is asynchronously.
Figure 3: Unzipping a file in Portal
Up to now we have been using standard Portal functionality. We have shown that Portal 3.0. has a bulk load functionality that enable users to upload a ZIP file into Portal and unzip it inside Portal. So all that is required to migrating existing content into Portal is to collect that content into one or several ZIP files, upload them into Portal and unzip them inside Portal. This works well for images, text files, PDF files, Word files, audio files and video files, but there is an issue with HTML files. All items uploaded are created as Portal item type 'file'. It is not possible to edit the content of a 'file' item in Portal. This limitation is acceptable with items like images because users do not expect to edit the content of an image using Portal. They would use a product like Adobe Photoshop instead. But in the case of HTML items, users do expect to be able to edit their content directly in Portal.
In order to enable users to edit the content of HTML items, it is necessary to transform each of the unzipped HTML items of type 'file' into items of type 'text'. This transformation can be done manually by creating a number of new text items it Portal and copy/pasting the content of the unzipped items into the new items. However this is far too laborious and error prone a process.
This transformation can also be done automatically by programming a utility that makes use of the Portal API. We have programmed such utility and will expose its design and code in Part 2. The function of this utility is to transform all unzipped items within one folder into editable items within Portal. The editable items are stored in a destination folder chosen by the administrator who runs the utility. This folder should be in the same content area as the staging area where you uploaded the ZIP file.
Step 5: Running the API utility
Figure 4 illustrates the migration utility being run. It is a Portal application
that we have created. It is composed of one list of value module and a form
based on a stored procedure. That stored procedure calls a PL/SQL package
that does all the work for you. All the code is exposed in Part 2.
Part 2: The Portal
3.0. API
In this part we will expose the code for each part of the migration utility and explain a lot of crucial Portal API details that are not documented yet. Using a top-down approach we will start from the parts exposed in Part 1 and progressively drill down in the details.
The list of value component is defined as a popup format and uses this hierarchical
query:
select LPAD(title,length(title)+(level-1)*6,'.'), id
from wwv_corners
where siteid >10 -- only user defined folders
start with parentid = 0
connect by parentid = prior id and
siteid
= prior siteid and
language
= prior language
The table wwv_corners is owned by the Portal30 schema in the database, unless the person who installed Portal on your server used a different schema. Wwv_corners contains all folder information for all folders defined in Portal, including the internal folders that come with the product. Folders constitute a hierarchy: a folder may contain other folders that in turn may contain others, etc.
There are a few tricks to the table wwv_corners. It also contains content area information. A content area is really a top level folder. All content area folders have their ID set to 1 and the parent folder ID set to 0. Yes, ID is NOT the primary key. Incidentally you will find that there is no folder record with ID equal to 0 so you could not have a foreign key from parentid to id. Each content area corresponds to one and only one site referred to by the column siteid. Site information can be found in table wwsbr_sites$.
The resulting list of value is illustrated in Figure 6.
The second Portal component is a form based on the stored procedure 'migrate'
documented in Listing A. This procedure must be owned by the same schema that
owns the application. Once you choose the procedure as you create the form
with the 'Create Form from Procedure' Wizard, Portal constructs one item for
each parameter of the procedure and all you need to do is refine some cosmetic
definitions like the template, display name, header text, footer text and
help text. For this migration utility we chose a tabular form. You can get
more layout flexibility if you use ask for custom form.
Figure 7 illustrates how by clicking on one of the parameters like p_f_source
you can set its fields' label, font face, font color, font size, field size.
In order to associate a list of value with a field you must select the item
type 'Popup'.
Listing A
In Figure 7, if you navigate to the submit_top item you will see that you can customize what happens when users click on the submit button. But it is only when you look at what Portal does with your code that you can fully understand how far you can push it. By looking at the 'MIGRATE_I' package we found the submit code in procedure wwv_gensys_1 documented in listing B . In that procedure we can see for example that we may make use of 6 parameters: p_block_name, p_object_name, p_instance, p_event_type, p_user_args and p_session. That should give you some ideas beyond the scope of this article...
Listing B
The Migration Package
The fast_load_db package must be own by the Portal schema because it calls a number of Portal API modules. The privileges required to do this would have to be granted directly (not through a role) to the schema owning the migration package. The benefits of doing this are outweighed by the time cost.
In this section we will show the code that is core to the migration package. In listings below we have stripped the code from basic validation and exception processing logic. This cuts the volume of code shown for each procedure by half and makes it easier for the reader to focus on Oracle Portal specific issues. If you want a full copy of the 1335 lines of code feel to contact the author.
We will start with the procedure that is the entry point in the migration package. This procedure is documented in listing C.
The procedure migrate does only four things:
Listing C
The procedure migrate takes only two parameters because all it does is migrate content from a source folder inside Portal into a destination folder.
The cursor defined in lines 4 to 7 is used in lines 13-15 and 18-20 in order to check that the IDs passed as parameters for the source folder and the destination folder are valid IDs. Table wwv_corners is the Portal table that contains the folder information as seen in the List of Value module above.
Line 23 derives the internal ID of the site that contains the destination
folder. The query used by function get_f_site_id is:
select siteid
from wwv_corners
where id = p_f_id;
It is more efficient to run this query once in procedure migrate than to run
it several times in the recursive procedure migrate_f.
Finally, Line 26 calls recursive procedure migrate_f that will do all the work for us.
Listing D
The basic algorithm of procedure migrate_f is:
Lines 6-9 define the cursor that provides the sub folders for a given folder. The column parentid is a pseudo foreign key onto the column id of table wwv_corners.
Lines 10-13 define the cursor that returns a list of items for a given Portal folder. Items can be images, HTML documents, sounds, PDFs, Word documents, videos, etc. Table wwv_things is the Portal driving table for all item related information.
Lines 21-38 show the loop through the folder hierarchy. This is the recursive part of procedure migrate_f that drills down from the top source folder passed as parameter p_f_source all the way to its sub most folders. Within that loop, lines 23 to 28 call private procedure make_folder that creates a Portal folder in the destination folder for each folder in the source folder. That procedure is documented in listing E.
Listing E
Continuing on listing D, line 41 calls a function that constructs the
full path of a given folder within Portal. This functionality is a little
complex but extremely useful when constructing HTML links programmatically
within Portal. So we will use an example to explain this function clearly.
Let us say that we have 3 levels of folders. We will call the first level
folder Admin. Folder 'Students' is owned by 'Admin'. Folder 'Visits' is
owned by 'Students'. Figure 8 illustrates how this structure would be stored
in Portal.
Folder Name | ID | Parent ID | Site ID |
Admin | 1 | 0 | 53 |
Students | 78 | 51 | 53 |
Visits | 94 | 78 | 53 |
Given the values illustrated in figure 8 for table wwv_corners, function get_folder_www_path(p_f_id=>94, p_siteid =>53) would return 'docs/FOLDER/Admin/Students/Visits/' which is the relative URL for folder Visits. Similarly, get_folder_www_path(p_f_id=>78, p_siteid =>53) would return 'docs/FOLDER/Admin/Students/'.
We have illustrated function get_folder_www_path in Listing F. The core of this function is a hierarchical SQL query that drills up from the folder passed in parameter p_f_id to the top level folder (i.e. content area) of its folder hierarchy: lines 5-13. If you are not used to hierarchical queries, note that the keyword 'prior' qualifies the current row and that the other side of the connect by clause relates to the next row being retrieved. The 'start with' clause retrieves the very first row of the folder hierarchy.
Listing F
In listing F the first loop in lines 19-22 stores the name of each folder that constitutes the hierarchy of the folder passed in parameter p_f_id. This list of folders is built into array variable l_f_names. The second loop in lines 24-26 flattens these folders into one URL path string contained in variable l_path whose value is returned by the function when the hierarchy navigation is complete.
Coming back to Listing D we are now going to explain the loop in lines 44-82. Because of the recursive nature of procedure migrate_f, this loop is run by the procedure once all sub-folders of the current folder have been processed. This loop goes through each file in the current source folder p_f_source.
The if statement in line 45 tests the extension of the current file. Anything that is not an HTML file simply gets copied into the destination folder hierarchy in lines 71-80. This is done by making a simple call to the Portal API procedure wwv_thingdb.copything. If however the file is an HTML one, a straight copy will not do. The utility must convert the item from a Portal file datatype into a Portal text datatype so that the HTML is editable by the user. It must also change a few things within the HTML.
Line 47 in Listing D retrieves the HTML text into varchar2(32767) variable v_source_html. This is done with function get_file_text that is another one of these very good candidates for re-use for Oracle Portal utility programming: it fetches a given file stored in Portal, grabs its content that is stored in a Binary Large Object format and converts it into a varchar2 format. The logic is exposed in Listing G where we make use of 2 packages supplied with Oracle 8i Release 2: dbms_lob for the truncation of the LOB content and utl_raw for the conversion into varchar2. Package wwdoc_api is a Portal30 supplied package, part of the API.
Listing G
Coming back to Listing D, we will now look at the necessary massaging of the HTML content performed in line 48. Without getting into too much detail about the string operations performed in listing H it is necessary to understand that in order to migrate an item of type file into a text item you must at least:
Back to Listing D again, we can now move to the API command that creates a
new item within Portal. It is encapsulated in procedure create_thing in lines
51-65. There are 150 lines of code in that procedure. We will not list it here
because of its length and because most of what that procedure does is call Portal
API's wwv_thingsave.saveitem after a few parameter conversions.
The create_thing procedure creates a text item in the destination folder
referred to by parameter p_f_dest. All other parameters are self
explanatory apart from p_inplace that when set to 1 instructs Portal to
display the HTML item inline inside the folder that owns it.
Lines 67-68 create a record of what the utility did in the table LOG.
There is really nothing more to it. You now have all the key elements that
you need in order to create a large number of elements in Portal through the
backdoor, i.e. programmatically. This should save you a whole lot of time.