Migrate Web Site Content in Portal 3.0. with its API

By Hervé Deschamps, Oracle Corporation.

A number of organizations now have a large existing web site that is composed of over one thousand web pages and many more images, text files, PDF files, Word Files, audio files and video files. These web sites are often built by hand using a text editor or an HTML editor like Macromedia Dreamweaver, Microsoft FrontPage, Netscape Composer, etc. These sites started small and grew overtime. Many IT departments are overwhelmed by the demands placed on the webmaster(s) and need content management tools to let users do the publishing and approval themselves. Oracle Portal (previously know as WebDB) is one of the content management tools that are available on the market.

The challenge is: How do we migrate our whole web site into Portal? We do not want to keep managing the old content the old way and the new content with Portal. We want to manage everything with Portal. That is a very fair and common requirement.

This article is built in 2 parts. In Part 1 we walk you step by step through the process of migrating an existing web site into Portal. In Part 2 we go into depth into the technical secrets to get this done very efficiently. We show how we developed a utility that uses the Portal API to automate the creation of folders and editable HTML items in Portal, amongst other things.

Part 1: Running the Migration

The Process

Figure 1 provides an overview of the migration process:

  1. Zip the files composing the old web site
  2. Upload the ZIP files into Portal (Bulk Load)
  3. Unzip the ZIP files
  4. Run a Portal API based utility to automate a number of migration tasks with Portal.
  5. Manually edit what cannot be done automatically in an efficient manner.

Figure 1: Migration Process

Step 1: Grouping all the files to be published into ZIP archive(s)

All the files that constitute the legacy web site must be grouped in one or several ZIP file archives. These files to be zipped may be HTML, pictures, audio, video, Word documents or any other type of documentation. Do not forget to include the folder information (complete path) in the ZIP file in order to preserve the same directory structure in Portal as in the original web site and to ensure that all hyperlinks within the documents will still work.

Step 2: Bulk load the ZIP file in Portal

The resulting ZIP file(s) must then be uploaded into Portal 3.0. This is standard Portal functionality: it requires logging in Portal using the URL: http://<host_name:port>/pls/portal30 and creating a ZIP item within a folder in a content area using the item wizard illustrated in Figure 2. The folder where you upload the ZIP file is the staging area represented in Figure 1. You should create a special folder for that purpose within the Portal content area that will contain the old web site items compressed in the ZIP file. You can name that folder 'Staging' or 'Zip'.

Figure 2: Creating a ZIP file in Portal

In order to run the item wizard you need to be in Edit mode in a folder, click on the link 'Add Item' and choose item type 'Zip File'.

Step 3: Unzip the archive in Portal

Once the Zip file(s) are in Portal it takes only one click of the mouse to unzip them. The unZip option can be seen in Figure 3 on the left hand side of the yellow 'New' icon. Portal 3.0. will create a Portal folder for each folder in the ZIP file so that the original file hierarchy is preserved and the hyperlinks continue to work in Portal.

Even if the file to unZip is of considerable size, this action does have to impact the machine's response time as Portal runs this as a background task, that is asynchronously.

Figure 3: Unzipping a file in Portal

Once the unzipping is complete you should delete the ZIP files from Portal if you want to preserve disk space and speed up the execution time of the migration process. If you don't delete the ZIP file, the migration API utility will copy it in the destination folder that you define in the next step.

Step 4: Create the destination folder

Up to now we have been using standard Portal functionality. We have shown that Portal 3.0. has a bulk load functionality that enable users to upload a ZIP file into Portal and unzip it inside Portal. So all that is required to migrating existing content into Portal is to collect that content into one or several ZIP files, upload them into Portal and unzip them inside Portal. This works well for images, text files, PDF files, Word files, audio files and video files, but there is an issue with HTML files. All items uploaded are created as Portal item type 'file'. It is not possible to edit the content of a 'file' item in Portal. This limitation is acceptable with items like images because users do not expect to edit the content of an image using Portal. They would use a product like Adobe Photoshop instead. But in the case of HTML items, users do expect to be able to edit their content directly in Portal.

In order to enable users to edit the content of HTML items, it is necessary to transform each of the unzipped HTML items of type 'file' into items of type 'text'. This transformation can be done manually by creating a number of new text items it Portal and copy/pasting the content of the unzipped items into the new items. However this is far too laborious and error prone a process.

This transformation can also be done automatically by programming a utility that makes use of the Portal API. We have programmed such utility and will expose its design and code in Part 2. The function of this utility is to transform all unzipped items within one folder into editable items within Portal. The editable items are stored in a destination folder chosen by the administrator who runs the utility. This folder should be in the same content area as the staging area where you uploaded the ZIP file.

Ideally, the destination folder should already exist. Before migrating content from a legacy web site into Portal you should define the structure of your new corporate web site inside Portal. This structure will consist of a content area/folder hierarchy to hold the content, banners and navigation bars. You will publish these folders and navigation bars as portlets when they are ready to be exposed to the Internet/Intranet community. You will also need to design pages in Portal to define the layout of these portlets. A page is usually composed of a banner on top, a navigation bar on the left hand side, content in portlets in the middle and sometimes a promotion area on the right hand side. It is within such structure that you create the destination folder.

Step 5: Running the API utility

Figure 4 illustrates the migration utility being run. It is a Portal application that we have created. It is composed of one list of value module and a form based on a stored procedure. That stored procedure calls a PL/SQL package that does all the work for you. All the code is exposed in Part 2.

The administrator designates the source folder containing the unzipped items and the destination folder where the content is to be migrated and the utility takes care of migrating the content of the source folder into the destination folder. The utility always replicates the source folder hierarchy into the destination folder.

Figure 4: Running the API utility

The folder number that you see for the source folder is an internal Portal ID. Unless users happen to know these internal IDs they must use the list of value button to bring up the list component and pick a folder. This selection populates the source and destination folder fields with the internal ID for each folder..

Step 6: Manual tweaking

The API utility is only intended as a tool to do the grunt work. It cannot replace the human skills necessary to "tweak" the look and feel of each page. For example, FastLoad will not be able to remove banners or headers from the uploaded pages unless it can apply the exact same transformation to all the unzipped files in the source folder. We can change the code of the utility to increase its "intelligence" but there will be a point where it will be more efficient to perform manual changes than to program them.

Part 2: The Portal 3.0. API

In this part we will expose the code for each part of the migration utility and explain a lot of crucial Portal API details that are not documented yet. Using a top-down approach we will start from the parts exposed in Part 1 and progressively drill down in the details.

Portal Components
As shown in Figure 5, the migration Portal application is made of two components in Portal:
Figure 5: Components of the API utility

The list of value component is defined as a popup format and uses this hierarchical query:

select LPAD(title,length(title)+(level-1)*6,'.'), id
from wwv_corners
where siteid >10  -- only user defined folders
start with parentid = 0
connect by parentid = prior id and
           siteid   = prior siteid and
           language = prior language

The table wwv_corners is owned by the Portal30 schema in the database, unless the person who installed Portal on your server used a different schema. Wwv_corners contains all folder information for all folders defined in Portal, including the internal folders that come with the product. Folders constitute a hierarchy: a folder may contain other folders that in turn may contain others, etc.

There are a few tricks to the table wwv_corners. It also contains content area information. A content area is really a top level folder. All content area folders have their ID set to 1 and the parent folder ID set to 0. Yes, ID is NOT the primary key. Incidentally you will find that there is no folder record with ID equal to 0 so you could not have a foreign key from parentid to id. Each content area corresponds to one and only one site referred to by the column siteid. Site information can be found in table wwsbr_sites$.

The resulting list of value is illustrated in Figure 6.

Figure 6: Hierarchical List of Value

The second Portal component is a form based on the stored procedure 'migrate' documented in Listing A. This procedure must be owned by the same schema that owns the application. Once you choose the procedure as you create the form with the 'Create Form from Procedure' Wizard, Portal constructs one item for each parameter of the procedure and all you need to do is refine some cosmetic definitions like the template, display name, header text, footer text and help text. For this migration utility we chose a tabular form. You can get more layout flexibility if you use ask for custom form.

Figure 7 illustrates how by clicking on one of the parameters like p_f_source you can set its fields' label, font face, font color, font size, field size. In order to associate a list of value with a field you must select the item type 'Popup'.

Figure 7: Field definition in a Portal Parameter Form

You can also add logic in JavaScript or PL/SQL by using the SUBMIT_TOP section or the right hand side tab 'Additional PL/SQL' code.

Listing A

procedure FASTLOAD.migrate (p_f_source    in number,
                            p_f_dest      in number
                            ) is
   fast_load_db.migrate (p_f_source, p_f_dest);

Code for the procedure called by the Portal Form Component
All procedure migrate does is call the package that contains our utility. Only two parameters are necessary: source and destination folder. The package takes it from there.

It may be of help for the reader to look at the PL/SQL code generated by Portal to render the Parameter Form. This code helps to understand the inner mechanisms of Portal and may give you ideas to push Portal beyond the obvious and thus achieve more functionality. In the case of our module 'MIGRATE_I', we can find the package generated by Portal in the schema that owns the application. The name of the package is the same as the name of the module.

In Figure 7, if you navigate to the submit_top item you will see that you can customize what happens when users click on the submit button. But it is only when you look at what Portal does with your code that you can fully understand how far you can push it. By looking at the 'MIGRATE_I' package we found the submit code in procedure wwv_gensys_1 documented in listing B . In that procedure we can see for example that we may make use of 6 parameters: p_block_name, p_object_name, p_instance, p_event_type, p_user_args and p_session. That should give you some ideas beyond the scope of this article...

Listing B

    procedure WWV_GENSYS_1
        p_block_name     in varchar2,
        p_object_name    in varchar2,
        p_instance       in integer,
        p_event_type     in varchar2,
        p_user_args      in varchar2,
        p_session        in out PORTAL30.wwa_api_module_session
        procedure doSUBMIT
              p_block_name    =>     p_block_name ,
              p_object_name   =>     p_object_name,
              p_instance      =>     p_instance   ,
              p_event_type    =>     p_event_type ,
              p_user_args     =>     p_user_args  ,
              p_session       =>     p_session);
        end doSUBMIT;
        --- Type your PL/SQL code here...
doSubmit;--- This is the default handler
--- ...and here, thanks...
    end WWV_GENSYS_1;

Code generated by Portal for the submission part of a parameter form.

The Migration Package

The core of this migration utility is PL/SQL code that runs in an Oracle 8i Release 2 database. This code is in a package called fast_load_db that was developed with a PL/SQL editor instead of with Portal. One can view and edit the code within Portal using the Portal's Database Object Navigator.

The fast_load_db package must be own by the Portal schema because it calls a number of Portal API modules. The privileges required to do this would have to be granted directly (not through a role) to the schema owning the migration package. The benefits of doing this are outweighed by the time cost.

In this section we will show the code that is core to the migration package. In listings below we have stripped the code from basic validation and exception processing logic. This cuts the volume of code shown for each procedure by half and makes it easier for the reader to focus on Oracle Portal specific issues. If you want a full copy of the 1335 lines of code feel to contact the author.

We will start with the procedure that is the entry point in the migration package. This procedure is documented in listing C.

The procedure migrate does only four things:

Listing C

1     procedure migrate (p_f_source in number,
2                        p_f_dest in number
3                        ) is
4       cursor exists_f (p_id in number) is
5          select 'nothing'
6          from   wwv_corners
7          where  id = p_id;
8       v_check varchar2(10);
9       v_dest_f_siteid number;
11    begin
12       -- verify that p_f_source exists
13           open exists_f(p_f_source);
14           fetch exists_f into v_check;
15           close exists_f;
17       -- verify that p_f_dest exists
18           open exists_f(p_f_dest);
19           fetch exists_f into v_check;
20           close exists_f;
22       -- Get the site ID of the destination folder
23          v_dest_f_siteid   := get_f_site_id (p_f_dest);
25       -- call migrate_f (p_f_source, p_f_dest)
26         migrate_f (p_f_source, p_f_dest, v_dest_f_siteid);
27    end; -- procedure migrate

Procedure Migrate is the entry point of the migration package.

The procedure migrate takes only two parameters because all it does is migrate content from a source folder inside Portal into a destination folder.

The cursor defined in lines 4 to 7 is used in lines 13-15 and 18-20 in order to check that the IDs passed as parameters for the source folder and the destination folder are valid IDs. Table wwv_corners is the Portal table that contains the folder information as seen in the List of Value module above.

Line 23 derives the internal ID of the site that contains the destination folder. The query used by function get_f_site_id is:
select siteid
from   wwv_corners
where  id = p_f_id;
It is more efficient to run this query once in procedure migrate than to run it several times in the recursive procedure migrate_f.

Finally, Line 26 calls recursive procedure migrate_f that will do all the work for us.

Listing D

1    procedure migrate_f (p_f_source         in number,
2                         p_f_dest           in number,
3                         p_dest_f_siteid    in number,
4                         p_html_item_t_id   in number default 2) is
6        cursor sub_f is
7           select id, name, title
8           from   wwv_corners
9           where  parentid = p_f_source;
10       cursor items is
11          select id, name, masterthingid, filename, title, description
12          from wwv_things
13          where cornerid = p_f_source;
14       v_sf_dest_id  number;  -- sub folder internal Portal id
15       v_item_id     number;  -- copied item internal Portal id
16       v_source_html varchar2(32767);
17       v_dest_html   varchar2(32767);
18       v_f_www_path  varchar2(2000);
19    begin
20           -- for each sub-folders
21           for cur_sub_f in sub_f loop
22              -- create a sub-folder in p_f_dest
23              v_sf_dest_id := make_folder(
24                  p_site_id     => p_dest_f_siteid,
25                  p_parent_id   => p_f_dest,
26                  p_f_type_id   => c_f_type_id_container,
27                  p_name        => cur_sub_f.name,
28                  p_title       => cur_sub_f.title);
30              -- record action or error in log table
31              util.log (p_process=>'fast_load_db.migrate_f', p_msg=>'Migrated folder: '||cur_sub_f.name||'.', p_is_trace=>'Y');
33              -- call migrate_f (p_sf_source, p_sf_dest)
34              migrate_f (cur_sub_f.id,
35                         v_sf_dest_id,
36                         p_dest_f_siteid,
37                         p_html_item_t_id);
38           end loop; -- each sub folders
40           -- get the www_path of the folder
41           v_f_www_path :=  get_folder_www_path ( p_f_dest, p_dest_f_siteid);
43           -- for each item in sub-folder
44           for cur_item in items loop
45              if lower(get_f_extension(cur_item.name)) in ('htm','html','shtml') then
46                 -- get the html text and massage it to remove some tags
47                 v_source_html := get_file_text (cur_item.filename);
48                 v_dest_html := massage_html (v_source_html, cur_item.name, v_f_www_path);
50                 -- create new item as custom item type
51                 create_thing (
52                     p_itemtype          => 'text',
53                     p_name              =>  cur_item.name,
54                     p_title             =>  cur_item.title,
55                     p_type              => 'general',
56                     p_language          => 'us',
57                     p_cornerid          =>  p_f_dest,
58                     p_subtypeid         =>  p_html_item_t_id, -- would be 2 if this was a text item
59                     p_topicid           =>  30, -- category 30: general
60                     p_author            => 'Load Interface',
61                     p_description       =>  cur_item.description,
62                     p_text              =>  v_dest_html,
63                     p_siteid            =>  p_dest_f_siteid,
64                     p_inplace           =>  g_inplace,
65                     p_fullscreen        =>  0);
67                 -- record action or error in log table
68                 util.log (p_process=>'fast_load_db.migrate_f', p_msg=>'Migrated HTML item '||cur_item.name||'.', p_is_trace=>'Y');
70              else -- item type is image, audio, video, Word, Powerpoint, PDF, anything then do a straight copy
71                 v_item_id := wwv_thingdb.copything(
72                                   p_sourcemasterthingid      => cur_item.masterthingid, -- master ID, not object id!
73                                   p_destinationcornerid      => p_f_dest,
74                                   p_destinationmasterthingid => null, -- optional, API creates it
75                                   p_siteid                   => p_dest_f_siteid,
76                                   p_copysiteid               => p_dest_f_siteid);
78                 -- record action or error in log table
79                 util.log (p_process=>'fast_load_db.migrate_f',
80                           p_msg=>'Copied item '||cur_item.name||' to the destination folder structure.', p_is_trace=>'Y');
81              end if; -- test on file
82           end loop; -- for cur_item in items
83    end; -- procedure migrate_f

Procedure Migrate_f is the recursive procedure that drives the grunt of the migration work.

The basic algorithm of procedure migrate_f is:

  1. If there are any sub-folders in the current folder, call migrate_f (recursive call) for each sub-folder;
  2. Otherwise process each item in the current folder.
Very easy isn't it?

Lines 6-9 define the cursor that provides the sub folders for a given folder. The column parentid is a pseudo foreign key onto the column id of table wwv_corners.

Lines 10-13 define the cursor that returns a list of items for a given Portal folder. Items can be images, HTML documents, sounds, PDFs, Word documents, videos, etc. Table wwv_things is the Portal driving table for all item related information.

Lines 21-38 show the loop through the folder hierarchy. This is the recursive part of procedure migrate_f that drills down from the top source folder passed as parameter p_f_source all the way to its sub most folders. Within that loop, lines 23 to 28 call private procedure make_folder that creates a Portal folder in the destination folder for each folder in the source folder. That procedure is documented in listing E.

Listing E

1    function make_folder (
2             p_site_id          in number,         -- Site id for the content area
3             p_parent_id        in number,         -- Parent folder id
4             p_f_type_id        in varchar2,       -- Folder type: container, url, search, plsql, portletfolder
5             p_name             in varchar2,       -- Folder name
6             p_title            in varchar2        -- Folder title
7                         ) return number is
8       v_folder_id   number;
9    begin
10       v_folder_id := wwv_cornerdb.add(
11             p_parent_id => p_parent_id,
12             p_type_id   => p_f_type_id,
13             p_name      => p_name,
14             p_title     => p_title,
15             p_site_id   => p_site_id);
16       return v_folder_id;
17    end; -- procedure make_folder

Procedure that creates a folder within Portal

Continuing on listing D, line 41 calls a function that constructs the full path of a given folder within Portal. This functionality is a little complex but extremely useful when constructing HTML links programmatically within Portal. So we will use an example to explain this function clearly. Let us say that we have 3 levels of folders. We will call the first level folder Admin. Folder 'Students' is owned by 'Admin'. Folder 'Visits' is owned by 'Students'. Figure 8 illustrates how this structure would be stored in Portal.
Folder Name ID Parent ID Site ID
Admin 1 0 53
Students 78 51 53
Visits 94 78 53
Figure 8: Folder hierarchy Table wwv_corners: Admin/Students/Visits/

Given the values illustrated in figure 8 for table wwv_corners, function get_folder_www_path(p_f_id=>94, p_siteid =>53) would return 'docs/FOLDER/Admin/Students/Visits/' which is the relative URL for folder Visits. Similarly, get_folder_www_path(p_f_id=>78, p_siteid =>53) would return 'docs/FOLDER/Admin/Students/'.

We have illustrated function get_folder_www_path in Listing F. The core of this function is a hierarchical SQL query that drills up from the folder passed in parameter p_f_id to the top level folder (i.e. content area) of its folder hierarchy: lines 5-13. If you are not used to hierarchical queries, note that the keyword 'prior' qualifies the current row and that the other side of the connect by clause relates to the next row being retrieved. The 'start with' clause retrieves the very first row of the folder hierarchy.

Listing F

1    function get_folder_www_path ( p_f_id      in   number,
2                                   p_siteid    in   number,
3                                   p_language  in   varchar2 default 'us')
4               return varchar2 is
5       cursor get_folder_www_path is
6          select      name
7          from        wwv_corners
8          where       siteid = p_siteid and
9                      language='us'
10          start with  id = p_f_id
11          connect by  prior parentid = id and
12                      prior siteid   = siteid and
13                      prior language = language;
14       l_path      varchar2(1000) := 'docs/FOLDER/';
15       l_f_names   wwsbr_type.array; -- table of varchar2(2000)
16       l_i_loop    number := 0;
17    begin
18       -- Get the elements of the path from child to parent
19       for cur_folder in get_folder_www_path loop
20          l_i_loop := l_i_loop + 1;
21          l_f_names(l_i_loop) := cur_folder.name;
22       end loop;
23       -- Reverse the order of the elements in the path
24       for l_i_loop in reverse 1 .. l_f_names.count loop
25          l_path := l_path || l_f_names(l_i_loop) || '/';
26       end loop;
27       return l_path;
28    end; -- function get_folder_www_path

Function get_folder_www_path constructs the relative URL of any given folder.

In listing F the first loop in lines 19-22 stores the name of each folder that constitutes the hierarchy of the folder passed in parameter p_f_id. This list of folders is built into array variable l_f_names. The second loop in lines 24-26 flattens these folders into one URL path string contained in variable l_path whose value is returned by the function when the hierarchy navigation is complete.

Coming back to Listing D we are now going to explain the loop in lines 44-82. Because of the recursive nature of procedure migrate_f, this loop is run by the procedure once all sub-folders of the current folder have been processed. This loop goes through each file in the current source folder p_f_source.

The if statement in line 45 tests the extension of the current file. Anything that is not an HTML file simply gets copied into the destination folder hierarchy in lines 71-80. This is done by making a simple call to the Portal API procedure wwv_thingdb.copything. If however the file is an HTML one, a straight copy will not do. The utility must convert the item from a Portal file datatype into a Portal text datatype so that the HTML is editable by the user. It must also change a few things within the HTML.

Line 47 in Listing D retrieves the HTML text into varchar2(32767) variable v_source_html. This is done with function get_file_text that is another one of these very good candidates for re-use for Oracle Portal utility programming: it fetches a given file stored in Portal, grabs its content that is stored in a Binary Large Object format and converts it into a varchar2 format. The logic is exposed in Listing G where we make use of 2 packages supplied with Oracle 8i Release 2: dbms_lob for the truncation of the LOB content and utl_raw for the conversion into varchar2. Package wwdoc_api is a Portal30 supplied package, part of the API.

Listing G

1     function get_file_text (p_file_name in varchar2) return varchar2 is
2        v_blob_content blob;
3        v_raw_content  raw(32767);
4     begin
5        v_blob_content := wwdoc_api.get_document_blob_content(p_name => p_file_name);
6        v_raw_content := dbms_lob.substr (v_blob_content,32767); -- 32767 first characters only
7        return utl_raw.cast_to_varchar2 (v_raw_content);
8     end; -- function get_file_text

Function get_file_text converts the BLOB content of a text file into a varchar2 string.

Coming back to Listing D, we will now look at the necessary massaging of the HTML content performed in line 48. Without getting into too much detail about the string operations performed in listing H it is necessary to understand that in order to migrate an item of type file into a text item you must at least:

Listing H

1    function massage_html (p_source_html in varchar2,
2                           p_name in varchar2,
3                           p_f_www_path in varchar2) return varchar2 is
4       v_dest_html varchar2(32767);
5       i_end_tag   number;
6    begin
7       v_dest_html := util.snip_between(p_source_html,'<body','</body>');
9       -- The next 2 lines remove any attribute of the HTML tag <body>.
10      i_end_tag   := instr(v_dest_html,'>');
11      v_dest_html := substr (v_dest_html, i_end_tag+1, length(v_dest_html)-i_end_tag);
13      -- The next 2 lines reworks SRC paths so that links are preserved in the HTML document.
14      v_dest_html := replace (v_dest_html, 'SRC="','SRC="'||p_f_www_path);
15      v_dest_html := replace (v_dest_html, 'src="','src="'||p_f_www_path);
17      return v_dest_html;
18    end; -- function massage_html

Function massage_html returns the content found between tags <body> and </body>

Back to Listing D again, we can now move to the API command that creates a new item within Portal. It is encapsulated in procedure create_thing in lines 51-65. There are 150 lines of code in that procedure. We will not list it here because of its length and because most of what that procedure does is call Portal API's wwv_thingsave.saveitem after a few parameter conversions.

The create_thing procedure creates a text item in the destination folder referred  to by parameter p_f_dest. All other parameters are self explanatory apart from p_inplace that when set to 1 instructs Portal to display the HTML item inline inside the folder that owns it.

Lines 67-68 create a record of what the utility did in the table LOG.

There is really nothing more to it. You now have all the key elements that you need in order to create a large number of elements in Portal through the backdoor, i.e. programmatically. This should save you a whole lot of time.

Hervé Deschamps is a Technical Manager with Oracle Corporation. Over the years he has developed a number of applications using Oracle Development Tools and others.  He has an extensive experience of all phases of the development life cycle. He is also known in the technical community for his article publications centered around best custom development practices and for his user group presentations. You can reach him by e-mail at herve.deschamps@.oracle.com. He also maintains a web site full of articles, scripts, tips and techniques at http://www.iherve.com.