Processing Stack Overflow data dump with Apache Spark

This post is about the final work I did for one of the courses of the Master's degree I'm currently attending at UFRJ - Federal University of Rio de Janeiro, in the branch of Data and Knowledge Engineering (Databases), which is part of the Computer and Systems Engineering department at COPPE\UFRJ.

The course is called Special Topics in Databases IV and is taught by professor Alexandre Bento de Assis Lima.

The presentation (PPT slides) is in Brazilian Portuguese. I'll translate the slides to English in this blog post; they give an overall view of the work done.

The final paper is written in English.

Files

Trabalho prático sobre Apache Spark envolvendo um problema típico de Big Data (apresentação\presentation).pdf (in Portuguese; "Practical work on Apache Spark involving a typical Big Data problem")

Processing Stack Overflow data dump with Apache Spark (in English)

Abstract. This paper describes the process involved in building an ETL tool based on Apache Spark. It imports XML data from Stack Overflow data dump.
The XML files are processed using Spark XML library and converted to a DataFrame object. The DataFrame data is then queried with Spark SQL library.
Two applications were developed: spark-backend and spark-frontend. The first one contains the code responsible for dealing with Spark, while the latter is user centric, allowing users to consume the data processed by Spark.

All the code developed is in English and should be easy to read.

Presentation
  1. Objective
  2. Problem
  3. Technologies
  4. Strategy used to acquire the data
  5. Development
  6. Conclusion
  7. Links
  1. Objective
    • Put into practice the concepts presented during the classes.
    • Have a closer contact with modern technologies used to process Big Data.
    • Automate the Extraction\Mining of valuable\interesting information hidden in the immensity of data.

  2. Problem
    • Analyse StackOverflow data dump available on the internet on a monthly basis.
    • The data dump is composed of a set of XML files compressed in the .7z format.
    • Even after compression, the biggest file is 15.3 GB. This size is typical of the data volumes handled in Big Data.
    • Spark at first will be used as an ETL tool (ETL = Extract > Transform > Load) to prepare the data consumed by a front-end web app.
    • "At first" because there's also the possibility of using Spark as a tool to process the data that'll be shown in the web app.

  3. Technologies
    • Apache Spark 2.0.1 +
    • Spark XML 0.4.1 +
    • Spark SQL 2.0.2
    • Ubuntu 16.04 LTS (Xenial Xerus)
    • Linux VM (virtual machine) running on Parallels Desktop 12 for Mac
    • Scala 2.11.8
    • XML (Extensible Markup Language)
    • XSL (Extensible Stylesheet Language)
    • Play Framework 2.5 (front end)
    • Eclipse Neon 4.6.1 with Scala IDE 4.5.0 plugin as the IDE

  4. Strategy used to acquire the data
    • Got the .torrent file that contains all the data dumps from Stack Exchange family of sites - https://archive.org/details/stackexchange
    • Selected the eight .7z files related to StackOverflow: stackoverflow.com-Badges.7z, stackoverflow.com-Comments.7z, stackoverflow.com-PostHistory.7z, stackoverflow.com-PostLinks.7z, stackoverflow.com-Posts.7z, stackoverflow.com-Tags.7z, stackoverflow.com-Users.7z, stackoverflow.com-Votes.7z

  5. Development
    • To make the work viable (running locally, outside a cluster), a single .xml file [Users.xml] was used. A subset of 100,000 lines (32.7 MB) was selected. The full file has a total of 5,987,287 lines (1.8 GB).
    • hadoop@ubuntu:/media/psf/FreeAgent GoFlex Drive/Downloads$ head -100000 Users.xml > Users100000.xml
    • The file Users.xsl was used to convert the Users100000.xml data to the format expected by the spark-xml library. The result was saved to Users100000.out.xml.


    • The .xml and .xsl files were placed into the input folder of the Scala project [spark-backend] inside Eclipse.
    • The application spark-backend reads the file Users100000.out.xml through Spark XML and transforms it into a DataFrame object.
    • The Spark SQL library is subsequently used to query the data. Some sample queries were created.
    • Each query generates a CSV file (SaveDfToCsv) to be consumed at a later stage by a web application [spark-frontend]; that is, Spark is used as an ETL tool (a minimal sketch of this pipeline appears after this list).
    • The result of each query is saved in multiple files in the folder output. This happens because Spark was conceived to execute jobs in a cluster (multiple nodes\computers).
    • For testing purposes, a method that renames the CSV file was created. This method copies the generated CSV to a folder called csv. The destination folder can be configured in the file conf/spark-backend.properties.
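
To make the flow concrete, here is a minimal Scala sketch of this pipeline, assuming Spark 2.0 and the spark-xml package listed in the Technologies section. It is not the actual spark-backend code; the row tag, column names, sample query and paths are illustrative:

import org.apache.spark.sql.SparkSession

object ProcessUsersXmlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-backend")
      .master("local[*]") // local mode while developing outside a cluster
      .getOrCreate()

    // Read the XML file with spark-xml: each <row> element becomes one DataFrame row.
    val users = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "row")
      .load("input/Users100000.out.xml")

    // Register the DataFrame as a temporary view so it can be queried with Spark SQL.
    users.createOrReplaceTempView("users")

    // Illustrative sample query: top 100 users by reputation.
    val topUsers = spark.sql(
      "SELECT DisplayName, Reputation FROM users ORDER BY Reputation DESC LIMIT 100")

    // Write the result as CSV. Spark writes one file per partition, which is why the
    // real application renames/copies the generated CSV to the csv folder afterwards.
    topUsers.write
      .option("header", "true")
      .csv("output/top-users-by-reputation")

    spark.stop()
  }
}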



    • The application [spark-backend] can be executed inside Eclipse or through the command line in Terminal using the command spark-submit.
    • On the command line we make use of the JAR file produced by the project build in Eclipse, passing the necessary packages as parameters, as shown below:
    • spark-submit --packages com.databricks:spark-xml_2.11:0.4.1 --class com.lenielmacaferi.spark.ProcessUsersXml --master local com.lenielmacaferi.spark-backend-0.0.1-SNAPSHOT.jar
    • The application [spark-frontend] was built with Play Framework (The High Velocity Web Framework For Java and Scala).
    • The user opens spark-frontend main page at localhost:9000 and has access to the list of CSV files generated by [spark-backend] application.
    • When clicking a file name, the CSV file is sent to the user's computer. The user can then use any spreadsheet software to open and post-process\analyse\massage the data (see the controller sketch after this list).
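
As a rough illustration of that last step, a Play 2.5 controller for listing and downloading the CSV files could look like the sketch below. This is not the actual spark-frontend code; the folder name, action names and plain-text listing are assumptions:

import java.io.File
import play.api.mvc.{Action, Controller}

class CsvController extends Controller {

  // Folder where spark-backend copies the renamed CSV files (configurable in the real app).
  private val csvFolder = new File("csv")

  // Lists the CSV files generated by spark-backend as plain text, one name per line.
  def index = Action {
    val names = Option(csvFolder.listFiles())
      .getOrElse(Array.empty[File])
      .filter(_.getName.endsWith(".csv"))
      .map(_.getName)
    Ok(names.mkString("\n"))
  }

  // Sends the selected CSV file to the user's computer as a download.
  def download(name: String) = Action {
    val file = new File(csvFolder, name)
    if (file.exists()) Ok.sendFile(content = file, inline = false)
    else NotFound(s"File $name not found")
  }
}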


  6. Conclusion
    • With Spark's help we can develop interesting solutions, for example a daily job that downloads the data dump, uploads it to the input folder and processes the data along the way in many different ways.
    • Using custom-made code we can work with the data in a cluster (fast processing) using a rich API full of methods and resources. In addition, we have at our disposal numerous additional libraries\plugins developed by the community, plus all the power of Scala and Java and their accompanying libraries.
    • The application demonstrated can easily be executed in a cluster. We only need to change some parameters in the SparkConf object (see the sketch below).
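
As a hedged sketch of the kind of change involved (the cluster URL below is illustrative; alternatively the master can be passed via spark-submit's --master flag instead of being hard-coded):

import org.apache.spark.SparkConf

object SparkConfExample {
  // Local execution, as used during development:
  val localConf = new SparkConf()
    .setAppName("spark-backend")
    .setMaster("local[*]")

  // Cluster execution: point the master at the cluster instead of the local machine.
  val clusterConf = new SparkConf()
    .setAppName("spark-backend")
    .setMaster("spark://spark-master:7077") // illustrative standalone-cluster URL
}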

  7. Links

LocalDB cannot open database requested by login. Login failed.

Today I was working with a database on a LocalDB server and I had set this specific database, named CBA, as the default one for my user LENIEL-PC\Leniel. Setting a default database for the user is useful because when you log in using SQL Server Management Studio [SSMS] you go directly to that database, that is, when running queries that’s the database used and not the master one. This is a life saver.

I wanted to restore a backup file (.bak) I got from the production server using the Restore Database… command in SSMS. This was to test locally, but first I had to drop my local database copy. I dropped my local copy while I was logged in to that database: I selected the database in Object Explorer, right-clicked it and selected Delete, checked Close existing connections (bottom of the window), clicked OK and the deletion succeeded. Pretty good… no?

What happened? The next time I tried to login again in SSMS this was what I got:

Figure 1 - SSMS | Cannot connect to (localdb)\v11.0.
Cannot open database requested by login. The login failed.
Login failed for user LENIEL-PC\Leniel (Microsoft SQL Server, Error: 4060)

Pretty obvious! I had deleted the default database set for my user LENIEL-PC\Leniel. Now what? How can I log in to the database server and change the default database? Good question…

This “simple” error was such a pain to fix… I tried lots of things, like using the SqlLocalDB.exe utility to stop and delete that specific LocalDB instance named v11.0. I would create it again afterwards if deleting it worked at least. This was the message logged in the Windows Event log when I tried to delete the instance:

Windows API call LogSqlDiagRec returned error code: 0. Windows system error message is: The operation completed successfully.
Reported at line: 3791. [Microsoft][ODBC SQL Server Driver][SQL Server]Cannot open database "CBA" requested by the login. The login failed.

As you see the problem was the same. It was barking about the deleted database. Damn it!

I also tried the login with the sa account but guess what… the sa account was disabled for that LocalDB instance. Even if it was enabled I would not be able to login because I didn’t know the password for sa.

Googling more I managed to find this post by Pinal Dave:

SQL SERVER – Fix : Error: 4064 – Cannot open user default database. Login failed. Login failed for user

So guess what: the solution is ridiculously easy once you know what to do of course…

Click that Options >> button in Figure 1. Now select the Connection Properties tab.

Figure 2 - SSMS Connect to Server | Connection Properties | Connect to database option

I had to type master in the Connect to database field since I did not have it in the list of available databases.

Now click Connect and you’re done.

Great. Thank God I was able to log in and access the database server.

What’s left?

Reset my user’s Default database to master so that I can log in directly again. By the way, the Default database field was empty since I had dropped the database!

Doing so there’s no need to set the Connect to database option anymore.

Figure 3 - SSMS | Setting Default database for user

Sometimes things are so easy, but the answer is hidden somewhere and we just can’t figure it out at first.

Hope it helps.

RavenDB Embedded with Management Studio UI

Go directly to the solution with no bla bla bla…

I’ve been playing with RavenDB (a NoSQL document-oriented database) in an ASP.NET MVC 4 project for the past week. One thing I tried to do was to access RavenDB Management Studio UI so that I could see what’s actually present within the document store. This is important because one needs to check if docs are really being inserted, related docs are being deleted, etc…

Given that I’m running the embedded version of RavenDB (RavenDB-Embedded.1.0.499 package installed via NuGet in Visual Studio 2010), I was stuck trying to access the Management Studio, since there isn’t much documentation on this subject when it comes to the EmbeddableDocumentStore. After struggling with it for about an hour of Googling and trial and error, I decided to post a question at StackOverflow: Running RavenDB as an EmbeddableDocumentStore and accessing RavenDB Management Studio. Then I took a break to have lunch and a nap. After that I got back to try a different approach, and it really does work. Of course this is only one way to achieve what I want. It may not be the best approach, but it’s enough. Just follow these steps:

1 - Grab RavenDB latest build here:
http://builds.hibernatingrhinos.com/downloadlatest/ravendb

2 - Extract the files to C:\RavenDB-Build-499

3 - Edit the .config file in C:\RavenDB-Build-499\Server\Raven.Server.exe.config to point to your embedded database:

<appSettings>
   <add key="Raven/Port" value="8088"/>
   <add key="Raven/DataDir" value="C:\MyProject\trunk\MyProject\App_Data\Database"/>
   <add key="Raven/AnonymousAccess" value="Get"/>
</appSettings>

4 - Click the Start.cmd present in the root folder C:\RavenDB-Build-499\Start.cmd

The server status output window should appear while it starts:

Figure 1 - RavenDB server status window

When the server finishes its starting process, the Silverlight Management UI should be automatically opened in your preferred browser.

Figure 2 - RavenDB Management Studio UI (Web UI)

Now I can see my docs, indexes, etc… and I hope you can too! :D

Note to self
According to John Allers, one should be able to access the Management Studio without having to start the server manually. That’s fine, and I had already tried that, but I could not get it working at first (some days ago). This led me to try everything else today, and my last resort was posting a question at StackOverflow. After trying the same procedure once more, that is, accessing the management studio using the URL http://localhost:8080, I finally got it working! Go figure. One possibility is that I had another service running on port 8080 when I first attempted to access the UI. As Windows has restarted since then, that service (probably Hudson) that was running on port 8080 is now stopped and everything works as expected.

Things to do:

1 - Instantiate your EmbeddableDocumentStore this way:

_documentStore = new EmbeddableDocumentStore
            {
                ConnectionStringName = "YourDbName",
                UseEmbeddedHttpServer = true
            };

2 - Copy Raven.Studio.xap present in C:\RavenDB-Build-499\Server\ folder to the root folder of your web project

3 - Run your web app

4 - Access http://localhost:8080 and voila… everything SHOULD work out of the box.

5 - Select Default Database:

Figure 3 - RavenDB Management Studio accessed without running the server manually

Resources
Embedding RavenDB into an ASP.NET MVC 3 Application

SQL UPDATE statement with SELECT and SQL Server Image data type column

In a SQL Server database I have a table called users which has a column named signature. This column is of type Image.

My production SQL Server is located in a shared hosting environment.

One problem I’ve been facing lately is that I need permission to execute an UPDATE statement that inserts a signature image for a given user. The problem occurs because the update relies on a BULK statement like this one:

-- Update existing user
UPDATE users
SET [signature] = (SELECT MyImage.* FROM OPENROWSET(
    BULK 'C:\MyProject\trunk\MyCompany.Product\MyCompany.Product.Web\Images\Signature.jpg',
    SINGLE_BLOB) MyImage)
WHERE Id = '1111aaaa-1111-11aa-a111-111111a1a1a1'

The query above works fine on my local machine, but when I tried to execute it on the remote/production server, I got this beautiful message: You do not have permission to use the bulk load statement.

In the shared hosting environment the execution of Bulk command is disabled by default for security reasons. This is annoying but totally understandable!

Bulk load allows the user to populate a database from a file. It’s not available in a shared environment because the files to be loaded would have to reside locally on the SQL Server machine (the production server).

So… great! I need a way to bypass this limitation, because I won’t spend a ton of money paying for a dedicated server… it’d make sense if and only if I actually needed a dedicated server.

1st try: import the specific user row from my local SQL Server to the remote instance using the SQL Server Management Studio Import task. I got an error about a key constraint violation because I already had the same row (for that user) in both databases; it’s just a matter of updating the signature column in the production database. This seemed to be a painful path.

2nd try: consider a dedicated server? No thanks… hehehe

3rd try and solution: a few days later I found myself thinking about this problem again (updating the signature column is a recurring task), so I decided to find another way, and it came to light: link the remote server to my local SQL Server Express instance and write a SQL query that does the job.

First I stopped at this excellent step-by-step guide written by jen: Create Linked Server SQL Server 2008
This post provided everything I needed to link both SQL Server instances.

To make sure you have linked your server correctly, you can execute this query in your local server:

select server_id, name, product, provider, data_source, [catalog], is_linked
from sys.servers

The above query gave me this result:

Figure 1 - Linked servers linked to my local SQL Server Express instance

Then I Googled about Update with Select + SQL Server or something like that and found this StackOverflow question: Updating a table with multiple values from a select statement where the date matches. Lieven’s answer helped. I just had to adapt it to my case. This is the SQL code that does the dirty work:

UPDATE  U
SET     U.[signature] = users.[signature]
FROM    [LOCAWEB].[laudotech].[dbo].[users] U INNER JOIN users
ON users.id = U.id AND U.id = '1234aaaa-5678-90aa-b123-456789a0a1a2'

The above query must be executed within the context of the local SQL Server instance, of course - where the linked server resides.

To give you a view… this is how all this is configured inside SQL Server Management Studio (SSMS):

Figure 2 - SSMS Object Explorer and the Linked Server LOCAWEB in my local SQL Server Express instance

There are so many things one can do with SSMS that I feel really happy learning one more of them. Last week I blogged about Import/Export SQL Server database data with SSMS. Take a look at it.

Man! Have I said that I like working with databases!?

Hope it helps.

Import/Export SQL Server database data with SSMS

This post is a visual guide about one of the most powerful features of Microsoft SQL Server Management Studio (SSMS) 2008: Import/Export database data.

These tasks are useful for example when you need to debug an application with real world/production data to find possible bugs in your code. You can import the production database data to your local development database and find what’s causing problems. These tasks can be used to create a protection layer because you won’t be working “directly” with production data.

As you see, if I right click a database in SSMS I get access to this rich menu full of interesting options that deserve a careful study:

SSMS 2008 Import Export Database Tasks

Looking at the above picture you see in Object Explorer that I’m connected to two SQL database servers. The first one is a remote SQL Server (version 10.0.2775) and the other one is my local SQL Server Express (version 10.50.1617) instance.

With this configuration I have the perfect scenario to visually show you how the Import/Export Data… menu options work.

In this blog post I’ll execute the Export Data task. This allows me to test my web application using my local SQL Server instance with real world/production data.
The Import Data… task does practically the same thing but in reverse, logically. I’ll leave the execution of the Import task to you, dear reader!

OK… If I proceed and select the Export Data… menu option I’ll be able to bring new data to the database that is located on my machine. Data comes from the remote/production server to my local copy of the same database.

After clicking Export Data…, SQL Server Import and Export Wizard pops up. I then select the Data source/Server name (from which Data source type and Server I want to get the data to be imported). In this case, the data comes from the remote SQL server. I use SQL Server Authentication, fill the user name and password and select the Database that’ll provide the data. In Server name field I used the IP address of my remote server.

SQL Server Import and Export Wizard - Choose a Data Source

Now is the time to choose the Destination. In this case it’s my local SQL Server instance. LENIEL-PC is my Windows 7 machine name. SQLEXPRESS is the default SQL Express server name. I also selected the Database I want to receive the imported data. As you see both databases (source and destination) have the same name to make things easy.

SQL Server Import and Export Wizard - Choose a Destination

The wizard then asks if I want to copy data directly from table to table or if I want to write a custom SQL query to shape the data according to specific needs. I’ll select the first option to keep things simple.

SQL Server Import and Export Wizard - Specify Table Copy or Query

In the next screen we get a beautiful list of available tables and views. I’ve selected two tables that I want to copy.

SQL Server Import and Export Wizard - Select Source Table and Views

By clicking the Edit Mappings… button (which applies to each table selected in the grid), I can also specify whether I want to delete rows before importing data or append rows to the destination table. I’ll append rows because I have already erased both tables (exams and files) in my local database.

Table files is linked to table exams through a foreign key. I disabled this foreign key constraint in my local database to avoid conflicts/errors during the import process. Beware, because conflicts do appear most of the time; they are generally related to key violation/duplication.

SQL Server Import and Export Wizard - Column Mappings

As I’m using the free SQL Server Express Edition, the following screen shows an interesting piece of information:

In SQL Server Express, Web, or Workgroup, you can run the package that the Import and Export Wizard creates, but cannot save it. To save packages that the wizard creates, you must upgrade to SQL Server Standard, Enterprise, Developer or Evaluation.

This is telling me that I’ll have to go through the wizard (7 steps) again if I ever need to run this same import process. Really annoying… but that’s the price you pay when using free tools.

SQL Server Import and Export Wizard - Run Package

Next button once more (the last one, hallelujah) and we’re ready to go. It’s just a matter of reviewing what’ll be done and pressing the Finish button.

SQL Server Import and Export Wizard - Complete the Wizard

After clicking Finish I got this last window:

SQL Server Import and Export Wizard - Execution Successful

Hope this simple and visual guide helps you understand how the data Import and Export processes work when dealing with a SQL Server database inside SSMS.

Free tools
Microsoft SQL Server Express Edition
Microsoft SQL Server 2008 Management Studio Express (SSMSE)

Oracle NLS_LANG for language/territory/character set

Recently I had an Oracle problem at work.
I kept postponing looking carefully at this problem until today.

Problem
All accented characters such as à, á, é, í, ó, ú, â, ê, î, ô, û, ã, õ, etc. and other ones such as ç (cedilla), º, ª (ordinal indicators) were being replaced by erroneous characters, as in this example from the header of a stored procedure:

=================================================================
-- Revis¿o : 'N¡ã 0
-- Descri¿¿o: // description here
-- Data   : 03/29/2010
-- Autor  : CHEMTECH
=================================================================

This problem gets worse if you have something like

SELECT CONTRACT.NUM_CONTRACT,
           'N¡ã' || TO_CHAR(CONTRACT.NUM_CONTRACT) || ' - ' ||

The result of the above SQL query is used in the UI, that is, the user will see the wrong characters ¡ã instead of º. If you commit this thing to the repository you mess everything up. You see what I mean…

Questioning
At first I thought the problem was the database encoding settings.

I, and only I, was getting crazy characters. My development group has 3 more people and no one else was having this problem. This clearly had to be a UI configuration issue, that is, something about how the data is displayed on my machine.

A little bit of research on the net took me to Oracle’s NLS_LANG FAQ.

This is what NLS_LANG stands for/does:

A locale is a set of information addressing linguistic and cultural requirements that corresponds to a given language and country. Traditionally, the data associated with a locale provides support for formatting and parsing of dates, times, numbers, and currencies, etc. Providing current and correct locale data has historically been the responsibility of each platform owner or vendor, leading to inconsistencies and errors in locale data.

Setting the NLS_LANG environment parameter is the simplest way to specify locale behavior for Oracle software. It sets the language and territory used by the client application and the database server. It also indicates the client's character set, which corresponds to the character set for data to be entered or displayed by a client program.

I started reading the FAQ (top-down approach) and tried the following query:

SELECT * FROM NLS_SESSION_PARAMETERS;

which gave me this result:

Figure 1 - NLS_SESSION_PARAMETERS on my development machine

As you see NLS_LANGUAGE and NLS_TERRITORY are set to AMERICAN and AMERICA respectively on my machine.

I asked Thiago Arakaki, a coworker of mine, to execute the same SQL query on his dev box. To my surprise and delight, this was the screen he got:

Figure 2 - NLS_SESSION_PARAMETERS on Thiago Arakaki’s development machine

As you see NLS_LANGUAGE and NLS_TERRITORY are set to BRAZILIAN PORTUGUESE and BRAZIL respectively on his machine.

This clearly pointed to what could be the cause of the problem I was having.

I couldn’t understand why my machine had a different NLS_LANGUAGE and NLS_TERRITORY.

I read the FAQ a little bit more and got to this part Where to set the NLS_LANG in Windows.

I checked Windows registry to make sure the NLS_LANG subkey was set correctly on Oracle home. I got this:

Figure 3 - NLS_LANG subkey on Windows Registry

To my surprise, it was correct. Look at NLS_LANG: it has BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252, just like Thiago Arakaki’s dev box.

So why was I still getting wrong characters inside Allround Automations PL/SQL Developer when working on the same database that Thiago also uses? Aha, that’s the question this post tries to clarify, in case you find yourself in this same situation.

Reading the FAQ a little bit more… :o) I saw that you could also set NLS_LANG as a System or User Environment Variable, in System properties. So I went there to check if I already had such a thing set on my machine. Again, to my surprise, it was there. I don’t remember if/when I created this variable.

The only thing I could think of at the moment was the interference that some previous Oracle installations could’ve caused on my machine. Today I’m using Oracle Express, but in the past I installed the full Oracle server. I can’t say for sure what created the NLS_LANG variable among my Windows System variables.

The screen I got was something like this:

Figure 4 - NLS_LANG variable on Windows System variables

I had AMERICAN set for the NLS_LANG variable.

Solution
I then changed the system variable NLS_LANG to BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252, clicked OK, closed PL/SQL Developer, opened it again and issued the SQL query described at the beginning of this post. I could see that the NLS_LANGUAGE and NLS_TERRITORY rows had changed appropriately to BRAZILIAN PORTUGUESE and BRAZIL, respectively. After that I started to get the right characters throughout the database.

Great! Problem solved.

Upload/Import Download/Export a File to/from Oracle DB

What motivated me to write this post is that this is a really interesting task.

The general idea one has about databases is that they only serve the purpose of storing plain values such as numbers, dates and text. A database does much more than that, even allowing you to store a whole file, such as an Excel spreadsheet or an MP3 file, in a column you define in your table.

To illustrate that, let’s suppose you were assigned the task of saving a spreadsheet in the database. This spreadsheet contains valuable data (such as tax values for each federal state) that’ll be used throughout your application. A user will upload that spreadsheet to the database, and when someone else wants to see the content of that spreadsheet it’s just a matter of clicking a download button to get it back. Pretty simple, isn’t it?

To accomplish that I’ll use Oracle database, more specifically Oracle Database 10g Express Edition (free to use).

The SQL code I used to create the database objects is as follows:

----------------------------------------------------------
-- Export file for user ORACLETEST                      --
-- Created by Leniel Macaferi on 3/13/2010, 11:43:17 PM --
----------------------------------------------------------

-- Create database tablespace
create tablespace oracletest datafile 'C:\oracletest.dbf'
size                                  20M
autoextend on maxsize                 unlimited
extent management local uniform size  64K;

-- Create user and Grant permissions
create user oracletest identified by oracletest default tablespace oracletest;
grant connect, resource to oracletest;
grant create session to oracletest;
grant unlimited tablespace to oracletest;
grant dba to oracletest;

-- Create table
create table FILES
(
  ID          NUMBER not null,
  UPLOAD_DATE DATE not null,
  CONTENT     BLOB not null,
  TYPE        VARCHAR2(64) not null,
  NAME        VARCHAR2(128) not null,
  LENGTH      NUMBER not null
);
alter table oracletest.FILES
  add constraint ID primary key (ID);

-- Create sequence
create sequence oracletest.SEQ_FILES
minvalue 1
maxvalue 999999999999999999999999999
start with 1
increment by 1
nocache;

-- Create import/upload procedure
create or replace procedure sp_import_file(
  upload_date in varchar2,
  content     files.content%type,
  type        in varchar2,
  name        in varchar2,
  length      in number,
  id          out number) is
begin
  select seq_files.nextval into id from dual;

  insert into files (id, upload_date, content, type, name, length)
  values (id, to_date(upload_date, 'mm/dd/yyyy hh:mi:ss pm'), content, type, name, length);
end sp_import_file;

I’ll use an ASP.NET MVC application to show the process of uploading/importing downloading/exporting a file to/from an Oracle database.

The app has a tab called Import/Export File. Clicking on this tab takes you to a View that contains a file upload control and two buttons. When the user clicks the Import button the file will be uploaded/imported/saved to/in the database. When the user clicks the Export button the last file saved in the database will be downloaded/exported to the user’s computer.

The following is a screenshot of the application:

Figure 1 - ASP.NET MVC Application UI

The following is a screenshot of the Files table which already has some files:

Figure 2 - Oracle Files Table

As you can see the thing that makes it possible to import a whole file to a table is the type of the column that’ll hold the file; in this case the type is BLOB.

In the lines that follow I’ll present the source code I used to implement this sample ASP.NET MVC application.

It’s always good to view the Solution Explorer inside Visual Studio so that I can highlight the files I have created:

Figure 3 - Solution Explorer in Visual Studio

This is the C# code used in the upload/import process (inside OracleFileModel.cs):

/// <summary>
/// Imports any file to an Oracle database table that has a column of type BLOB.
/// </summary>
/// <param name="uploadDate">Date of upload/importation</param>
/// <param name="fileName">File name</param>
/// <param name="filePath">File path on the server</param>
/// <param name="fileType">File type</param>
/// <param name="fileSize">File size in bytes</param> /// <returns>Id of Row just inserted</returns> public static int Import(string uploadDate, string fileName, string filePath, string fileType, int fileSize) { OracleConnection connection = GetOracleConnection(); connection.Open(); OracleTransaction transaction; transaction = connection.BeginTransaction(); OracleCommand cmd; cmd = connection.CreateCommand(); cmd.Transaction = transaction; // Creates a temporary blob object on the database. This object will store the file content. cmd.CommandText = "declare xx blob; begin dbms_lob.createtemporary(xx, false, 0); :tempblob := xx; end;"; cmd.Parameters.Add(new OracleParameter("tempblob", OracleType.Blob)).Direction = ParameterDirection.Output; cmd.ExecuteNonQuery(); // Getting the content of the file... byte[] buffer = GetFileContent(filePath); // Oracle object responsible for storing the File content. OracleLob tempLob; // Assigning tempLob the blob object created on the database. tempLob = (OracleLob)cmd.Parameters[0].Value; tempLob.BeginBatch(OracleLobOpenMode.ReadWrite); // Writing the file content to tempLob. tempLob.Write(buffer, 0, buffer.Length); tempLob.EndBatch(); cmd.Parameters.Clear(); // The name of the Procedure responsible for inserting the data in the table. cmd.CommandText = "SP_IMPORT_FILE"; cmd.CommandType = CommandType.StoredProcedure; cmd.Parameters.Add(new OracleParameter("upload_date", OracleType.VarChar)).Value = uploadDate; cmd.Parameters.Add(new OracleParameter("content", OracleType.Blob)).Value = tempLob; cmd.Parameters.Add(new OracleParameter("type", OracleType.VarChar)).Value = fileType; cmd.Parameters.Add(new OracleParameter("name", OracleType.VarChar)).Value = fileName;
cmd.Parameters.Add(new OracleParameter("length", OracleType.Number)).Value = fileSize; cmd.Parameters.Add(new OracleParameter("id", OracleType.Number)).Direction = ParameterDirection.Output; try { cmd.ExecuteNonQuery(); } catch (Exception ex) { transaction.Rollback(); throw new Exception(ex.Message); } transaction.Commit(); connection.Close(); // Returning the Id of the row just inserted in table FILES. // This Id could be used to associate the file inserted with another table, for example, // if you had to parse the content of a spreadsheet and save each line in other table. return int.Parse(cmd.Parameters[4].Value.ToString()); }
This is the C# code used in the download/export process (inside OracleFileModel.cs)::
/// <summary>
/// Exports the last file imported to an Oracle database table that has a column of type BLOB.
/// </summary>
/// <returns>File</returns>
public static MyFileModel Export()
{
    MyFileModel file = new MyFileModel();

    OracleConnection connection = GetOracleConnection();
    connection.Open();

    OracleCommand cmd;
    cmd = connection.CreateCommand();

    // Gets the last File imported.
    cmd.CommandText = "select content, name, type from (select content, name, type from files f order by f.upload_date desc) where rownum = 1";

    byte[] fileContent;

    try
    {
        OracleDataReader reader = cmd.ExecuteReader();

        using (reader)
        {
            reader.Read();

            if (!reader.HasRows)
            {
                return file;
            }

            OracleLob blob = reader.GetOracleLob(0);
            OracleString fileName = reader.GetOracleString(1);
            OracleString fileType = reader.GetOracleString(2);

            fileContent = new byte[blob.Length];

            blob.Read(fileContent, 0, (int)blob.Length);

            file.Content = fileContent;
            file.Name = fileName.Value;
            file.Type = fileType.Value;
        }
    }
    catch (Exception ex)
    {
        throw new Exception(ex.Message);
    }

    connection.Close();

    return file;
}
/// <summary>
/// Gets an Oracle connection to the database.
/// </summary>
/// <returns>OracleConnection</returns>
private static OracleConnection GetOracleConnection()
{
    string user = ConfigurationManager.AppSettings["user"];
    string password = ConfigurationManager.AppSettings["password"];
    string dataSource = ConfigurationManager.AppSettings["dataSource"];

   return new OracleConnection("Data Source=" + dataSource + ";User Id=" + user + ";Password=" + password + ";");
}

/// <summary>
/// Get the content of the file in byte array buffer.
/// </summary>
/// <param name="path">File path on the server</param>
/// <returns>File content</returns>
private static byte[] GetFileContent(string path)
{
    Stream fs = File.OpenRead(path);
    
    byte[] buffer = new byte[fs.Length];

    int q = fs.Read(buffer, 0, Convert.ToInt32(fs.Length));

    fs.Close();

    return buffer;
}

This is the code implemented for the OracleFileController:

public class OracleFileController : Controller
{
    //
    // GET: /OracleFile/

    public ActionResult OracleFile(FormCollection formCollection)
    {
        // Handles the button that posted to the server...
        // OracleFile.aspx has 2 submit buttons.
        if (formCollection["Import"] != null)
        {
            Import();
        }
        else if (formCollection["Export"] != null)
        {
            return Export();
        }

        return View();
    }

    /// <summary>
    /// Imports a file from the user's computer to the database.
    /// </summary>
    private void Import()
    {
        foreach (string fileName in Request.Files)
        {
            // Gets the file the user selected.
            HttpPostedFileBase file = Request.Files[fileName];

            if (file.ContentLength > 0)
            {
                // Getting the name of the folder in which the file will be saved on the server.
                string saveFolder = ConfigurationManager.AppSettings["saveFolder"];

                // Creating the path for the file on the server.
                string filePath = Path.Combine(Request.ServerVariables["APPL_PHYSICAL_PATH"] + saveFolder, Path.GetFileName(file.FileName));

                // Saving a copy of the user's posted file on the server.
                file.SaveAs(filePath);
                // Save the file in the database.
                // Although id isn't being used, this shows you how to get the id of the file just inserted in the database.
                int id = OracleFileModel.Import(DateTime.Now.ToString(), file.FileName, filePath, file.ContentType, file.ContentLength);
                // Deleting the file just imported so that the server disk does not get full.
                System.IO.File.Delete(filePath);

                TempData["message"] = "File imported with sucess.";
            }
        }
    }

    /// <summary>
    /// Exports a file from the database to the user's computer.
    /// </summary>
    /// <returns></returns>
    public ActionResult Export()
    {
        // Get the file from the database.
        MyFileModel file = OracleFileModel.Export();

        if (file.Content != null)
        {
            TempData["message"] = "File exported with success.";

            // Return the file to the user's computer.
            return File(file.Content, file.Type, file.Name);
        }
        else
        {
            TempData["message"] = "There's nothing to download.";

            return View("OracleFile");
        }
    }
}

This is a really big post full of code. I think the code has sufficient comments that should shed some light on how to accomplish the task.

Important note
All properties read in the form ConfigurationManager.AppSettings["property"], including the database user and password, are stored in the Web.config file under the appSettings section.

<appSettings>
   <add key="user" value="oracletest"/>
   <add key="password" value="oracletest"/>
   <add key="dataSource" value="localhost"/>
   <add key="saveFolder" value="Files"/>
</appSettings>

Hope you make good use of it.

Visual Studio 2010 C# ASP.NET MVC Application
You can get the Microsoft Visual Studio Project at:

http://sites.google.com/site/leniel/blog/OracleImportExportFile.zip

To try out the code you can use the free Microsoft Visual Studio 2010 Professional Release Candidate (ISO file) that you can get at: http://www.microsoft.com/downloads/details.aspx?FamilyID=f9c0b89b-4964-4906-94c6-60ad8a429690&displaylang=en

Although I’ve used the 2010 version of Visual Studio I think the code presented in this post works just fine with previous versions of Visual Studio and ASP.NET MVC framework. You shouldn’t have problems while copying/pasting the code in previous versions. Maybe you’ll have to change DLL references or something like that.

Logging NHibernate SQL with log4net in ASP.NET

Have you ever wondered how to log the SQL generated by NHibernate?

This post tries to exemplify just that.

NHibernate offers HQL to give expressiveness to the developer, but behind the scenes there is an engine that translates the HQL into plain SQL, which is executed against the database. This SQL can be logged so that you can see its structure and get a snapshot of what the database engine receives.

log4net is a logging tool that helps the developer see what SQL NHibernate is generating under the covers.

This is a brief description of log4net taken from its homepage:

log4net is a tool to help the programmer output log statements to a variety of output targets.

First and foremost you need to enable NHibernate logging in its configuration file. The property that sets this is hibernate.show_sql.

<add key="hibernate.show_sql" value="true" />

The following piece of code shows how to configure an appender and a logger that makes use of the appender. This code is kept inside the Web.config file in the log4net configuration section:

<appender name="NHibernateRollingFileAppender" type="log4net.Appender.RollingFileAppender">
    <file value="LogNHibernate.txt"/>
    <appendToFile value="true"/>
    <rollingStyle value="Size"/>
    <datePattern value="yyyyMMdd"/>
    <maxSizeRollBackups value="10"/>
    <maximumFileSize value="10MB"/>
    <layout type="log4net.Layout.PatternLayout">
        <conversionPattern value="%date - %message%newline"/>
    </layout>
</appender>

<logger name="NHibernateLogger" additivity="false">
    <level value="DEBUG"/> <!-- ALL, DEBUG, INFO, WARN, ERROR, FATAL or OFF -->
    <appender-ref ref="NHibernateRollingFileAppender"/>
</logger>

I’ll describe each part of the above code.

<appender name="NHibernateRollingFileAppender" type="log4net.Appender.RollingFileAppender">

An appender is an output destination. In this case it’s a RollingFileAppender, which writes logging events to a file in the file system.

<file value="LogNHibernate.txt"/>

The file property specifies the name of the file that’ll store the logs.

<appendToFile value="true"/>

The appendToFile property is set to true so that the appender appends to the existing log file instead of overwriting it.

<rollingStyle value="Size"/>

The rollingStyle property sets how to roll log files. In this case the appender will roll log files based on file size.

<datePattern value="yyyyMMdd"/>

To change the rolling period we need to set the datePattern property. It would be used if we adopted a rollingStyle based on Date instead of Size.

<maxSizeRollBackups value="10"/>
<maximumFileSize value="10MB"/>

Up to 10 old files of 10MB each will be kept. These rolled files will be named: LogNHibernate.txt.1, LogNHibernate.txt.2, LogNHibernate.txt.3, etc...

<layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date - %message%newline"/>
</layout>

A layout enables us to customize the output format. This is accomplished by associating a layout with an appender.

The PatternLayout lets the user specify the output format according to conversion patterns similar to the C language printf function.

<logger name="NHibernateLogger" additivity="false">

This is the logger and its additivity property controls appender accumulation, that is, how the logs are printed in the hierarchy of loggers.

For example, the output of a log statement of logger NHibernateLogger will go to all the appenders in NHibernateLogger and its ancestors. This is the meaning of the term "appender additivity".

<level value="DEBUG"/>

The level property controls the amount of information you want to be written to the log.

<appender-ref ref="NHibernateRollingFileAppender"/>

The property appender-ref specifies what appenders this logger uses.

That’s it! :-)

With this basic configuration you can start your search for points where NHibernate is generating too many queries where it shouldn’t.

I’m currently working on a performance branch where I’m learning how to deal with NHibernate’s lazy loading configuration.

This process of logging the SQL generated by NHibernate plays a great role when one is tracking down the bottlenecks that cause performance problems.

Just one note: keep in mind that logging is itself a costly process. The amount of data written by NHibernate is considerable and, depending on the level of information you set in the logger, the log file will grow very fast.

Hope this helps fellow developers.

References

log4net homepage
http://logging.apache.org/log4net

log4net introduction
http://logging.apache.org/log4net/release/manual/introduction.html

log4net configuration examples
http://logging.apache.org/log4net/release/config-examples.html