Home > 2010, SQL Server, Uncategorized > Requesting SSIS Help

Requesting SSIS Help

Data Import from log/text file (poorly formatted)

 

I have been given the task to import some data logs from our firewall system into SQL Server for reporting purposes and to help keep data longer (also for reporting) as the firewall system database needs to purge data more regularly.

 

The fun part is the firewall software uses PostgreSQL as its database and I have not found a way to export the data out and into SQL Server.  So that leaves me to try and import the daily logs that it creates. 

 

Sample data:

1120557876,,9087042812EBD,1006,1143243492,1,0,6,SMTP-Inbound-SlimeWarner-00,smtp/tcp,102.98.153.2,1163,72.192.41.10,25,0.0.0.0,0,0.0.0.0,0,Slime Warner,Trusted,590,0,0,,SMTP-Incoming.GUS.1,,,,header=Date: Sat\, 4 Sep 2010 11:30:01 +0430,rule_name=Date:*,,,,,tr,ProxyAllow: SMTP Header,,2010-09-04 00:00:10

 

I created a simple database on my SQL Server called Firewall and created a table similar to the table in the firewall software. (I know bad table)

 

USE [Firewall]

GO

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [Firewall].[dbo].[cluster_traffic] (

[sid] NVARCHAR(50),

[cluster] NVARCHAR(50),

[sn] NVARCHAR(50),

[tag_id] NVARCHAR(50),

[raw_id] NVARCHAR(50),

[disp] NVARCHAR(50),

[direction] NVARCHAR(50),

[pri] NVARCHAR(50),

[policy] NVARCHAR(50),

[protocol] NVARCHAR(50),

[src_ip] NVARCHAR(50),

[src_port] NVARCHAR(50),

[dst_ip] NVARCHAR(50),

[dst_port] NVARCHAR(50),

[src_ip_nat] NVARCHAR(50),

[src_port_nat] NVARCHAR(50),

[dst_ip_nat] NVARCHAR(50),

[dst_port_nat] NVARCHAR(50),

[src_intf] NVARCHAR(50),

[dst_intf] NVARCHAR(50),

[rc] NVARCHAR(50),

[pckt_len] NVARCHAR(50),

[ttl] NVARCHAR(50),

[pr_info] NVARCHAR(50),

[proxy_act] NVARCHAR(50),

[alarm_name] NVARCHAR(50),

[alarm_type] NVARCHAR(50),

[alarm_id] NVARCHAR(50),

[info_1] NVARCHAR(100),

[info_2] NVARCHAR(100),

[info_3] NVARCHAR(100),

[info_4] NVARCHAR(100),

[info_5] NVARCHAR(100),

[info_6] NVARCHAR(100),

[log_type] NVARCHAR(50),

[msg] NVARCHAR(255),

[bucket] NVARCHAR(50),

[update_time] [datetime] NULL

)

 

The problem that I am running into is with the [info_] data field.  Some of the data has commas in it and this is causing problems when trying to import into SQL Server.

 

I initially used the SQL Server Import and Export Wizard in SSMS and this is how the data looks when I preview.  Some of the data is jumbled and moved over to the wrong column.  The highlighted shows the problem.

 

clip_image002

 

Question:

Is there a way to import this data correctly through SSIS with some sort of pre-cleanup?

 

 

 

Advertisements
Categories: 2010, SQL Server, Uncategorized
  1. Peter
    November 2, 2010 at 10:41 am

    Gotta love CSV files. Without getting some form of quotes around each field, it’s not really possible to pull this out 100%. Perhaps the field after it always contains a certain value or pattern against which you can search, but that’s unlikely in most cases.

    You may want to ask the manufacturer if there’s a way to change the format of the log file. Sometimes that’s a config setting. You may also want to ask if you can get a read-only login to the PostgreSQL database. I’m pretty sure that you can at least get an ODBC driver that would work against Postgres, if not an OLEDB provider. That would make your life significantly easier.

  2. @NULLgarity
    November 3, 2010 at 9:52 am

    You might try importing the entire CSV row into a single column and then getting “smart” with your parsing using T-SQL.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: