Parsing strings with RegEx

Last post 05-12-2008 12:01 PM by gbogea. 13 replies.

  • Parsing strings with RegEx

    05-09-2008, 10:06 AM
    • kirilminev
    • Joined on 06-12-2007, 12:20 PM
    • Posts 13

    I am trying to parse the following strings, which come as input from a text file:

    Eurex  App: X_RISK     Trader: TTADM  XXX 100     UL: TTADM Workstation: xx.x.x.xx Server: xx.xx.xx.xx
    LIFFE  App: X_RISK     Trader: TTADM  XXX 100     UL: TTADM Workstation: xx.x.X.xx Server: xx.xx.xx.xx

    I want to extract the following values and put them into an array of DataRows:

    Eurex    X_RISK    TTADM  XXX 100    TTADM    xx.x.x.xx    xx.xx.xx.xx
    LIFFE    X_RISK    TTADM  XXX 100    TTADM    xx.x.X.xx    xx.xx.xx.xx

    I am almost there, but not quite: I am having difficulty removing the labels that end in a colon (":") and keeping only the values I need, so that I can place them in my DataTable.

    Please look at my code below, any help and suggestions are highly appreciated.

             public DataSet ConvertNew(string sLogFile)
            {
                if (!System.IO.File.Exists(sLogFile))
                    return null;//fail file not found

                StreamReader stream = new StreamReader(sLogFile, true);
                //Create a new data set
                DataSet dsFile = new DataSet("File");

                DataTable dtTable = dsFile.Tables.Add("Data");

                dtTable.Columns.Add("Exchange", typeof(string));
                dtTable.Columns.Add("Application", typeof(string));
                dtTable.Columns.Add("Trader", typeof(string));
                dtTable.Columns.Add("UL", typeof(string));
                dtTable.Columns.Add("Workstation", typeof(string));
                dtTable.Columns.Add("Server", typeof(string));

                Regex r = new Regex(",(?=([^\"]*\"[^\"]*\")*(?![^\"]*\"))");
                Regex regExpr = new Regex("\\s\\S");
                Regex patternExpr = new Regex("(App:|Trader:|UL:|Workstation:|Server:|:)");

                // Get the first line of data from the input (log) file.
                string val = NextLine(stream);

                //Strip out each field and insert in DataTable
                int sStart;//used as a placeholder for our current location
                int nCount;//used to determine which field / column we are currently extracting
                string sTemp;//used to temporarily hold the field pulled out by RegEx in order to trim the quotation marks from the ends of the field (if any)

                //Data row object used to set values
                DataRow drRow;

                while (val != "")//val is empty when the input stream has been read to the end
                {
                    sStart = 0;
                    nCount = 0;

                    //create a new row
                    drRow = dtTable.NewRow();

                    //Iterate through all the matches (match=field in current row)           
                    foreach (Match m in patternExpr.Matches(val))
                    {
                        //Retrieve the field based on the results of the match
                        sTemp = val.Substring(sStart, m.Index - sStart);

                        drRow[nCount] = sTemp;

                        //keep one step ahead in the matching game..

                        sStart = m.Index + 1;

                        //keep track of which field is next
                        nCount++;

                    }
                    dtTable.Rows.Add(drRow);

                    val = NextLine(stream);
                }

                stream.Close();//release the file handle before returning
                return dsFile;

            }

     This is the function I use to read the text file line by line:

     private static String NextLine(StreamReader stream)
            {
                int stemp = stream.Read();
                String sReturn = "";

                while (stemp != -1 && stemp != '\n')
                {

                    sReturn += (char)stemp;
                    stemp = stream.Read();
                }
                return sReturn;
            }
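    A note on the `NextLine` helper: it returns `""` at end of file, but it also returns `""` for an empty line in a file with Unix (`\n`) line endings, which ends the `while (val != "")` loop early, and it keeps the trailing `'\r'` of each line in a Windows (`\r\n`) file. A minimal sketch of an alternative built on `TextReader.ReadLine`, which strips both kinds of line ending and returns `null` only at end of input. The `LineSource` class and `ReadNonEmptyLines` method are hypothetical names, and skipping blank lines is an assumption about what the log file needs:

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: yields every non-blank line of a TextReader.
// ReadLine() strips "\n" and "\r\n" and returns null only at end of input,
// so a blank line in the middle of the file no longer ends the loop.
public static class LineSource
{
    public static IEnumerable<string> ReadNonEmptyLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Assumption: blank lines carry no data, so skip them instead of stopping.
            if (line.Trim().Length == 0)
                continue;
            yield return line;
        }
    }
}
```

    Usage, assuming the same `sLogFile` path: `using (StreamReader reader = new StreamReader(sLogFile)) { foreach (string val in LineSource.ReadNonEmptyLines(reader)) { /* build a row from val */ } }`.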

  • Re: Parsing strings with RegEx

    05-09-2008, 10:43 AM
    • gbogea
    • Joined on 04-14-2008, 11:17 PM
    • Brazil
    • Posts 204

    You can use named groups (labels) in the regex to capture exactly the values you want. I created a demo that takes one line and prints each field to the console; I think this will solve your problem:

     

                string s = @"Eurex  App: X_RISK     Trader: TTADM  XXX 100     UL: TTADM Workstation: xx.x.x.xx Server: xx.xx.xx.xx";
                Regex r = new Regex(@"(?<Name>\w*)\s*App:\s*(?<App>\w*)\s*Trader:\s*(?<Trader>\w*\s*\w*\s*\w*)\s*UL:\s*(?<UL>\w*)\s*Workstation:\s*(?<Workstation>\w{2}\.\w\.\w\.\w{2})\s*Server:\s*(?<Server>\w{2}\.\w{2}\.\w{2}\.\w{2})");
                Match m = r.Match(s);
                Console.WriteLine(m.Groups["Name"].ToString());
                Console.WriteLine(m.Groups["App"].ToString());
                Console.WriteLine(m.Groups["Trader"].ToString());
                Console.WriteLine(m.Groups["UL"].ToString());
                Console.WriteLine(m.Groups["Workstation"].ToString());
                Console.WriteLine(m.Groups["Server"].ToString());
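    The same idea as a self-contained sketch, with the `using` directives it needs and a check of `Match.Success`, so that a line that does not fit the pattern returns `null` instead of six empty strings. The `NamedGroupDemo` class and its `TryParse` helper are hypothetical; the pattern is the one above, unchanged:

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // The pattern from the answer above: each (?<Name>...) group captures one field.
    static readonly Regex LinePattern = new Regex(
        @"(?<Name>\w*)\s*App:\s*(?<App>\w*)\s*Trader:\s*(?<Trader>\w*\s*\w*\s*\w*)\s*UL:\s*(?<UL>\w*)\s*Workstation:\s*(?<Workstation>\w{2}\.\w\.\w\.\w{2})\s*Server:\s*(?<Server>\w{2}\.\w{2}\.\w{2}\.\w{2})");

    static readonly string[] Fields = { "Name", "App", "Trader", "UL", "Workstation", "Server" };

    // Hypothetical helper: returns the six captured values in order,
    // or null if the line does not match the pattern at all.
    public static string[] TryParse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success)
            return null;

        string[] values = new string[Fields.Length];
        for (int i = 0; i < Fields.Length; i++)
            values[i] = m.Groups[Fields[i]].Value; // same text as Groups[...].ToString()
        return values;
    }
}
```

    For the Eurex line from the question, `TryParse` returns `Eurex`, `X_RISK`, `TTADM  XXX 100`, `TTADM`, `xx.x.x.xx` and `xx.xx.xx.xx`.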
    
      
    Gabriel Bogéa (http://www.gbogea.com)
    -----------------
    Please 'Mark as Answer' the post(s) that helped you
  • Re: Parsing strings with RegEx

    05-09-2008, 11:48 AM
    • kirilminev

    Thank you so much, that solved my issue. This is my first time using regular expressions, and they are kind of hard to grasp at first.

    Thanks

  • Re: Parsing strings with RegEx

    05-09-2008, 12:05 PM
    • gbogea

    Yep, regular expressions are not easy to learn, but the effort pays off as you can see in your own example.

    I'm glad I could help.

  • Re: Parsing strings with RegEx

    05-09-2008, 12:30 PM
    • kirilminev

    I have one more issue I am still working on. The code above works, but with different input coming from a really big file, things get out of hand.

    I think it has something to do with the regex pattern, but I don't know what I am missing.

    Here is the input that I read as a stream and then pass to a DataTable in order to display it in a grid view:



    And here is my code:


     public DataSet ConvertNew(string sLogFile)
            {
                if (!System.IO.File.Exists(sLogFile))
                    return null;//fail file not found

                StreamReader stream = new StreamReader(sLogFile, true);
                //Create a new data set
                DataSet dsFile = new DataSet("File");

                DataTable dtTable = dsFile.Tables.Add("Data");

                dtTable.Columns.Add("Exchange", typeof(string));
                dtTable.Columns.Add("Application", typeof(string));
                dtTable.Columns.Add("Trader", typeof(string));
                dtTable.Columns.Add("UL", typeof(string));
                dtTable.Columns.Add("Workstation", typeof(string));
                dtTable.Columns.Add("Server", typeof(string));


                Regex r = new Regex(@"(?<Name>\w*)\s*App:\s*(?<App>\w*)\s*Trader:\s*(?<Trader>\w*\s*\w*\s*\w*)\s*UL:\s*(?<UL>\w*)\s*Workstation:\s*(?<Workstation>\w{2}\.\w\.\w\.\w{2})\s*Server:\s*(?<Server>\w{2}\.\w{2}\.\w{2}\.\w{2})");

                // Get the first line of data from the input (log) file.
                string val = NextLine(stream);
                //Data row object used to set values
                DataRow drRow;

                while (val != "")//val is empty when the input stream has been read to the end
                {
                    foreach (Match m in r.Matches(val))
                    {
                        //create a new row for each match; adding the same DataRow to a table twice throws an ArgumentException
                        drRow = dtTable.NewRow();

                        //Console.WriteLine(m.Groups["Name"].ToString());
                        //Console.WriteLine(m.Groups["App"].ToString());
                        //Console.WriteLine(m.Groups["Trader"].ToString());
                        //Console.WriteLine(m.Groups["UL"].ToString());
                        //Console.WriteLine(m.Groups["Workstation"].ToString());
                        //Console.WriteLine(m.Groups["Server"].ToString());

                        drRow["Exchange"] = m.Groups["Name"].ToString();
                        drRow["Application"] = m.Groups["App"].ToString();
                        drRow["Trader"] = m.Groups["Trader"].ToString();
                        drRow["UL"] = m.Groups["UL"].ToString();
                        drRow["Workstation"] = m.Groups["Workstation"].ToString();
                        drRow["Server"] = m.Groups["Server"].ToString();

                        dtTable.Rows.Add(drRow);
                    }
                    val = NextLine(stream);
                }
                stream.Close();//release the file handle before returning
                return dsFile;
            }
     


    I am really stuck with this regex pattern.
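    For reference, a sketch of the per-line step as a standalone helper: it uses `Regex.Match` (one match per line) and creates a `DataRow` only when `Match.Success` is true, so a line that does not match adds nothing to the table. The `LogTable` class and its `CreateTable`/`AddLine` methods are hypothetical; the column names and group names are the ones used in the code above:

```csharp
using System.Data;
using System.Text.RegularExpressions;

public static class LogTable
{
    // Builds a table with the same six string columns as ConvertNew.
    public static DataTable CreateTable()
    {
        DataTable table = new DataTable("Data");
        foreach (string col in new[] { "Exchange", "Application", "Trader", "UL", "Workstation", "Server" })
            table.Columns.Add(col, typeof(string));
        return table;
    }

    // Hypothetical helper: adds one row for a matching line and returns true,
    // or returns false (adding nothing) when the line does not match.
    public static bool AddLine(DataTable table, Regex pattern, string line)
    {
        Match m = pattern.Match(line);
        if (!m.Success)
            return false;

        DataRow row = table.NewRow(); // a fresh row for every matched line
        row["Exchange"] = m.Groups["Name"].Value;
        row["Application"] = m.Groups["App"].Value;
        row["Trader"] = m.Groups["Trader"].Value;
        row["UL"] = m.Groups["UL"].Value;
        row["Workstation"] = m.Groups["Workstation"].Value;
        row["Server"] = m.Groups["Server"].Value;
        table.Rows.Add(row);
        return true;
    }
}
```

    Inside the `while` loop, `LogTable.AddLine(dtTable, r, val)` would then replace the `foreach` over `r.Matches(val)`.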

  • Re: Parsing strings with RegEx

    05-09-2008, 12:33 PM
    • kirilminev

    It starts parsing incorrectly after the 5th line.

  • Re: Parsing strings with RegEx

    05-09-2008, 12:47 PM
    • kirilminev

    After troubleshooting, I determined that as soon as it hits a dash or an underscore, it stops working properly.

    How can I fix this? I would really appreciate any input.

    Thanks in advance

  • Re: Parsing strings with RegEx

    05-09-2008, 1:18 PM
    • gbogea

    \w only matches word characters (letters, digits, and the underscore), so it stops at a dash.

    Replace it with [\w-].

    For example, (?<Name>\w*) becomes (?<Name>[\w-]*).

    This says to look for zero or more occurrences of a word character or a dash (-).
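    A quick check of the two character classes side by side. The `WordClassDemo` class is hypothetical, and the sample value `Eurex-2` is made up for illustration:

```csharp
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    public const string WordOnly = @"^(?<Name>\w*)";      // letters, digits, '_'
    public const string WordOrDash = @"^(?<Name>[\w-]*)"; // the same, plus '-'

    // Returns what the leading (?<Name>...) group captures from the input.
    public static string CaptureName(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups["Name"].Value;
    }
}
```

    On `"Eurex-2 App:"`, `WordOnly` captures `Eurex` while `WordOrDash` captures `Eurex-2`; on `"X_RISK App:"`, `WordOnly` already captures all of `X_RISK`.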

    Let me know how it works. 

  • Re: Parsing strings with RegEx

    05-09-2008, 2:04 PM
    • kirilminev

  • Re: Parsing strings with RegEx

    05-09-2008, 3:27 PM
    • gbogea

    I'll take a look at it and post a little later.

    I have to account for the variations of the pattern in your string.

  • Re: Parsing strings with RegEx

    05-09-2008, 4:13 PM
    • kirilminev

    Thanks, Gabriel, your help is very much appreciated.

  • Re: Parsing strings with RegEx

    05-09-2008, 7:18 PM
    • gbogea

    Let's try it again. This time I split the expression into parts so it is easier to understand:

                string name = @"(?<Name>[\w-]*)\s*";
                string app = @"App:\s*(?<App>[\w-]*)\s*";
                string trader = @"Trader:\s*(?<Trader>([\w-]*\s)+)";
                string ul = @"UL:\s*(?<UL>[\w-]*)\s*";
                string workstation = @"Workstation:\s*(?<Workstation>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*";
                string server = @"Server:\s*(?<Server>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})";
                Regex r = new Regex(name+app+trader+ul+workstation+server);
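    A runnable sketch that assembles the same parts. Two things to keep in mind: the `\d{1,3}` parts expect numeric addresses, so a line has to contain real IP addresses rather than the `xx` placeholders from the question, and `([\w-]*\s)+` also captures the whitespace before `UL:`, so the Trader value may need a `Trim()`. The `ComposedPattern` class and `TraderOf` helper are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class ComposedPattern
{
    // The parts from the answer above, concatenated into one pattern.
    const string NamePart = @"(?<Name>[\w-]*)\s*";
    const string AppPart = @"App:\s*(?<App>[\w-]*)\s*";
    const string TraderPart = @"Trader:\s*(?<Trader>([\w-]*\s)+)";
    const string ULPart = @"UL:\s*(?<UL>[\w-]*)\s*";
    const string WorkstationPart = @"Workstation:\s*(?<Workstation>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s*";
    const string ServerPart = @"Server:\s*(?<Server>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})";

    public static readonly Regex Line =
        new Regex(NamePart + AppPart + TraderPart + ULPart + WorkstationPart + ServerPart);

    // Hypothetical helper: returns the Trader field without the trailing
    // whitespace that ([\w-]*\s)+ captures, or null if the line does not match.
    public static string TraderOf(string line)
    {
        Match m = Line.Match(line);
        return m.Success ? m.Groups["Trader"].Value.Trim() : null;
    }
}
```

    For a made-up line such as `Eurex-A  App: X_RISK-2     Trader: TTADM  XXX 100     UL: TTADM-1 Workstation: 10.0.0.12 Server: 192.168.1.20`, the Name group captures `Eurex-A`, UL captures `TTADM-1`, and `TraderOf` returns `TTADM  XXX 100`.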

    Sorry, I had to do this in a hurry because I'm on my way out. If it doesn't work for any reason, let me know and I'll correct it.

  • Re: Parsing strings with RegEx

    05-12-2008, 11:43 AM
    • kirilminev

    Thanks, Gabriel, that worked out pretty well. I really appreciate your help.
  • Re: Parsing strings with RegEx

    05-12-2008, 12:01 PM
    • gbogea

     No problem, I'm glad I could help.

    If you can, please mark the post that resolved your problem as the answer.

    Thanks a lot. 
