
Losing the plot on C# Regex


    Hi peeps,

    I wonder if any of you .NET guys could try and figure out why my regex works in several tools, but when put into C# it fails?

    Code:
    string input = @"<a href=""http://www.thisisthedomain.com/"" class='big one' title=""testing & fixing & failing"">text here - More Text - some more</a>";

    // Parse HTML for other attributes
    Regex regex = new Regex(@"(<a\s+)*title\s*=\s*(?:""|')(?<title>[^'""]*)(?:""|')|class\s*=\s*(?:""|')(?<class>[^'""]*)(?:""|')|href\s*=\s*(?:""|')(?<href>[^'""]*)(?:""|')|rel\s*=\s*(?:""|')(?<rel>[^'""]*)(?:""|')", RegexOptions.IgnoreCase);

    Match match = regex.Match(input);

    // Fetch named groups
    _rel = match.Groups["rel"].Value;
    _href = match.Groups["href"].Value;
    _title = match.Groups["title"].Value;
    _class = match.Groups["class"].Value;
    Using RegexBuddy, Rad Software Regular Expression Designer and Sells Brothers RegexDesigner.NET I can match the "title" group, but my C# fails to (although it does get the "href" group).

    I'm using version 1.1 of the .NET Framework.

    If anyone can see what's going wrong I'd be really chuffed.

    Cheers,

    DP

    #2
    this works

    Regex regex = new Regex(@"(<a\s+)*title\s*=\s*(?:""|')(?<title>[^'""]*)(?:""|')", RegexOptions.IgnoreCase);


      #3
      I'm trying to match and capture all the main link attributes (href, title, rel, class), in any order and in any combination, along with any other valid HTML attribute.

      Any ideas?
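A possible explanation for the behaviour in the first post: the pattern is one big alternation, so each match satisfies exactly one branch, and `Regex.Match` returns only the first match in the input, which here is the href attribute. The regex tools probably list every match, which would explain why "title" shows up there. The sketch below keeps the original pattern and loops over `Regex.Matches` instead. `LinkAttributeDemo` and `ReadAttributes` are hypothetical names, and the code assumes a compiler newer than the 1.1 Framework mentioned above (generics, `var`); on 1.1 a `Hashtable` could stand in for the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkAttributeDemo
{
    // The named groups defined by the original pattern.
    private static readonly string[] GroupNames = { "href", "title", "class", "rel" };

    // Original alternation, split across lines for readability only.
    private static readonly Regex AttributeRegex = new Regex(
        @"(<a\s+)*title\s*=\s*(?:""|')(?<title>[^'""]*)(?:""|')" +
        @"|class\s*=\s*(?:""|')(?<class>[^'""]*)(?:""|')" +
        @"|href\s*=\s*(?:""|')(?<href>[^'""]*)(?:""|')" +
        @"|rel\s*=\s*(?:""|')(?<rel>[^'""]*)(?:""|')",
        RegexOptions.IgnoreCase);

    // Collects every attribute the pattern knows about, in whatever order they appear.
    public static Dictionary<string, string> ReadAttributes(string input)
    {
        var found = new Dictionary<string, string>();
        foreach (Match match in AttributeRegex.Matches(input))
        {
            // Each match comes from a single branch, so only one named group succeeds.
            foreach (string name in GroupNames)
            {
                Group group = match.Groups[name];
                if (group.Success)
                {
                    found[name] = group.Value;
                }
            }
        }
        return found;
    }

    public static void Main()
    {
        string input = @"<a href=""http://www.thisisthedomain.com/"" class='big one' title=""testing & fixing & failing"">text here - More Text - some more</a>";
        foreach (KeyValuePair<string, string> pair in ReadAttributes(input))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Checking `Group.Success` rather than testing for an empty `Value` keeps an attribute that is present but empty (for example `title=""`) distinct from one that is missing.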


        #4
        Solved - I think!

        Looks like I might have it by doing it in two chunks.

        Find all links in HTML:

        Code:
        _regex = new Regex(@"(?<linkhtml><a[\s]+[^>]*?href[\s]?=[\s\""\']+(?<href>.*?)[\""\']+.*?>)(?<linktext>[^<]+|.*?)?<\/a>", RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
        Then stick the linkhtml group into a 2nd regex:

        Code:
        _regxAttributes = new Regex("(?<name>\\b\\w+\\b)\\s*=\\s*(\"(?<value>[^\"]*)\"|'(?<value>[^']*)'|(?<value>[^\"'<>\\s]+)\\s*)+",
            RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);
        To get all the attributes as name and value pairs.
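The two-chunk idea can be sketched end to end as follows. This is only an illustration under stated assumptions: the two patterns are simplified stand-ins rather than the exact ones posted, `ParsedLink` and `TwoPassLinkParser` are hypothetical names, and the first pass assumes no attribute value contains a `>` character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class ParsedLink
{
    public string Text = "";
    public Dictionary<string, string> Attributes = new Dictionary<string, string>();
}

public static class TwoPassLinkParser
{
    // Pass 1: each <a ...>...</a> element, with the raw attribute text in "attrs".
    private static readonly Regex LinkRegex = new Regex(
        @"<a\s+(?<attrs>[^>]*)>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Pass 2: name/value pairs with double-quoted, single-quoted or unquoted values.
    private static readonly Regex AttributeRegex = new Regex(
        @"(?<name>[\w-]+)\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>[^\s""'<>]+))");

    public static List<ParsedLink> Parse(string html)
    {
        var links = new List<ParsedLink>();
        foreach (Match linkMatch in LinkRegex.Matches(html))
        {
            var link = new ParsedLink();
            link.Text = linkMatch.Groups["text"].Value;

            // Run the second regex only over this link's attribute text.
            foreach (Match attr in AttributeRegex.Matches(linkMatch.Groups["attrs"].Value))
            {
                // HTML attribute names are case-insensitive, so normalise the key.
                string name = attr.Groups["name"].Value.ToLowerInvariant();
                link.Attributes[name] = attr.Groups["value"].Value;
            }
            links.Add(link);
        }
        return links;
    }

    public static void Main()
    {
        string html = "<p><a href=\"http://a.example/\" title='One'>first</a> and "
                    + "<a class=big href=\"http://b.example/\">second</a></p>";
        foreach (ParsedLink link in Parse(html))
        {
            Console.WriteLine(link.Text);
            foreach (KeyValuePair<string, string> pair in link.Attributes)
            {
                Console.WriteLine("  " + pair.Key + " = " + pair.Value);
            }
        }
    }
}
```

Restricting the second pass to the text captured by the first keeps attribute-like text elsewhere in the page from being picked up.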

        If anyone knows how to capture every hyperlink in the HTML and capture all the attributes in one step I'd be pleased to know.

        Cheers,

        DP
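On the one-step question, one option worth trying is .NET's capture collections: when a named group sits inside a repeated construct, `Group.Captures` keeps every value it matched, not just the last. The sketch below is a rough illustration, not a robust HTML parser. The pattern, `OnePassLink` and `OnePassLinkParser` are hypothetical, and a link containing a valueless attribute, or anything else the attribute sub-pattern does not cover, is simply not matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class OnePassLink
{
    public string Text = "";
    public Dictionary<string, string> Attributes = new Dictionary<string, string>();
}

public static class OnePassLinkParser
{
    // The attribute sub-pattern repeats, so "name" and "value" each collect
    // one capture per attribute, in the same order.
    private static readonly Regex LinkRegex = new Regex(
        @"<a(?:\s+(?<name>[\w-]+)\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>[^\s""'<>]+)))*\s*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<OnePassLink> Parse(string html)
    {
        var links = new List<OnePassLink>();
        foreach (Match match in LinkRegex.Matches(html))
        {
            var link = new OnePassLink();
            link.Text = match.Groups["text"].Value;

            CaptureCollection names = match.Groups["name"].Captures;
            CaptureCollection values = match.Groups["value"].Captures;

            // Every repetition captures exactly one name and one value,
            // so the two collections line up index by index.
            for (int i = 0; i < names.Count; i++)
            {
                link.Attributes[names[i].Value.ToLowerInvariant()] = values[i].Value;
            }
            links.Add(link);
        }
        return links;
    }

    public static void Main()
    {
        string html = @"<a href=""http://www.thisisthedomain.com/"" class='big one' title=""testing & fixing & failing"">text here - More Text - some more</a>";
        foreach (OnePassLink link in Parse(html))
        {
            Console.WriteLine(link.Text);
            foreach (KeyValuePair<string, string> pair in link.Attributes)
            {
                Console.WriteLine("  " + pair.Key + " = " + pair.Value);
            }
        }
    }
}
```

The trade-off against the two-chunk version is strictness: here one unexpected attribute form makes the whole link fail to match, whereas the two-pass approach still finds the link and just skips what it cannot parse.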


          #5
          that's what i was suggesting. mercifully, life is far too short to spend time on the black art of reg exps...


            #6
            Originally posted by scotspine
            that's what i was suggesting. mercifully, life is far too short to spend time on the black art of reg exps...
            Indeed - which is a good thing because I get paid a lot to maintain them after I've written them
            Serving religion with the contempt it deserves...


              #7
              Originally posted by scotspine
              that's what i was suggesting. mercifully, life is far too short to spend time on the black art of reg exps...
              Indeed, and I'll second that, regex is definitely a black art which you'd have to be a C programmer to fully understand.


                #8
                Originally posted by Joe Black
                Indeed, and I'll second that, regex is definitely a black art which you'd have to be a C programmer to fully understand.
                I'm a C guy too! I think there may be a pattern you have discovered. Something to do with obfuscatable languages... Hmm I'm also a Perl guy. This gets deeper.


                  #9
                  Originally posted by Joe Black
                  Indeed, and I'll second that, regex is definitely a black art which you'd have to be a C programmer to fully understand.

                  Or a Perl Programmer, for when you really want to confuse people
                  "Being nice costs nothing and sometimes gets you extra bacon" - Pondlife.


                    #10
                    Originally posted by DaveB
                    Or a Perl Programmer, for when you really want to confuse people
                    Or PostScript as well. The following program runs in both a PostScript and a Perl interpreter:

                    http://perl.plover.com/obfuscated/bestever.pl

                    Insane!
