Finding content in a file without having to read every line


    • #279069
      Hil
      Participant

      When programming with WinBatch (winbatch.com), I could read the contents of a file into a binary buffer, so I did not need to read each line. I know I could use Select-String, but if the file has multiple sections and each section contains the string to be extracted, Select-String does not work as well. In that case it was easy to go to the start of the section, go to the end of the section, extract the whole section, and then extract the line from within that section, instead of reading the file a line at a time to extract the line while also keeping track of when you enter and exit a section.
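In PowerShell, the closest equivalent of loading a file into a buffer is `Get-Content -Raw`, which returns the whole file as a single string; .NET string methods can then jump straight to a section. A minimal sketch, assuming the sections are delimited by known literal markers (the file name, the `[SectionB]`/`[EndSection]` markers, and the `Name=` line are all hypothetical):

```powershell
# Hypothetical sample file with two sections, each containing a Name= line.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sections-demo.txt'
@'
[SectionA]
Name=alpha
[EndSection]
[SectionB]
Name=beta
[EndSection]
'@ | Set-Content -Path $path

# Read the entire file into one string instead of looping over lines.
$buffer = Get-Content -Path $path -Raw

# Jump to the start of the wanted section, then to the first end marker after it.
$start = $buffer.IndexOf('[SectionB]', [System.StringComparison]::Ordinal)
if ($start -lt 0) { throw 'Section start marker not found' }
$end = $buffer.IndexOf('[EndSection]', $start, [System.StringComparison]::Ordinal)
if ($end -lt 0) { throw 'Section end marker not found' }

# Cut out just that section, then pull the wanted line from the slice.
$section = $buffer.Substring($start, $end - $start)
$name = ($section -split '\r?\n' | Where-Object { $_ -like 'Name=*' }) -replace '^Name=', ''
$name   # beta

Remove-Item -Path $path
```

`IndexOf` still scans the string internally, so this removes the per-line loop from your script rather than avoiding reading the file.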

      So could this be written in PowerShell? (This is WinBatch code):

       

    • #279339
      Participant

      but if I have multiple sections in the file and each section contains the string to be extracted, then select-string would not work as well.

      Can you explain what you mean by it wouldn’t work as well? You can find all matches with Select-String and it will give you an object with the line and line number where it was found, the matching term, etc.
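For example, a minimal sketch (hypothetical file and pattern) showing that each Select-String result is a MatchInfo object carrying the file name, line number, line text, and regex captures:

```powershell
# Hypothetical sample file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'select-string-demo.txt'
@'
[SectionA]
Name=alpha
[SectionB]
Name=beta
'@ | Set-Content -Path $path

# Each result is a MatchInfo object, not just a string.
$hits = Select-String -Path $path -Pattern '^Name=(\w+)'
foreach ($hit in $hits) {
    '{0} line {1}: {2} (captured: {3})' -f $hit.Filename, $hit.LineNumber, $hit.Line, $hit.Matches[0].Groups[1].Value
}

Remove-Item -Path $path
```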

    • #279438
      Hil
      Participant

      Can you explain what you mean by it wouldn’t work as well? You can find all matches with Select-String and it will give you an object with the line and line number where it was found, the matching term, etc.

      Yes, you can find matches with Select-String, but you do not know what each item relates to. That's why you either need flags… or something like a binary buffer (as above).

    • #279501
      Participant

      What it relates to in regards to what? If you’re searching file A for text “blah”, the matching text relates to file A. Can you please explain what you mean in detail?

    • #279519
      Hil
      Participant

      What it relates to in regards to what? If you’re searching file A for text “blah”, the matching text relates to file A. Can you please explain what you mean in detail?

      I am already able to extract the contents with PowerShell by using flags, as I mentioned in my earlier post. But my question is: is there a way to avoid reading each line?

      For example, let's say you have to extract "some text" from an HTML page, and the line looks like "DIV some-text DV". But there are tons of DIV elements on the page, so how would you know which DIV to extract the text from? For this you need to find a unique identifier before this line, which could be several lines higher. That's why Select-String will not work in this case: when Select-String returns a value, you will not know which DIV it references, as there could be several of them on the page.

      I have done this by using flags. When I encounter the unique identifier, I set my flag to 1, and when "some text" is found, maybe on the next line or several lines below, I set the flag back to 0. However, this means reading each line and then setting and resetting flags. That brings me back to my original question: can I read the whole page and go right to the word/section I want, instead of having to read each line of the file? Please re-read my original question.
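One way to get close to that in PowerShell is to read the page as a single string and let one regular expression anchor on the unique identifier and take the first DIV after it. A minimal sketch; the HTML, the `id="prices"` identifier, and the pattern are hypothetical, and regex matching is fragile on real-world HTML:

```powershell
# Hypothetical page content; for a real file use: Get-Content -Path $file -Raw
$html = @'
<div>ignore me</div>
<h2 id="prices">Prices</h2>
<p>intro</p>
<div>some text</div>
<div>other text</div>
'@

# (?s) lets '.' match newlines, so the pattern can span lines.
# The lazy '.*?' stops at the FIRST <div> after the identifier.
$m = [regex]::Match($html, '(?s)id="prices".*?<div>(.*?)</div>')
if ($m.Success) { $m.Groups[1].Value }   # some text
```

The DIV before the identifier is skipped because the match can only start at the identifier, and the DIV after the first hit is ignored because the lazy quantifier stops as early as possible.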

      • This reply was modified 1 month, 1 week ago by Hil.
    • #279570
      Participant

      Your original question lacks clarity, as do your responses. What criteria (flag) do you want to use to differentiate sections? Your DIV analogy doesn't help, as you still haven't shown how you differentiate the sections. Based on the proprietary code in your original post, it would appear "end of string" is the marker? If that's the case, simply state it, because you can keep track of sections using a marker if the marker is known. Based on the lack of responses, I'd say others are not able to discern what you are actually trying to extract, how to differentiate sections, etc. Good luck; hopefully someone else can read your mind.
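For reference, when the marker is known, flag-based section tracking can be written compactly with `switch -Regex`, which also accepts `-File` to stream a file line by line. A minimal sketch with hypothetical lines and patterns:

```powershell
# Hypothetical input; for a file use: switch -Regex -File $path { ... }
$lines = @(
    '<div>ignore me</div>'
    '<h2 id="prices">Prices</h2>'
    '<p>intro</p>'
    '<div>some text</div>'
    '<div>other text</div>'
)

$inSection = $false   # the "flag": set once the marker has been seen
$found = switch -Regex ($lines) {
    'id="prices"' { $inSection = $true; continue }   # marker line: raise flag
    '<div>(.*?)</div>' {
        if ($inSection) {
            $Matches[1]            # emit the captured DIV text
            $inSection = $false    # lower flag after the first hit
        }
    }
}
$found   # some text
```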

    • #279579
      Hil
      Participant

      It's evident you are not trying to understand the question or the responses. Maybe you get points for posting suggestions. If you don't know, please give someone else the opportunity to suggest a solution, instead of asking endless questions when everything was specified in the first post to start with!!!!!

      Also, if you keep ridiculing people because of your inability to understand simple technology, it will only reflect badly on the forum. Be nice to people!!!!

      • This reply was modified 1 month, 1 week ago by Hil.
    • #279639
      Senior Moderator

      Post #279570 was reported as containing inappropriate content, but I see no evidence of that. It appears to be mostly constructive criticism. I have cleared the report.

      In general, asking questions is to be expected and encouraged, as none of us can see into another’s work environment and we must offer advice while effectively blind. If someone says that they do not understand what you are asking, you should take them at their word.

      Please do not abuse the Report feature.

    • #280008
      Participant

      As I read this, here is my take. If you read a file into a binary buffer, then the ONLY way to identify where in that buffer the result is found is to set a flag. I totally get Doug's point that with Select-String you don't need flags, as PS has in essence taken care of that for you by identifying where in the file the strings reside. I agree with Doug: using Select-String, flags should not be needed.

      You also seem to have an issue with reading each line of the file with Select-String, yet with your “binary buffer” approach, to find all instances you are doing the very same thing.
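As an illustration of that point, `Select-String -Context` can return the lines that follow a unique identifier, so the target can be picked from those lines without a flag variable. A minimal sketch; the file contents, the identifier, and the choice of three following lines are hypothetical:

```powershell
# Hypothetical sample page saved to a temp file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'context-demo.html'
@'
<div>ignore me</div>
<h2 id="prices">Prices</h2>
<p>intro</p>
<div>some text</div>
<div>other text</div>
'@ | Set-Content -Path $path

# Match the unique identifier and keep the 3 lines that follow it.
$hit = Select-String -Path $path -Pattern 'id="prices"' -Context 0, 3

# Search only those following lines for the first <div>.
$target = $hit.Context.PostContext |
    Select-String -Pattern '<div>(.*?)</div>' |
    Select-Object -First 1
$target.Matches[0].Groups[1].Value   # some text

Remove-Item -Path $path
```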

      Just my $.02

    • #280023
      Participant

      You could potentially stream the text or bytes of a line without reading the whole line. However, something must tell the code to stop reading (for example, when it finds what it's looking for early). The same applies to reading lines: not all lines have to be read if the target is found early. Most of the pattern-matching commands can span multiple lines and apply lookbehind or lookahead matching. Combining that with other control-flow logic could implement tracking flags for any associations you want to create.
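A minimal sketch of stopping early, assuming a hypothetical 1000-line file and target line: a .NET `StreamReader` hands back one line at a time, and the loop breaks as soon as the target is found.

```powershell
# Hypothetical sample file with 1000 lines.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo.txt'
1..1000 | ForEach-Object { "line $_" } | Set-Content -Path $path

# Use a full path: .NET resolves relative paths against the process
# working directory, which can differ from the PowerShell location.
$reader = [System.IO.StreamReader]::new($path)
$linesRead = 0
$found = $null
try {
    # ReadLine() returns $null at end of file.
    while ($null -ne ($line = $reader.ReadLine())) {
        $linesRead++
        if ($line -eq 'line 42') {
            $found = $line
            break   # stop: no further ReadLine() calls are made
        }
    }
}
finally {
    $reader.Dispose()
}
"Found '$found' after reading $linesRead lines"

Remove-Item -Path $path
```

`StreamReader` reads ahead in buffered blocks, so breaking out stops line processing rather than guaranteeing that no further bytes were read from disk.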

      If I'm working with a trivial version of your analogy, you basically need dynamic matching, where one lookup determines another. Below is a multi-line string with tags (surrounded by []). You ultimately want the text after [tag4] and [tag8], but the only identifier you have is "middle".
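A minimal sketch of that two-step lookup, assuming (hypothetically) that the line containing `middle` lists the names of the tags whose text you want; the sample text and patterns are illustrative and not the snippet from this reply:

```powershell
# Hypothetical tagged text; the [tag2] line names the tags to read next.
$text = @'
[tag1] start
[tag2] middle: tag4, tag8
[tag3] filler
[tag4] first answer
[tag5] filler
[tag8] second answer
'@

# Lookup 1: find the line containing "middle" and read the tag names it lists.
# [^\r\n]+ keeps each capture on one line with either CRLF or LF endings.
$pointer = [regex]::Match($text, '(?m)^\[\w+\]\s*middle:\s*([^\r\n]+)')
$tags = $pointer.Groups[1].Value -split '\s*,\s*'

# Lookup 2: build a pattern from each tag name and grab the text after that tag.
$results = foreach ($tag in $tags) {
    $pattern = '(?m)^\[' + [regex]::Escape($tag) + '\]\s*([^\r\n]+)'
    [regex]::Match($text, $pattern).Groups[1].Value
}
$results   # first answer, second answer
```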

      Output:
