Count number of lines without downloading s3 file






















If this information is not already present as metadata in a separate file, embedded in the data, or available through a query to the system that you exported the data from, and if there is no index file of some description available, then the quickest way to count the number of lines is to run wc -l on the file.

To count the number of records in the file, you will have to know which record separator is in use and use something like awk to count them.
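To illustrate counting records by separator, here is a minimal Python sketch (not the awk approach itself). The function name `count_records` and its parameters are made up for this example. It reads a binary stream in chunks and carries a few trailing bytes over, so a multi-byte separator split across two chunks is still counted once.

```python
import io


def count_records(stream, separator=b"\n", chunk_size=1024 * 1024):
    """Count non-overlapping occurrences of `separator` in a binary stream."""
    if not separator:
        raise ValueError("separator must not be empty")
    count = 0
    tail = b""                 # unmatched bytes carried over from the previous chunk
    keep = len(separator) - 1  # a split separator can leave at most this many bytes behind
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = tail + chunk
        pos = 0
        while True:
            idx = data.find(separator, pos)
            if idx == -1:
                break
            count += 1
            pos = idx + len(separator)
        # Never carry over bytes that already belong to a counted separator.
        tail_start = max(pos, len(data) - keep)
        tail = data[tail_start:] if keep else b""
    return count


if __name__ == "__main__":
    sample = io.BytesIO(b"r1\r\nr2\r\nr3")
    print(count_records(sample, separator=b"\r\n", chunk_size=3))
```

Like wc -l, this counts separators, so a final record with no trailing separator is not included.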

Again, that is only if this information is not already stored elsewhere as metadata, is not available through a query to the originating system, and the records themselves are not already enumerated and sorted within the file.

You should not use line-based utilities such as awk and sed. These utilities will issue a read system call for every line in the input file (see that answer for why this is so).

If you have lots of lines, this will be a huge performance loss. Since your file is 4TB in size, I guess that there are a lot of lines. Even wc -l will produce a lot of read system calls, since it reads only a limited number of bytes per call on my system. Anyway, this would still be an improvement over awk and sed. The best method, unless you write your own program, might be simply to pipe the output of cat into wc -l.

This is not a useless use of cat, because cat reads larger chunks of bytes per read system call on my system. wc -l will issue more read calls, but on the pipe rather than on the file directly.

However, cat tries to read as much as possible per system call.

Below is what worked for me: tail -5 the file, then grep for the text of your last line using grep's -n option. The line number that grep reports for that match is the line count.
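The read-size argument made above can be tried out with a small Python sketch. This is only an illustration of the "write your own program" option, not what wc or cat do internally. The name `count_lines` and the chunk sizes are assumptions. It returns both the newline count and the number of non-empty reads, so you can see how a larger chunk size cuts the number of read calls.

```python
import io


def count_lines(stream, chunk_size=1024 * 1024):
    """Return (newline_count, number_of_non_empty_reads) for a binary stream."""
    lines = 0
    reads = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        reads += 1
        lines += chunk.count(b"\n")
    return lines, reads


if __name__ == "__main__":
    data = b"x\n" * 1000                        # 2000 bytes, 1000 lines
    print(count_lines(io.BytesIO(data), 100))   # small chunks: many reads
    print(count_lines(io.BytesIO(data), 2000))  # one large chunk: a single read
```

On a real file you could open it with `open(path, "rb", buffering=0)` so that each `read` call on the stream corresponds to one read on the file; `path` is a placeholder here.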

What is a quick way to count lines in a 4TB file?

The function would be the same as the previous option; it just runs remotely. Use EC2 - If you need more control over your code, a custom operating system, etc., you can have a dedicated VM on EC2 that handles this. - Barak

Agree, Lambda would be an efficient choice here. Thanks. - Tariq Kamal

Should be possible. It depends a lot on what type of editing you plan on doing, as well as the number of files. You might want a Lambda function that processes one file, plus some method of orchestrating the calls to it, using something like SQS or Step Functions. But it should be possible. - Barak
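As a rough idea of what a one-file Lambda function for the line-count case might look like, here is a hedged sketch using boto3. The helper names and the `bucket` and `key` event fields are hypothetical and not taken from the answers above. The object is streamed through memory in chunks and never written to disk, although the bytes are still transferred from S3 to wherever the code runs.

```python
def count_lines_in_chunks(chunks):
    """Count newline bytes across an iterable of byte chunks."""
    return sum(chunk.count(b"\n") for chunk in chunks)


def count_lines_in_s3_object(bucket, key, chunk_size=1024 * 1024):
    """Stream an S3 object and count its lines without saving it locally."""
    import boto3  # imported lazily so the helper above works without boto3

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return count_lines_in_chunks(body.iter_chunks(chunk_size=chunk_size))


def lambda_handler(event, context):
    # "bucket" and "key" are hypothetical event fields chosen for this sketch.
    return {"lines": count_lines_in_s3_object(event["bucket"], event["key"])}
```

Lambda invocations have a maximum run time, so for a very large object you would probably need to split the work into byte ranges (`get_object` accepts a `Range` argument) and add up the partial counts, which is where the orchestration mentioned above comes in.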
