Processing large files in PHP

I’ve been using my own PHP web statistics script for over a year now. Recently I noticed that some dates were missing from the reports. It turns out PHP (typically on 32-bit builds) has a limit of 2GB or so when fopen-ing files, even though the script reads the file line by line and doesn’t keep any lines in memory.

The solution is to use the Linux split command to break the file into manageable pieces and process them one by one. Don’t go crazy and try to split it into 2GB pieces, unless you have abundant RAM. If you split it into 2GB files, the process will use 2GB of RAM while doing it. Ouch!!!

Since I’m working with 1GB of RAM in total, I decided to go with 100MB files, which uses 100MB of RAM while splitting. I also wanted my files to have the prefix zzz_split_ (instead of the default x). The “zzz” just makes them list nicely at the end of all files in a directory.

split -C 100m access_log.old zzz_split_

This command split my Apache access_log file into 30 pieces of at most 100MB each, making sure that no line is broken across two pieces.
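
To check that a split like this lost nothing, the pieces can be concatenated and compared with the original. Here is a minimal sketch, assuming GNU split and cmp; the function name split_log is made up for illustration, and it assumes no other files already match the prefix.

```shell
#!/bin/sh
# Hypothetical helper: split a log into line-aligned pieces, then verify
# that the pieces, read back in order, are byte-identical to the source.
split_log() {
    src=$1      # file to split, e.g. access_log.old
    size=$2     # maximum bytes per piece, e.g. 100m
    prefix=$3   # output prefix, e.g. zzz_split_

    # -C keeps whole lines together, so no line straddles two pieces.
    split -C "$size" "$src" "$prefix" || return 1

    # The suffixes (aa, ab, ...) sort alphabetically, so the glob
    # restores the original order; cmp -s only sets the exit status.
    cat "$prefix"* | cmp -s - "$src"
}
```

Usage would look like: split_log access_log.old 100m zzz_split_ && echo "split OK"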

I changed my PHP script to glob the split files in the directory and open them one at a time.

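$logfiles = '/home/admin/webstats/zzz_split_*';
// glob() returns the matching paths sorted alphabetically
foreach (glob($logfiles) as $logfile) {
    // $logfile is already the full path of one piece
    $handle = fopen($logfile, 'r') or die("Can't open the log file");
    // ... process $handle line by line, as before ...
    fclose($handle);
}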

Here’s a (wo)man page for split

split – split a file into pieces


Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, …; default
PREFIX is ‘x’. With no INPUT, or when INPUT is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.

-a, --suffix-length=N
    use suffixes of length N (default 2)

-b, --bytes=SIZE
    put SIZE bytes per output file

-C, --line-bytes=SIZE
    put at most SIZE bytes of lines per output file

-l, --lines=NUMBER
    put NUMBER lines per output file

--verbose
    print a diagnostic to standard error just before each output
    file is opened

--help display this help and exit

--version
    output version information and exit

SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
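
As an illustration of the other options, here is a small sketch that splits by line count instead of by size and asks for three-letter suffixes. It assumes a split that supports -l and -a as listed above; the function name split_by_lines is made up.

```shell
#!/bin/sh
# Hypothetical helper: split a file into pieces of a fixed number of
# lines, using three-letter suffixes (aaa, aab, ...).
split_by_lines() {
    src=$1      # file to split
    lines=$2    # lines per piece
    prefix=$3   # output prefix

    # -l caps the number of lines per piece; -a 3 sets the suffix length.
    split -l "$lines" -a 3 "$src" "$prefix"
}
```

For a 25-line file, split_by_lines sample.txt 10 part_ produces part_aaa and part_aab with 10 lines each and part_aac with the remaining 5.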

Installing Ruby 1.8.5 on Ubuntu

Great how-to is located at Here’s the copy of it:

* Bring up the terminal, log in as root, and create a temporary directory to store the downloaded files.
[~/] su
[~/] mkdir local
[~/] cd local

* Download the official source distribution. This will create a file named ruby-1.8.5.tar.gz in your local directory.
[~/local] wget

* Extract all the files from the downloaded archive. This will create a local/ruby-1.8.5 subdirectory that holds the extracted files.
[~/local] tar xvfz ruby-1.8.5.tar.gz

* Install the GNU compiler toolchain (gcc, g++, make) that you’ll need to build Ruby from source.
[~/local] apt-get install build-essential

* Run the configure utility to determine your system configuration.
[~/local] cd ruby-1.8.5
[~/local/ruby-1.8.5] ./configure

* Run the make command to compile and build Ruby.
[~/local/ruby-1.8.5] make

* Test the newly built Ruby executable by running the regression test suite. Upon successful completion you’ll see a message like: “Finished in 44.904424 seconds. 1440 tests, 13585 assertions, 0 failures, 0 errors.”
[~/local/ruby-1.8.5] make test-all

* Install Ruby onto your system. This will copy the Ruby executable and utilities to /usr/local/bin and the standard Ruby libraries to /usr/local/lib/ruby.
[~/local/ruby-1.8.5] make install

* Install the Ruby documentation. This will compile the documentation into the format required by the ri command.
[~/local/ruby-1.8.5] make install-doc

At this point the installation of Ruby 1.8.5 is complete. If you had a previous version of Ruby installed in /usr/local/bin, you should take two extra steps:

* Make sure /usr/local/bin comes before /usr/bin in your $PATH:
[~/local/ruby-1.8.5] echo $PATH

* Log out of your current terminal session and log in again so the shell reloads the hashed location of ruby:
[~/local/ruby-1.8.5] which ruby
[~/local/ruby-1.8.5] ruby -v
ruby 1.8.5 (2006-08-25) [i686-linux]
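
The $PATH check above can also be scripted rather than read by eye. This is only a sketch: the function name path_precedes is made up, and the optional third argument (a colon-separated search path, defaulting to $PATH) is just there to make it easy to try out.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is searched before
# directory $2 in a colon-separated path ($3, defaulting to $PATH).
# The body runs in a subshell, so the IFS and set -f changes stay local.
path_precedes() (
    first=$1
    second=$2
    IFS=:       # split the search path on colons
    set -f      # no filename globbing while splitting
    for dir in ${3:-$PATH}; do
        if [ "$dir" = "$first" ]; then
            exit 0      # reached the first directory before the second
        fi
        if [ "$dir" = "$second" ]; then
            exit 1      # the second directory comes earlier
        fi
    done
    exit 1              # the first directory is not in the path at all
)
```

For example: path_precedes /usr/local/bin /usr/bin && echo "/usr/local/bin wins"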

Mike Dvorkin, Ruby Wizards Admin