Have an account? Sign in
Login  Register  Facebook
Problem with a large CSV file
Hi everyone,

I have a very large CSV file. 51427 lines to be exact.

I need to import the entire file into a MySQL database, however, the script times out due to server settings and slow connection (and maybe other reasons that I am not aware of).

So - I am now passing parameters START and LIMIT via address bar to import, like this:

http://my.server.address/import.php?...000&limit=1000

This reads the entire CSV file into an array, and starts at line 10000 of the array and inserts into the database until it reaches line 11000, and then terminates the script.

This works very nicely, however, I am not happy having to reach the entire 51427 lines of the CSV file into an array before processing.

Is there not a way where I can only read the required lines into an array? That would speed things up significantly.

Thanks in advance,

Kobus
Started: September 17, 2011 Latest Activity: December 19, 2018 csv php
7 Answers
Levitra Avec Dapoxetine Avis Which Is Safer Amoxicillin Or Zithromax Association Amoxicillin Cephalosporin <a href=http://ordercheapvia.com>viagra</a> Cialis Ciao Zithromax Drug Type Acquistare Kamagra Germania Caida Del Cabello Propecia Finasteride Buy Celebrex Cvs <a href=http://costofvia.com>viagra</a> Generic Tadalafil Children Urine Tract Infection Amoxicillin Viagra Prezzo Migliore Zithromax Teeth <a href=http://gaprap.com>buy viagra</a> Vente Kamagra Espagne Cephalexin Stability Side Effects To Cephalexin Comprar Viagra De Calidad Buy Lexapro Online Us Pharmacy <a href=http://prilipills.com>online dapoxetine</a> Sibusat Amoxil 500 Mg $0.29 Unit Price Propecia Pas Cher France <a href=http://lapizmoon.com>cialis tablets for sale</a> Hydrochlorothiazide Hypertension Renfrewshire Buy Propecia In The Uk

Posted: Jeadyelty
In: December 19, 2018

Since I am a new user, I can only post once in 5 minutes, therefore, I am answering everyone here.

Thanks for the responses everyone. To get back to you:

1. I do not have shell access.
2. I need to transform data before import, so I have to use script, not direct load inline.
3. I have updated my PHP directives via ini_set to get the desired result, but I was wondering if I could prevent having to do this.
4. With regards to the progressive reader, I will test it out as soon as I can, thanks.

Kobus

Posted: Bliksempie
In: September 17, 2011

What's up with the UC_FIRST? :-?
September 17, 2011

You might want to look at streaming the csv file. Send start file location, start position and number of bytes to read as get paramters to a ProgressiveReader.php
class NoFileFoundException extends Exception {
    function __toString() {
        return '<h1><b>ERROR:</b> could not find ('
                    .$this->getMessage().
                    ') please check your settings.</h1>';
    }
}

class NoFileOpenException extends Exception {
    function __toString() {
        return '<h1><b>ERROR:</b> could not open ('
                    .$this->getMessage().
                    ') please check your settings.</h1>';
    }
}

interface Reader {
    function setFileName($fName);
    function open();
    function setBufferOffset($offset);
    function bufferSize();
    function isOffset();
    function setPacketSize($size);
    function read();
    function isEOF();
    function close();
    function readAll();
}

class ProgressiveReader implements Reader {
    private $fName;
    private $fileHandler;
    private $offset = 0;
    private $packetSize = 0;

    public function setFileName($fName) {
        $this->fName = $fName;
        if(!file_exists($this->fName)) {
            throw new NoFileFoundException($this->fName);
        }
    }

    public function open() {
        try {
            $this->fileHandler = fopen($this->fName, 'rb');
        }
        catch (Exception $e) {
            throw new NoFileOpenException($this->fName);
        }
        fseek($this->fileHandler, $this->offset);
    }

    public function setBufferOffset($offset) {
        $this->offset = $offset;
    }

    public function bufferSize() {
        return filesize($this->fName) - (($this->offset > 0) ? ($this->offset  + 1) : 0);
    }

    public function isOffset() {
        if($this->offset === 0) {
            return false;
        }
        return true;
    }

    public function setPacketSize($size) {
        $this->packetSize = $size;
    }

    public function read() {
        return fread($this->fileHandler, $this->packetSize);
    }

    public function isEOF() {
        return feof($this->fileHandler);
    }

    public function close() {
        if($this->fileHandler) {
            fclose($this->fileHandler);
        }
    }

    public function readAll() {
        return fread($this->fileHandler, filesize($this->fName));
    }
}
Here are the unit tests:
require_once 'PHPUnit/Framework.php';

require_once dirname(__FILE__).'/../ProgressiveReader.php';

class ProgressiveReaderTest extends PHPUnit_Framework_TestCase {

    protected $reader;
    private $fp;
    private $fname = "Test.txt";

    protected function setUp() {
        $this->createTestFile();
        $this->reader = new ProgressiveReader();
    }

    protected function tearDown() {
        $this->reader->close();
    }

    public function test_isValidFile() {
        $this->reader->setFileName($this->fname);
    }

    public function test_isNotValidFile() {
        try {
            $this->reader->setFileName("nothing.tada");
        }
        catch (Exception $e) {
            return;
        }

        $this->fail();
    }

    public function test_isFileOpen() {
        $this->reader->setFileName($this->fname);
        $this->reader->open();
    }

    public function test_couldNotOpenFile() {
        $this->reader->setFileName($this->fname);
        try {
            $this->deleteTestFile();
            $this->reader->open();
        }
        catch (Exception $e) {
            return;
        }

        $this->fail();
    }

    public function test_bufferSizeZeroOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->open();
        $this->assertEquals($this->reader->bufferSize(), 12);
    }

    public function test_bufferSizeTwoOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->reader->open();
        $this->assertEquals($this->reader->bufferSize(), 9);
    }

    public function test_readBuffer() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(0);
        $this->reader->setPacketSize(1);
        $this->reader->open();
        $this->assertEquals($this->reader->read(), "T");
    }

    public function test_readBufferWithOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->reader->setPacketSize(1);
        $this->reader->open();
        $this->assertEquals($this->reader->read(), "S");
    }

    public function test_readSuccesive() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(0);
        $this->reader->setPacketSize(6);
        $this->reader->open();
        $this->assertEquals($this->reader->read(), "TEST1\n");
        $this->assertEquals($this->reader->read(), "TEST2\n");
    }

    public function test_readEntireBuffer() {
        $this->reader->setFileName($this->fname);
        $this->reader->open();
        $this->assertEquals($this->reader->readAll(), "TEST1\nTEST2\n");
    }

    public function test_isNotEOF() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->reader->setPacketSize(1);
        $this->reader->open();
        $this->assertFalse($this->reader->isEOF());
    }

    public function test_isEOF() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(0);
        $this->reader->setPacketSize(15);
        $this->reader->open();
        $this->reader->read();
        $this->assertTrue($this->reader->isEOF());
    }

    public function test_isOffset() {
        $this->reader->setFileName($this->fname);
        $this->reader->setBufferOffset(2);
        $this->assertTrue($this->reader->isOffset());
    }

    public function test_isNotOffset() {
        $this->reader->setFileName($this->fname);
        $this->assertFalse($this->reader->isOffset());
    }

    private function createTestFile() {
        $this->fp = fopen($this->fname, "wb");
        fwrite($this->fp, "TEST1\n");
        fwrite($this->fp, "TEST2\n");
        flush();
        fclose($this->fp);
    }

    private function deleteTestFile() {
        if(file_exists($this->fname)) {
            unlink($this->fname);
        }

    }
}

Posted: MacOS
In: September 17, 2011

This reads the entire CSV file into an array
All of the 50000+ lines?
Advance to the start of the wanted block of the file from PHP by reading line by line (fgets()) and then add each (needed) line to the array; you can get the array of the line with fgetcsv().
or use the fast MySQL LOAD DATA INFILE command
If that\'s not an option, you could split the CSV file (assuming access to a shell).

Posted: is_set
In: September 17, 2011

Hi, I can't import via MySQL, because I need to transform data before importing. Thanks for the response.
September 17, 2011

Can you connect to the database server directly?

If so, I would consider using a 3rd party program like SQLyog to import your csv.

You could also upload the file and use the mysql shell to import that data directly:
LOAD DATA INFILE '/path/to/your_file.csv' INTO TABLE table_name FIELDS TERMINATED BY ',';

Posted: Go
In: September 17, 2011

Have you tried using bash/shell(if you're on linux) to import your csv into mysql? You can also use ruby or perl or whatnot, as I think that's what you should use instead of php(or any web app) to import the file.

Posted: xtremex
In: September 17, 2011

Your script probably takes too long and it gets terminated.

You should look for max_execution_time directive in php.ini and set it to something that suits you.

The default max_execution_time is set to 30 seconds, so your script probably gets terminated.

If you also have scripts that need to be limited in time, you can do that individually by calling set_time_init();

Posted: MacOS
In: September 17, 2011

Your Answer

xDo you want to answer this question? Please login or create an account to post your answer