Working with large XML in PHP

The task: write a class that generates an XML document with the following structure and saves it to disk:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
	<serverType>server</serverType>
	<entry>
		<id>{{fileId}}</id>
		<type>text</type>
		<name>{{fileName}}</name>
		<text>{{fileContent}}</text>
	</entry>
</root>

Where:

{{fileId}} - unique id (must be generated)

{{fileName}} - file name

{{fileContent}} - file contents
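
As a side note on generating {{fileId}}: uniqid(), which the implementation further down uses, derives its value from the current time in microseconds, so two processes running at the same moment could in principle produce the same id. Where that matters, a random id is one alternative. A minimal sketch; the helper name generateFileId is hypothetical and not part of the task:

```php
<?php
// Hypothetical helper, not part of the original class.
function generateFileId(): string
{
    // 16 random bytes from the system CSPRNG, hex-encoded to 32 characters.
    return bin2hex(random_bytes(16));
}

echo generateFileId(), PHP_EOL;
```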

  • The class takes the path to a text file as input.
  • The class produces an XML document as output.
  • The input file can be any size (e.g. from 10 MB to 4 GB).
  • Note that memory_limit cannot be changed, so the file cannot be loaded into memory in one piece.
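
Because memory_limit is fixed, the input has to be processed in fixed-size blocks rather than loaded with something like file_get_contents(). A minimal sketch of such a block reader follows; the function name readChunks and the 8192-byte default are illustrative choices, not part of the task:

```php
<?php
// Hypothetical helper: yields a file block by block, so only one block
// is held in memory at a time.
function readChunks(string $path, int $chunkSize = 8192): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open file: $path");
    }
    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false || $chunk === '') {
                break; // read error or end of file
            }
            yield $chunk;
        }
    } finally {
        fclose($handle); // also runs if the caller stops iterating early
    }
}

// Usage: count the bytes of a file without loading it whole.
$tmp = tempnam(sys_get_temp_dir(), 'chunk');
file_put_contents($tmp, str_repeat('a', 20000));
$total = 0;
foreach (readChunks($tmp) as $chunk) {
    $total += strlen($chunk);
}
echo $total, PHP_EOL; // 20000
unlink($tmp);
```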

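We implement a streaming approach: the input is read in 8 KB blocks and XMLWriter writes each block to the output file, so the whole file is never held in memory. Note that each block is written as its own <text> element, so a large input produces many <text> elements inside the single <entry>: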

<?php
class XmlGenerator {
    private $filePath;
    private $writer;

    public function __construct($filePath) {
        $this->filePath = $filePath;
        $this->writer = new XMLWriter();
    }

    public function generateXml() {
        if (!file_exists($this->filePath)) {
            throw new Exception("File not found: " . $this->filePath);
        }

        $outputFilePath = __DIR__.'/output.xml'; // Path where the output file is saved
        if (!$this->writer->openUri($outputFilePath)) {
            throw new Exception("Cannot open output file: " . $outputFilePath);
        }
        $this->writer->setIndent(true);
        $this->writer->startDocument('1.0', 'UTF-8');
        $this->writer->startElement('root');
        $this->writer->writeElement('serverType', 'server');

        $fileId = uniqid();

        $this->writer->startElement('entry');
        $this->writer->writeElement('id', $fileId);
        $this->writer->writeElement('type', 'text');
        $this->writer->writeElement('name', basename($this->filePath));

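        $handle = fopen($this->filePath, 'rb');
        if ($handle === false) {
            throw new Exception("Cannot open file: " . $this->filePath);
        }
        while (!feof($handle)) {
            $chunk = fread($handle, 8192); // Block size in bytes
            if ($chunk === false || $chunk === '') {
                break; // Read error or nothing left to read
            }
            // Each block becomes its own <text> element
            $this->writer->writeElement('text', $chunk);
        }
        fclose($handle);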

        $this->writer->endElement();
        $this->writer->endElement();
        $this->writer->endDocument();

        $this->writer->flush();

        return $outputFilePath;
    }
}

$filePath = __DIR__.'/4gb.txt';
$generator = new XmlGenerator($filePath);

try {
    $outputFilePath = $generator->generateXml();
    echo "The XML file was successfully created and saved to: $outputFilePath";
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
}
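
The XmlGenerator class above writes one <text> element per 8 KB block. If the output has to match the task's structure exactly, with a single <text> per <entry>, the same streaming idea still applies: open the element once and append each block with XMLWriter::text(). The following is a minimal sketch under that assumption; the function name writeSingleTextEntry and the flush interval of 128 blocks are illustrative, not part of the original class:

```php
<?php
// Hypothetical variant of generateXml(): one <text> element per entry.
function writeSingleTextEntry(string $inputPath, string $outputPath): void
{
    $in = fopen($inputPath, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open file: $inputPath");
    }

    $writer = new XMLWriter();
    if (!$writer->openUri($outputPath)) {
        fclose($in);
        throw new RuntimeException("Cannot open output: $outputPath");
    }
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('root');
    $writer->writeElement('serverType', 'server');
    $writer->startElement('entry');
    $writer->writeElement('id', uniqid());
    $writer->writeElement('type', 'text');
    $writer->writeElement('name', basename($inputPath));

    $writer->startElement('text');      // opened once for the whole file
    $blocks = 0;
    while (!feof($in)) {
        $chunk = fread($in, 8192);
        if ($chunk === false || $chunk === '') {
            break;                      // read error or end of file
        }
        $writer->text($chunk);          // appends escaped character data
        if (++$blocks % 128 === 0) {
            $writer->flush();           // push buffered output to disk
        }
    }
    $writer->endElement();              // </text>
    fclose($in);

    $writer->endElement();              // </entry>
    $writer->endElement();              // </root>
    $writer->endDocument();
    $writer->flush();
}
```

The trade-off is on the reading side: XMLReader::readString() returns the content of an element as one string, so a single multi-gigabyte <text> element is much harder to read back under a fixed memory_limit than many small ones.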

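One caveat of fixed 8192-byte blocks: fread() counts bytes, not characters, so a multi-byte UTF-8 character can be cut in half at a block boundary, leaving invalid UTF-8 at the end of one <text> element and the start of the next. A possible remedy, sketched below, is to hold back the trailing incomplete bytes and prepend them to the next block. The function name splitAtUtf8Boundary is hypothetical, and the sketch assumes the input is valid UTF-8:

```php
<?php
// Hypothetical helper: splits $buffer into [complete characters, trailing
// bytes of an unfinished character]. Assumes the input is valid UTF-8.
function splitAtUtf8Boundary(string $buffer): array
{
    $len = strlen($buffer);
    // An unfinished character can occupy at most the last 3 bytes.
    for ($back = 1; $back <= min(3, $len); $back++) {
        $byte = ord($buffer[$len - $back]);
        if (($byte & 0xC0) === 0x80) {
            continue; // continuation byte, keep looking for the lead byte
        }
        // Lead byte (or ASCII) found: how many bytes does the character need?
        if ($byte >= 0xF0) {
            $need = 4;
        } elseif ($byte >= 0xE0) {
            $need = 3;
        } elseif ($byte >= 0xC0) {
            $need = 2;
        } else {
            $need = 1;
        }
        if ($need > $back) {
            // Incomplete character: hold it back for the next block.
            return [substr($buffer, 0, $len - $back), substr($buffer, $len - $back)];
        }
        break; // the last character is complete
    }
    return [$buffer, ''];
}

// "a" followed by the first two of the three bytes of the euro sign.
[$complete, $carry] = splitAtUtf8Boundary("a\xE2\x82");
var_dump($complete === 'a', $carry === "\xE2\x82");
```

In the read loop this would be used as `[$complete, $carry] = splitAtUtf8Boundary($carry . $chunk);`, writing only $complete and keeping $carry for the next iteration.
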
A class that reads the large XML file:

<?php
class Reader {
    private $filePath;
    private $reader;

    public function __construct($filePath) {
        $this->filePath = $filePath;
        $this->reader = new XMLReader();
    }

    public function readTextElement($index) {
        if (!$this->reader->open($this->filePath)) {
            return null;
        }

        $fileContent = null; // Stays null if the requested element is not found
        $textCount = 0;

        while ($this->reader->read()) {
            if ($this->reader->nodeType === XMLReader::ELEMENT) {
                if ($this->reader->name === 'text') {
                    $textCount++;
                    if ($textCount === (int) $index) {
                        // readString() returns the text content of the current element
                        $fileContent = $this->reader->readString();
                        break;
                    }
                }
            } elseif ($this->reader->nodeType === XMLReader::END_ELEMENT && $this->reader->name === 'entry') {
                break; // End of the entry reached without finding the element
            }
        }

        $this->reader->close();

        return $fileContent;
    }
}

$filePath = __DIR__.'/output.xml';
$xmlReader = new Reader($filePath);

// 1-based index of the <text> element (one per 8 KB block) to read
$fileContent = $xmlReader->readTextElement(524278);

if ($fileContent !== null) {
    echo $fileContent;
} else {
    echo "Item not found";
}        
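
readTextElement() reopens the document and scans it from the beginning on every call, so fetching every block that way repeats the scan for each index. When all blocks are needed, for example to rebuild the original file, a single pass is enough. A minimal sketch, assuming the chunked layout produced by XmlGenerator and text content that survives an XML round trip; the function name restoreFile is hypothetical:

```php
<?php
// Hypothetical helper: concatenates every <text> element into $targetPath.
function restoreFile(string $xmlPath, string $targetPath): int
{
    $reader = new XMLReader();
    if (!$reader->open($xmlPath)) {
        throw new RuntimeException("Cannot open XML: $xmlPath");
    }
    $out = fopen($targetPath, 'wb');
    if ($out === false) {
        $reader->close();
        throw new RuntimeException("Cannot open target: $targetPath");
    }

    $count = 0;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'text') {
            // Only one block is held in memory at a time.
            fwrite($out, $reader->readString());
            $count++;
        }
    }

    fclose($out);
    $reader->close();
    return $count; // number of <text> elements copied
}
```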

Tested with a 4 GB input file.

Avutzhan Dautov

Lead Backend Developer

1 year ago

Hi, did you test larger files to check how the script behaves under high load? For example, a 100 GB file.