HTML Embedded Data Extractor
This tool extracts embedded data (base64-encoded or URL-encoded) from HTML files and saves each item as a separate file. It then writes a new HTML file that references these external files instead of embedding the data.
Features
- Extracts embedded data from <pc-asset> tags
- Extracts embedded JavaScript, JSON, and other data formats
- Handles both base64-encoded and URL-encoded data
- Updates the HTML to reference the extracted files
- Preserves the original HTML file and creates a new one with the modifications
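To illustrate the difference between the two encodings, here is a minimal sketch of decoding a data: URI in Node.js. The function name parseDataUri and its return shape are hypothetical and not taken from the tool itself. The sketch assumes URL-encoded payloads are UTF-8 text.

```javascript
// Hypothetical helper (not the tool's actual code): decode a data: URI
// of the form data:[<mime type>][;base64],<payload>.
function parseDataUri(uri) {
  if (!uri.startsWith('data:')) {
    throw new Error('Not a data: URI');
  }
  const comma = uri.indexOf(',');
  if (comma === -1) {
    throw new Error('Malformed data: URI (no comma)');
  }

  // Everything between "data:" and the first comma is the header.
  const params = uri.slice('data:'.length, comma).split(';');
  const payload = uri.slice(comma + 1);

  // A trailing ";base64" marks base64 data; otherwise it is URL-encoded.
  const isBase64 = params[params.length - 1] === 'base64';
  const mimeType = params[0] || 'text/plain';

  // Assumption: URL-encoded payloads are UTF-8 text. decodeURIComponent
  // throws on percent-escapes that are not valid UTF-8.
  const buffer = isBase64
    ? Buffer.from(payload, 'base64')
    : Buffer.from(decodeURIComponent(payload), 'utf8');

  return { mimeType, isBase64, buffer };
}
```

The returned buffer can be passed directly to fs.writeFileSync, and mimeType could be used to pick a file extension.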
Installation
# Install dependencies
npm install
# Make the script executable
chmod +x extract-embedded-data.js
Usage
# Basic usage
node extract-embedded-data.js path/to/your/file.html
# Specify an output directory
node extract-embedded-data.js path/to/your/file.html path/to/output/dir
# Using npm script
npm run extract -- path/to/your/file.html
Example
# Extract data from the large HTML file in the assets directory
node extract-embedded-data.js assets/25_03_01_e4_schwarz.html
This will:
- Create a directory called extracted in the same location as the HTML file
- Extract all embedded data to separate files in this directory
- Create a new file called 25_03_01_e4_schwarz.extracted.html with updated references
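The naming scheme above could be derived from the input path roughly as follows. This is a sketch only: outputPaths is a hypothetical name, and the assumption that an explicit output directory replaces the default extracted directory (while the new HTML file stays next to the original) is not confirmed by the tool.

```javascript
const path = require('path');

// Hypothetical helper: compute where extracted files and the new HTML go.
// Assumption: when outputDir is given, it is used instead of "extracted".
function outputPaths(htmlPath, outputDir) {
  const { dir, name, ext } = path.parse(htmlPath);
  return {
    // Default: an "extracted" directory next to the input HTML file.
    extractedDir: outputDir || path.join(dir, 'extracted'),
    // The new HTML file: same name with ".extracted" before the extension.
    outputHtml: path.join(dir, `${name}.extracted${ext}`),
  };
}
```

For the example command, outputPaths('assets/25_03_01_e4_schwarz.html') yields an extracted directory inside assets and an output file named 25_03_01_e4_schwarz.extracted.html in the same place.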
How it works
The script:
- Parses the HTML using JSDOM
- Finds all <pc-asset> tags with src attributes starting with data:
- Extracts the data and saves it to separate files
- Updates the src attributes to point to the extracted files
- Looks for embedded data in script tags or JSON data
- Saves the modified HTML to a new file with .extracted added to the filename
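The <pc-asset> steps above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the names decodeDataUri and externalizeAssets, the asset-N.bin file naming, and the relative path written into src are all hypothetical. The function works on any DOM document, so the jsdom calls are shown only in a comment.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: decode a data: URI into raw bytes.
// Assumption: non-base64 payloads are URL-encoded UTF-8 text.
function decodeDataUri(uri) {
  const comma = uri.indexOf(',');
  const header = uri.slice('data:'.length, comma);
  const payload = uri.slice(comma + 1);
  return header.endsWith(';base64')
    ? Buffer.from(payload, 'base64')
    : Buffer.from(decodeURIComponent(payload), 'utf8');
}

// Rewrites every <pc-asset src="data:..."> in `document` to point at a
// file written into `outDir`. Returns the relative paths that were written.
function externalizeAssets(document, outDir) {
  fs.mkdirSync(outDir, { recursive: true });
  const written = [];
  const assets = document.querySelectorAll('pc-asset[src^="data:"]');
  Array.from(assets).forEach((el, i) => {
    const bytes = decodeDataUri(el.getAttribute('src'));
    const fileName = `asset-${i}.bin`; // naming scheme is an assumption
    fs.writeFileSync(path.join(outDir, fileName), bytes);
    // Assumption: the new HTML file sits next to outDir, so the
    // reference is "<outDir name>/<file name>".
    const relPath = `${path.basename(outDir)}/${fileName}`;
    el.setAttribute('src', relPath);
    written.push(relPath);
  });
  return written;
}

// With jsdom installed, a document can be obtained and saved like this:
//   const { JSDOM } = require('jsdom');
//   const dom = new JSDOM(fs.readFileSync(inputPath, 'utf8'));
//   externalizeAssets(dom.window.document, extractedDir);
//   fs.writeFileSync(outputPath, dom.serialize());
```

The sketch leaves out the script-tag and JSON handling mentioned in the list, since the source does not describe how those are detected.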