<template>
  <section class="crawl-data-prep-notes">
    <h3>Crawl the Site</h3>
    <p>Use Screaming Frog SEO Spider to crawl the site whose orphaned Media Library attachments you want to clean up.
       Note: it is crucial that you run the crawl against the appropriate environment (Production, Staging,
       Development, Local). It is equally crucial that you run the crawl immediately before using this tool. Otherwise,
       recently added Media Library items that the crawl missed may be deleted, resulting in 404s. For that reason,
       there must be a content freeze of at least 2 hours before using this tool on Production.
    </p>
    <p>To crawl Staging, Development, or Local domains, to crawl subdomains, or to ignore robots.txt, you may need to
       configure Screaming Frog first. The relevant settings are under "Configuration" => "Crawl Config":</p>
    <ul>
      <li>"Spider" => "Crawl", under "Crawl Behaviour"
        <ul>
          <li>Enable: Crawl All Subdomains</li>
          <li>Enable: Follow Internal "nofollow"</li>
        </ul>
      </li>
      <li>"robots.txt"
        <ul>
          <li>Set to "Ignore robots.txt"</li>
          <li>Enable: Follow Internal "nofollow"</li>
        </ul>
      </li>
    </ul>
    <h3>Extract the Attachment Data</h3>
    <p>Once the crawl has completed, open Sublime Text to begin prepping the resulting data.</p>
    <p>In Screaming Frog, toggle the view to show Images only. Then copy all of the images (Address column only) and
       paste them into Sublime Text. Next, repeat that process for PDFs, Flash, Other, and Unknown. Note: we are only
       concerned with files located within the Uploads directory, inside the Year/Month directories. Do not
       copy over any files outside these directories (external files, files within plugins, files within themes, etc.).</p>
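    <p>As a rough illustration of the rule above, the sketch below keeps only URLs that sit under an "uploads"
       directory followed by Year/Month folders. The function name <code>isDatedUpload</code> and the example.com
       URLs are hypothetical and are not part of this tool.</p>
    <pre><code>// Hypothetical helper: true only for URLs under .../uploads/YYYY/MM/
function isDatedUpload(url) {
  return /\/uploads\/\d{4}\/\d{2}\//.test(url);
}

// Made-up sample of crawled addresses
var crawled = [
  'https://example.com/wp-content/uploads/2019/05/photo-300x200.jpg',
  'https://example.com/wp-content/themes/site/logo.png',
  'https://cdn.example.org/banner.jpg'
];

// Only the first address survives the filter
var kept = crawled.filter(isDatedUpload);</code></pre>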

    <h3>Clean the Data</h3>
    <p>After getting all of the attachments into Sublime Text, you can prepare the filenames for the tool. To
       do this, remove the part of each path up to and including "uploads", leaving the "/" that precedes the
       "Year" directory. You will also need to remove the size information injected by WordPress. For example, a
       URL ending in "/uploads/2019/05/photo-300x200.jpg" becomes "/2019/05/photo.jpg".</p>
    <h4>Remove the excess path</h4>
    <p>Copy the path preceding each filename, from the start up to and including "uploads". Paste this into the Find
       input, then choose Find All. This should highlight all the excess path strings. Once they are highlighted, press
       Delete.</p>
    <h4>Remove the size information</h4>
    <p>Ensure the regex flag is enabled in the Find tool. Paste the regex pattern below into the Find
       input, then choose Find All.</p>
    <code>-[0-9]+x[0-9]+(?=\.)</code>
    <p>This should highlight all the size strings. Once they are highlighted, press Delete.</p>
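    <p>The two cleanup steps can also be sketched in JavaScript, which may help when checking the expected result.
       This is a minimal sketch, not part of the tool: <code>cleanAttachmentUrl</code> and the sample URL are
       hypothetical, and the size pattern used here requires at least one digit on each side of the "x".</p>
    <pre><code>// Hypothetical helper mirroring the two manual find-and-delete steps
function cleanAttachmentUrl(url) {
  // Step 1: drop everything up to and including "uploads",
  // keeping the "/" that precedes the year
  var path = url.replace(/^.*\/uploads(?=\/)/, '');
  // Step 2: drop a size suffix such as "-300x200" that sits right before a dot
  return path.replace(/-[0-9]+x[0-9]+(?=\.)/, '');
}

cleanAttachmentUrl('https://example.com/wp-content/uploads/2019/05/photo-300x200.jpg');
// returns '/2019/05/photo.jpg'</code></pre>
    <p>Note that an original upload whose own name ends in a size-like suffix (for example "banner-300x250.jpg")
       would be shortened by this pattern as well, so it is worth reviewing the cleaned list before using it.</p>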


    <h3>Create a CSV from the Data</h3>
    <p>Open a Google Sheet. In the first row of the first column, type the word "filename". This will be the
       header for the column and is used in the crawl process to create a "key".</p>
    <p>Next, copy the prepped data from Sublime Text and paste it into the second row. This should place
       each file on its own row.</p>
    <p>Once populated, create a CSV from the sheet: "File" => "Download" => "Comma Separated Values (.csv)".</p>
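    <p>For reference, the resulting file should be a single column with "filename" as the header row. The sketch
       below builds the same shape in JavaScript. The names <code>csvField</code> and <code>toCsv</code> are
       hypothetical, and the quoting rule (quote only values containing a comma, quote, or line break) is an
       assumption based on common CSV practice rather than a documented requirement of this tool.</p>
    <pre><code>// Hypothetical helper: quote a value only when CSV syntax requires it
function csvField(value) {
  if (/[",\r\n]/.test(value)) {
    return '"' + value.replace(/"/g, '""') + '"';
  }
  return value;
}

// Hypothetical helper: one "filename" header row, then one cleaned path per row
function toCsv(filenames) {
  return ['filename'].concat(filenames.map(csvField)).join('\n') + '\n';
}

var csv = toCsv(['/2019/05/photo.jpg', '/2020/11/annual, report.pdf']);
// csv is 'filename\n/2019/05/photo.jpg\n"/2020/11/annual, report.pdf"\n'</code></pre>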
  </section>
</template>

<script>
export default {
  props: {},

  data() {
    return {};
  },
  mounted() {
    // console.log('CrawlDataPrepNotes component mounted');
  },
  computed: {},
  methods:  {
    randomMethod() {

    },
  }
}
</script>