20. Release 0.8.0 - 2004-05-18


This release (and the heritrix-0_8 branch, made at the heritrix-0_7_1 tag) was prompted by ConcurrentModificationExceptions thrown when tens of seeds were supplied, and by the need to fix domain-scope leakage. In addition, the continuous build was made publicly available, the integration selftest was incorporated into the build, the build became Maven-only (the Ant build is no longer supported), day/night configurations (refinements) were added, the too-many-open-files problem was ameliorated, the charset given in the HTTP Content-Type header is now used when creating character streams, and Heritrix now crawls SSL sites. UI improvements include a red star beside bad configuration settings, precompilation of UI pages, and separation of advanced settings from common ones. Many bugs were fixed.

20.1. Synopsis

20.2. Changes

Table 13. Changes

ID      Type  Summary
939032  Add   integrate selftest into cruisecontrol build
903078  Add   On reedit, red star by bad attribute setting.
935215  Add   day/night configurations
928745  Add   UI should only write changed config
908723  Add   record of settings changes should be kept
909249  Add   Only one build, not two
925614  Add   maven-only build rather than ant & maven
877295  Add   ARCWriter should use a pool of open files -- if it helps
899226  Add   Precompile UI pages
896798  Add   UI should be split into common/uncommon settings
895341  Add   UI web pages need to be more responsive
955527  Fix   domain scope leakage
943768  Fix   Too many open files
943453  Fix   empty seeds-report.txt
903092  Fix   Doc. assumes bash. Allow tcsh/csh
908419  Fix   script heritrix.sh goes into infinite loop
922104  Fix   heritrix.sh launch file path weirdness
935122  Fix   ToeThreads hung in ExtractorHTML after Pause
938591  Fix   IllegalCharsetNameException: Windows-1256
934642  Fix   No doc-files/package.html in javadoc.
815544  Fix   embed-count sensitivity WRT redirects, preconditions
936610  Fix   Refinement limits are not always saved
935340  Fix   NPE exception in getMBeanInfo(settings)
904767  Fix   Untried CrawlURIs should have clear status code
914287  Fix   Thread underutilization in broad crawls
930736  Fix   KeyedQueue showing EMPTY status, but the length is 1.
934585  Fix   NPE in XMLSettingsHandler.recursiveFindFiles()
935352  Fix   Failed DNS does not have intended impact
896764  Fix   ftp URIs are retried
848661  Fix   Refetching of robots and/or DNS broken
935221  Fix   NPE switching to 'expert' settings in HEAD
896779  Fix   rss extractor
896775  Fix   JS extractor clueless on relative URIs
913214  Fix   converting URI's '\' into '/' character
928665  Fix   When going back to overrides, directory is gone
923342  Fix   shutdown.jsp unable to compile
913876  Fix   ARCWriterPool timeouts -- legitimate?
896766  Fix   If one URI connect-fails, hold queue, too
908719  Fix   Fetching simple URLs fails with S_CONNECT_FAILED (-2) error
809567  Fix   seeds held back/poor breadth first?
831480  Fix   Parsing links found between escaped quotes in JavaScript
895303  Fix   Does not extract applet URI correctly
791481  Fix   links to likely-embed types should be treated as embeds
900826  Fix   Frontier.next() forceFetches will cause assertion error
877873  Fix   Flash link extractor causes OutOfMemory exceptions.
899976  Fix   Should be possible to resume from
896878  Fix   Heritrix ignores charset
910210  Fix   Max # of arcs not being respected.
904723  Fix   New profile should ensure unique name
902940  Fix   When changing scope common scope settings are lost
903910  Fix   ssl doesn't work
903084  Fix   Allow that people use tcsh/csh not just bash
896788  Fix   https SSLHandshakeException: unknown certificate
901397  Fix   Cannot override settings that isn't set in globals
892253  Fix   'Waiting for pause' even after all threads done
896800  Fix   filter 'invert', filter names need work
896835  Fix   max-link-hops (etc.) ignored unless
872069  Fix   order.xml absolute paths
892105  Fix   Cannot set TransclusionFilter attributes
874057  Fix   Link puts garbage into arc file: http://www.msn.com/robots.t
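
The charset-related entries above (896878, 938591) concern using the charset named in the HTTP Content-Type header when creating character streams. The following is a minimal sketch of that general technique, not Heritrix's actual code: the class name ContentTypeCharsetSketch, the methods charsetFor and readerFor, and the ISO-8859-1 fallback are all assumptions made for illustration, and the sketch uses a newer Java than was current for this release.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

// Hypothetical helper, not part of Heritrix.
final class ContentTypeCharsetSketch {

    // Assumed fallback when the header names no usable charset.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    // Pick a Charset from a Content-Type value such as
    // "text/html; charset=UTF-8"; fall back if absent or unusable.
    static Charset charsetFor(String contentType) {
        if (contentType == null) {
            return FALLBACK;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (!p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                continue;
            }
            String name = p.substring("charset=".length()).trim();
            // Strip optional surrounding double quotes.
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            if (name.isEmpty()) {
                return FALLBACK;
            }
            try {
                return Charset.forName(name);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // Malformed or unknown name: do not fail the fetch.
                return FALLBACK;
            }
        }
        return FALLBACK;
    }

    // Wrap the response body in a character stream using that charset.
    static Reader readerFor(InputStream body, String contentType) {
        return new InputStreamReader(body, charsetFor(contentType));
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        try (Reader r = readerFor(new ByteArrayInputStream(body),
                                  "text/plain; charset=UTF-8")) {
            StringBuilder sb = new StringBuilder();
            for (int c = r.read(); c != -1; c = r.read()) {
                sb.append((char) c);
            }
            System.out.println(sb);
        }
    }
}
```

Catching both exception types means that a charset name the JVM rejects or does not support degrades to the fallback rather than aborting processing of the document.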