Abstract
This release (and the heritrix-0_8 branch, made at the heritrix-0_7_1 tag) was prompted by ConcurrentModificationExceptions thrown when tens of seeds are supplied, and by the need to fix domain-scope leakage. In addition, the continuous build is now publicly available, the integration selftest is incorporated into the build, the build is now maven-only (the ant build is no longer supported), day/night configurations (refinements) were added, the too-many-open-files problem was ameliorated, the charset given in the HTTP Content-Type header is now used when creating character streams, and Heritrix now crawls SSL sites. UI improvements include a red star beside bad configuration settings, precompiled pages, and the delineation of advanced settings. Many bugs were fixed.
Table 13. Changes
ID | Type | Summary |
---|---|---|
939032 | Add | integrate selftest into cruisecontrol build |
903078 | Add | On reedit, red star by bad attribute setting. |
935215 | Add | day/night configurations |
928745 | Add | UI should only write changed config |
908723 | Add | record of settings changes should be kept |
909249 | Add | Only one build, not two |
925614 | Add | maven-only build rather than ant & maven |
877295 | Add | ARCWriter should use a pool of open files -- if it helps |
899226 | Add | Precompile UI pages |
896798 | Add | UI should be split into common/uncommon settings |
895341 | Add | UI web pages need to be more responsive |
955527 | Fix | domain scope leakage |
943770 | Fix | ConcurrentModificationExceptions |
943768 | Fix | Too many open files |
943781 | Fix | ConcurrentModificationExceptions |
943453 | Fix | empty seeds-report.txt |
903092 | Fix | Doc. assumes bash. Allow tcsh/csh |
908419 | Fix | script heritrix.sh goes into infinite loop |
922104 | Fix | heritrix.sh launch file path weirdness |
935122 | Fix | ToeThreads hung in ExtractorHTML after Pause |
938591 | Fix | IllegalCharsetNameException: Windows-1256 |
934642 | Fix | No doc-files/package.html in javadoc. |
815544 | Fix | embed-count sensitivity WRT redirects, preconditions |
936610 | Fix | Refinement limits are not always saved |
935340 | Fix | NPE exception in getMBeanInfo(settings) |
904767 | Fix | Untried CrawlURIs should have clear status code |
914287 | Fix | Thread underutilization in broad crawls |
930736 | Fix | KeyedQueue showing EMPTY status, but the length is 1. |
934585 | Fix | NPE in XMLSettingsHandler.recursiveFindFiles() |
935352 | Fix | Failed DNS does not have intended impact |
896764 | Fix | ftp URIs are retried |
848661 | Fix | Refetching of robots and/or DNS broken |
935221 | Fix | NPE switching to 'expert' settings in HEAD |
896779 | Fix | rss extractor |
896775 | Fix | JS extractor clueless on relative URIs |
913214 | Fix | converting URI's '\' into '/' character |
928665 | Fix | When going back to overrides, directory is gone |
923342 | Fix | shutdown.jsp unable to compile |
913876 | Fix | ARCWriterPool timeouts -- legitimate? |
896766 | Fix | If one URI connect-fails, hold queue, too |
908719 | Fix | Fetching simple URLs fails with S_CONNECT_FAILED (-2) error |
809567 | Fix | seeds held back/poor breadth first? |
831480 | Fix | Parsing links found between escaped quotes in JavaScript |
895303 | Fix | Does not extract applet URI correctly |
791481 | Fix | links to likely-embed types should be treated as embeds |
900826 | Fix | Frontier.next() forceFetches will cause assertion error |
877873 | Fix | Flash link extractor causes OutOfMemory exceptions. |
899976 | Fix | Should be possible to resume from |
896878 | Fix | Heritrix ignores charset |
910210 | Fix | Max # of arcs not being respected. |
904723 | Fix | New profile should ensure unique name |
902940 | Fix | When changing scope common scope settings are lost |
903910 | Fix | ssl doesn't work |
903084 | Fix | Allow that people use tcsh/csh not just bash |
896788 | Fix | https SSLHandshakeException: unknown certificate |
901397 | Fix | Cannot override settings that aren't set in globals |
892253 | Fix | 'Waiting for pause' even after all threads done |
896800 | Fix | filter 'invert', filter names need work |
896835 | Fix | max-link-hops (etc.) ignored unless |
872069 | Fix | order.xml absolute paths |
892105 | Fix | Cannot set TransclusionFilter attributes |
874057 | Fix | Link puts garbage into arc file: http://www.msn.com/robots.t |