Microsoft 365 Convert PDF to TXT

FCMLE44 · 16 August 2022

Hello,

I have a folder containing PDF files that need to be converted to TXT.
I created a .bat command file with the contents below:
FOR /R "D:\Test Prévoyance\PREVOYANCE\" %%i IN (*.pdf) do (D:\Test Prévoyance\Pdf2Text\Pdf2Text.exe "%%i" "D:\Test Prévoyance\PREVOYANCE\%%~ni.txt")

I wrote a small macro to launch it:

VB:

Sub PARAMS()

Dim Fichier As String

        Fichier = ThisWorkbook.Sheets("PARAMS").Cells(2, 2).Value

        Shell "cmd.exe /k cd " & Fichier & "&&PREVOYANCE.bat"

        MsgBox "Fichiers Textes créés"

End Sub

This does not create my TXT files.
Does anyone have an idea?

Thanks
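
A side note on the macro above: VBA's `Shell` starts the program and returns immediately, so the message box appears whether or not any file was converted. Also, `cd` without `/d` does not switch drives, which matters if the folder in cell B2 is on a different drive than the current one. As a minimal sketch of the alternative (written in Python rather than VBA; `run_in_folder` is a hypothetical helper name, and the Python interpreter stands in for the real command), the launcher can pass the working folder explicitly and wait for the exit code before reporting anything:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_folder(args, folder):
    """Run a command with `folder` as its working directory and wait for it.

    `args` is a list, so no shell quoting is involved, and `cwd` replaces
    the `cd <folder> && ...` step. Returns (exit code, captured stdout).
    """
    done = subprocess.run(args, cwd=folder, capture_output=True, text=True)
    return done.returncode, done.stdout


if __name__ == "__main__":
    # Demo in a folder whose name contains a space. For the thread's case the
    # command would be something like ["cmd.exe", "/c", "PREVOYANCE.bat"].
    with tempfile.TemporaryDirectory(prefix="Test Prevoyance ") as folder:
        code, out = run_in_folder(
            [sys.executable, "-c", "import os; print(os.getcwd())"], folder
        )
        same = Path(out.strip()).resolve() == Path(folder).resolve()
        print("exit code:", code, "| ran in the requested folder:", same)
```

Only once the exit code is known to be 0 does it make sense to announce that the text files were created.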

TooFatBoy · 17 August 2022

fanch55 said:
Hello everyone, installation is not possible, but pdf2text is installed? .... 🤔

I suppose it's a standalone. 😉

FCMLE44 · 17 August 2022

Hello. Yes, unfortunately.

fanch55 · 17 August 2022

With the current version of pdf2text, and given how you are using it,
there is no need to go through Excel or a .bat file.
Just press Win + R and enter the command below:

Code:

D:\Test Prévoyance\Pdf2Text\pdf2text -o D:\Test Prévoyance\Pdf2Text  D:\Test Prévoyance\Pdf2Text\*.pdf

or you can put the command in your Shell call ...

FCMLE44 · 17 August 2022

What does the little -o stand for?

FCMLE44 · 17 August 2022

When I run it, a window just flashes and that is all.
Could you confirm the order in which I should put my folders in the command?

fanch55 · 17 August 2022

D:\Test Prévoyance\Pdf2Text\pdf2text -o D:\Test Prévoyance\Pdf2Text D:\Test Prévoyance\Pdf2Text\*.pdf
<------- program being called -------><-- folder for the TXT files --><------- files to convert ------->

PDFTron PDF2Text V9.3080104.
Copyright (c) 2001-2022 PDFTron Systems Inc., www.pdftron.com.

You are running a DEMO version of PDF2Text.
In the demo version, random words or pages will be replaced with the <DEMO> string.

Usage: pdf2text [<options>] file...

OPTIONS:

--file... arg A list of folders and/or file names to process.

-o [ --output ] arg The folder used to store output files. By
default, the output will be displayed on
screen.

-a [ --pages ] arg (=-) Specifies the list of pages to convert. By
default, all pages are converted.

-e [ --encoding ] arg (=UTF8) Output text encoding:
UTF8
UTF16
The default output encoding is UTF8.

-f [ --format ] arg (=plain) Output text formating:
plain
wordlist
textruns
xml
The default output format is 'plain' text.

--noligatures Disables expanding of ligatures using a
predefined mapping. Default ligatures are: fi,
ff, fl, ffi, ffl, ch, cl, ct, ll, ss, fs, st,
oe, OE.

--nodehyphen Disables finding and removing hyphens that
split words across two lines. Hyphens are often
used a the end of lines as an indicator that a
word spans two lines. Hyphen detection enables
removal of hyphen character and merging of text
runs to form a single word. This option has no
effect on Tagged PDF files.

--no_dup_remove Disables removing duplicated text that is
frequently used to achieve visual effects of
drop shadow and fake bold.

--punct_break Treat punctuation (e.g. full stop, comma,
semicolon, etc.) as word break characters.

--remove_hidden_text Enables removal of text that is obscured by
images or rectangles. Since this option has
small performance penalty on performance of
text extraction, by default it is not enabled.

--no_invisible_text Enables removing text that uses rendering mode
3 (i.e. invisible text). Invisible text is
usually used in 'PDF Searchable Images' (i.e.
scanned pages with a corresponding OCR text).
As a result, invisible text will be extracted
by default.

--use_z_order Use Z-order as reading order for text

--output_bbox Include bounding box information for each text
element. If the output format is 'XML' the
bounding box information will be stored in
'bbox' attribute. If the output format is
'wordlist' the coordinates of the bounding box
will precede the word.

--xml_words_as_elements Output words as XML elements instead of inline
text.

--xml_output_styles Include font and styling information.

--json_zones Load zoning information from JSON file

--wordcount Get the number of words on each page.

--charcount Get total number of characters on each page.

--pageinfo Get the width, height, media box, crop box, and
page rotation for every page.

--prefix arg The prefix for output text files. The output
filename will be constructed by appending the
prefix string, the page number, and the
appropriate file extension (e.g. myprefix1.txt,
myprefix2.xml, etc). The prefix option should
be used only for processing of individual
documents. By default, PDF filename will be
used as a prefix.

--digits arg The number of digits used in the page counter
portion of the output filename. By default, new
digits are added as needed; however this
parameter could be used to format the page
counter field to a uniform width (e.g.
myfile0001.txt, myfile0002.txt, etc).

--subfolders Process all sub-directory for every directory
specified in the argument list. By default,
sub-directories are not processed.

-c [ --clip ] arg User definable clip box. The default clip
region is crop box of the page.

--noprompt Disables any user input. By default, the
application will ask for a valid password if
the password is incorrect.

-p [ --pass ] arg The password for secured PDF files. Not
required if the input document is not secured
using the 'open' password.

--extension arg (=.pdf) The default file extension used to process PDF
documents. The default extension is ".pdf".

--verb arg (=1) Set the opt.m_verbosity level to 'arg' (0-2).

-v [ --version ] Print the version information.

-h [ --help ] Print a listing of available options.

--lic_key arg PDFTron SDK license key. License keys can be passed
using this option or in a separate .lic file.

Examples:
pdf2text my.pdf
pdf2text -o test_out/ex1 test/my.pdf
pdf2text --wordcount my.pdf
pdf2text -o test_out -a 1 -f xml --output_bbox my.pdf

FCMLE44 · 17 August 2022

When I enter this, the same thing happens:
C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\Pdf2Text\pdf2text -o C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\PREVOYANCE C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\PREVOYANCE\*.pdf

The PDF files are indeed in PREVOYANCE, and the TXT files must end up there too.

fanch55 · 17 August 2022

The problem is the spaces in the names ...
Either create a .bat file, or enclose the files/folders on the command line in double quotes.

Code:

Set Pgm="C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\Pdf2Text\pdf2text.exe"
Set Tgt="C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\PREVOYANCE"
%Pgm% -o %Tgt% %Tgt%\*pdf

Run the .bat file via cmd (the names are the ones on my PC, to show that it works ...).
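
To illustrate why the spaces matter, here is a small Python sketch of how a command line gets cut into arguments. It is a deliberately simplified model (split on spaces, keep double-quoted runs together), not the exact parsing rules of cmd.exe or of pdf2text, and `split_command_line` is a hypothetical helper written only for this illustration.

```python
def split_command_line(line):
    """Split on spaces, treating text inside double quotes as one piece."""
    args, current, in_quotes = [], "", False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes   # quotes toggle grouping and are dropped
        elif ch == " " and not in_quotes:
            if current:
                args.append(current)    # a bare space ends the current argument
                current = ""
        else:
            current += ch
    if current:
        args.append(current)
    return args


base = r"C:\Users\Frede\OneDrive\Bureau\Test Prévoyance"
unquoted = f"{base}\\Pdf2Text\\pdf2text -o {base}\\PREVOYANCE {base}\\PREVOYANCE\\*.pdf"
quoted = f'"{base}\\Pdf2Text\\pdf2text" -o "{base}\\PREVOYANCE" "{base}\\PREVOYANCE\\*.pdf"'

print(split_command_line(unquoted)[0])    # C:\Users\Frede\OneDrive\Bureau\Test
print(len(split_command_line(unquoted)))  # 7
print(len(split_command_line(quoted)))    # 4
```

In this model the unquoted line names a program called `...\Bureau\Test`, which does not exist, while the quoted line yields the intended four pieces: program, `-o`, output folder, input pattern.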

TooFatBoy · 17 August 2022

As fanch55 said, the problem comes from the spaces in the paths or file names.

By the way, I had not noticed it before, but in your original message you forgot the quotes around one of the three paths:
FOR /R "D:\Test Prévoyance\PREVOYANCE\" %%i IN (*.pdf) do (D:\Test Prévoyance\Pdf2Text\Pdf2Text.exe "%%i" "D:\Test Prévoyance\PREVOYANCE\%%~ni.txt")
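
For comparison, here is a hedged sketch of the same loop in Python rather than batch. `build_commands` is a hypothetical helper, and the call shape `<exe> <pdf> <txt>` is simply copied from the loop in the first post, so it may not match what a given pdf2text version expects. The point is only that when each path is its own list element, spaces need no quoting at all. `subprocess.list2cmdline` (an undocumented helper in the standard library) is used just to display the quoted form.

```python
import subprocess
import tempfile
from pathlib import Path


def build_commands(exe, pdf_root, out_dir):
    """Return one argument list per PDF found under pdf_root (recursive).

    Like the FOR /R loop, every output lands directly in out_dir as <name>.txt.
    """
    commands = []
    for pdf in sorted(Path(pdf_root).rglob("*.pdf")):
        target = Path(out_dir) / (pdf.stem + ".txt")
        commands.append([str(exe), str(pdf), str(target)])
    return commands


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Demo folders with a space in the name (accent left out on purpose).
        root = Path(tmp) / "Test Prevoyance" / "PREVOYANCE"
        root.mkdir(parents=True)
        (root / "contrat 1.pdf").touch()
        # Placeholder path: the executable is not created or run in this demo.
        exe = Path(tmp) / "Test Prevoyance" / "Pdf2Text" / "Pdf2Text.exe"
        for cmd in build_commands(exe, root, root):
            print(subprocess.list2cmdline(cmd))
            # To actually convert: subprocess.run(cmd, check=True)
```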

FCMLE44 · 17 August 2022

Thanks.
Even with the quotes it does not work, and I do not understand why.

fanch55 · 17 August 2022

Even with post #23?

Could you just run "C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\Pdf2Text\pdf2text.exe"
and tell us which version is installed?

FCMLE44 · 17 August 2022

When I run my .bat with this command,
a window opens and keeps repeating this line: C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\Pdf2Text\pdf2text.exe

Version 3.2

fanch55 · 17 August 2022

Sorry, the latest version is 9.
I cannot help you any further, because V3.2 can no longer be found ...
